Machine Learning 2.0 with Hugging Face Transformers - Julien Simon

October 03, 2022
According to the latest State of AI report, "transformers have emerged as a general-purpose architecture for ML. Not just for Natural Language Processing, but also Speech, Computer Vision or even protein structure prediction." Indeed, the Transformer architecture has proven very efficient on a wide variety of Machine Learning tasks. But how can we keep up with the frantic pace of innovation? Do we really need expert skills to leverage these state-of-the-art models? Or is there a shorter path to creating business value in less time? In this code-level talk, we'll gradually build and deploy a Machine Learning application based on Transformer models. Along the way, you'll learn about the portfolio of open-source and commercial Hugging Face solutions, and how they can help you become hyper-productive and deliver high-quality Machine Learning solutions faster than ever before.

Julien is currently Chief Evangelist at Hugging Face. He recently spent 6 years at Amazon Web Services, where he was the Global Technical Evangelist for AI & Machine Learning. Prior to joining AWS, Julien served for 10 years as CTO/VP Engineering at large-scale startups.

✅ Connect with Julien: https://www.linkedin.com/in/juliensimon/
✅ Connect with Optimized AI Conference on LinkedIn: https://www.linkedin.com/company/oaiconference/
✅ Connect with Optimized AI Conference on Twitter: https://x.com/southerndsc
✅ Visit the Optimized AI Conference website: https://www.oaiconference.com/

Transcript

Our next speaker is Julien from Hugging Face. Julien is a chief evangelist. I love that title, Julien. And I love your shirt, I already told you that, it's really cool. He's the chief evangelist for machine learning, data science, and AI at a privately owned company called Hugging Face.

Thank you. Good morning, everybody. I flew in last night, so I'm going to give you a quick intro today. A few years ago, we were told the world was becoming an API, a SaaS platform, or a cloud, and a few years later we realized how true that prediction was. For a lot of very hard problems that could never be solved with traditional statistical machine learning algorithms, deep learning was the solution: language processing, audio, reinforcement learning, the list goes on. We were all swimming in new architectures like CNNs (not the one from Atlanta), RNNs, and LSTMs, so many variants. It's the same old story if you've read the books: very old technology, back from the dead once again, made viable by the availability of cheap compute, and suddenly very popular.

Of course, to feed those models, we needed tons of data, and that wasn't so sexy and fun. Labeling, cleaning, and preparing very large datasets was, and still is, a massive effort. GPUs made it possible to apply those neural networks to those huge datasets in a reasonably efficient manner, using what I would call expert tools: TensorFlow and PyTorch. As amazing as they were, no one would really call them user-friendly, and it was hard to get to a successful project. We want everybody to join the party, not just the top ten machine learning companies in the world.

A lot of projects still fail. You get to maybe POC stage, and that's fine, but very few projects actually end up at scale in production. A lot of people are trying, but how many companies are very successful and see very strong impact with deep learning? The gap is still there, and there are tons of reasons for it. My favorite is unclear goals: let's just see what the technology can do. Maybe your boss is in the room, so I get it, job security matters. But seriously, so many projects go on without a clear business objective. At the end of the day, that's what you want: business impact. It either falls into the "we want to make money" category or the "we want to save money" category. If you don't have that, I don't think you have a project.

Tools, over time, have become more and more complex, and you do need quite a collection of skills and tools to get the job done. Things could be a little bit different, but few people question that. Very few companies have mastered the machine learning workflow: six months labeling data and then six months trying to deploy. By the time you're done, you've forgotten what problem you were originally trying to solve. And you won't know anything about your model until you've thrown real, messy, buggy data at it.

We're now in machine learning 2.0, and we're trying to build it, push it, and grow it with the community. Obviously, transformers are becoming the new de facto solution for deep learning. Transfer learning is a key advance in the sense that we don't really need to build, clean, and manage those huge datasets anymore. We can use off-the-shelf models that have been pre-trained by their authors on huge quantities of data. Models can be tested out of the box, and often they're good enough that we can use them as is, saving time and money. Here's what that looks like in practice.
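To make the "out of the box" point concrete, here is a minimal sketch using the transformers pipeline API. The task and the sample text are not from the talk; note that when you don't pass a model name, the pipeline picks a default checkpoint for the task, so pin one explicitly in real projects.

```python
# A minimal sketch: using a pre-trained model as is, no training required.
# Requires `pip install transformers`; the default checkpoint for the task
# is downloaded from the Hugging Face hub on first use.
from transformers import pipeline

# Build a ready-to-use inference pipeline for sentiment analysis.
classifier = pipeline("sentiment-analysis")

# Run a prediction on raw text, out of the box.
print(classifier("This conference talk was well worth getting up early for."))
# e.g. [{'label': 'POSITIVE', 'score': 0.99...}]
```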
For business-specific problems, though, like chemical engineering or life sciences or genomics, your documents will have very complex vocabulary that is only marginally present in open datasets like Wikipedia. To extract every bit of accuracy, you want to train a little longer, and the more you can use new chips to do this, the better. GPUs are still very interesting, but a number of companies are building new chips that tend to deliver a better cost-performance ratio than GPUs. We're working hard to build the developer tools for all of that, and those tools are extremely important.

NLP has expanded into computer vision and other domains. Developers report that they're using transformers more and more; there's really a shift in that respect, and you can see detailed numbers in the report. What I see out there is that, pretty quickly, transformers are eating deep learning. A whole new generation of models, from BERT to GPT-2 to GPT-3 to our very own BLOOM model from the BigScience project, is increasingly efficient for natural language processing, and the same is happening in computer vision, audio, and speech.

These models are in production today; they've broken out of the lab. Our good friend Elon is removing legacy code and replacing it with deep learning for autonomous driving. Pretty much all the voice assistants out there, like Echo devices, are using these models. The big hyperscale players, web companies like Pinterest for recommendation, and JPMorgan and other financial services companies are also using them. It's real.

When it comes to Hugging Face, we're an open-source company. We also have some commercial products, but the flagship project at Hugging Face is the open-source Transformers library. It's a Python library that makes it super easy to work with all those new models: you can use them with one line of code, and it's super developer-friendly. The adoption of the Transformers project on GitHub, looking at GitHub stars, is amazing. We're growing faster than PyTorch or Keras, which are amazing projects. Even crazier, we're growing faster than Kubernetes, despite not having the same marketing budget as Google. The adoption by the open-source community is super strong. We have over 100,000 users on the hub every day, where we host models and datasets; feel free to sign up. And we see over one million model downloads every day.

If you're trying to solve a problem, start with the task: are you working with text? Are you working with speech? What exactly are you trying to do? You can go to the Hugging Face hub and browse all the models we have there. Within minutes, you can identify a shortlist of models that have been pre-trained on a similar problem, and you can start experimenting even if you have no data; you don't need to build a dataset, and whatever test data you have, you can start using. Translation models, for example, typically work pretty nicely as is.

Sometimes you do need to train a little more, so you prepare a dataset, a much smaller one than if you were training from scratch. You're not going to spend a ton of time on that fine-tuning job, but you can achieve good accuracy on your domain-specific data with very little code. Here's roughly what such a job looks like.
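The sketch below shows a small fine-tuning job with the Trainer API; the checkpoint, dataset, and hyperparameters are illustrative assumptions, not anything named in the talk.

```python
# A minimal fine-tuning sketch with the Trainer API.
# Requires `pip install transformers datasets`. The checkpoint and dataset
# are illustrative choices; swap in whatever fits your domain.
from datasets import load_dataset
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)

checkpoint = "distilbert-base-uncased"
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForSequenceClassification.from_pretrained(checkpoint, num_labels=2)

# A small labeled dataset is enough: transfer learning does the heavy lifting.
dataset = load_dataset("imdb")
tokenized = dataset.map(
    lambda batch: tokenizer(batch["text"], truncation=True), batched=True
)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="out", num_train_epochs=1),
    # Subsample to keep the fine-tuning job short.
    train_dataset=tokenized["train"].shuffle(seed=42).select(range(2000)),
    eval_dataset=tokenized["test"].shuffle(seed=42).select(range(500)),
    tokenizer=tokenizer,  # enables dynamic padding of batches
)
trainer.train()
```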
We're also working on using transformers for tabular data. I wouldn't recommend them over XGBoost or CatBoost today, but they're worth keeping an eye on, particularly since they show promise: you could do initial training on unlabeled tabular data and then fine-tune on just a little bit of labeled data. For the sake of time, I'll leave you to check out the paper.

Here's the Hugging Face hub that I mentioned; this slide is never up to date, because the number of models changes every minute. We have a lot of different models for natural language processing, audio, and multimodal tasks, a few tabular models, and some reinforcement learning models. If you're looking for a particular type of model, you can explore and quickly find five to ten models that might work for you.

Here's an example of using a Hugging Face model: zero-shot classification on a bit of text. With the transformers library and its pipeline object, you create a pipeline for zero-shot classification using a Facebook model, and that's it (see the first sketch at the end of this section). It's an interesting technique. You can go and train, but only if you really need to. The more models you can train within a business day, the faster you can iterate and get to the right accuracy. To speed up training, we're working with hardware companies like Graphcore and Habana, which is part of Intel, and we have a dedicated library for that called Optimum.

Finally, putting everything together, this is the family picture. We have datasets you can start from, already in the right format, with no messing around: you just load them and predict or train. You can feed that data into one of our training options. We have a library called Accelerate that takes you down one level if you want full control of the training loop, and we have a no-code training option called AutoTrain, which started with NLP and now also covers non-transformer models. I invite you to try it.

Then there's showing your work to business stakeholders. It's not very sexy, but it helps them understand what you're doing, and emailing them a notebook? No one wants to do that. That's why we built Spaces, a very simple way to demo and showcase your models using either the Gradio framework, which is now part of Hugging Face, or the Streamlit framework (there's a small Gradio sketch below). If you've been playing with text-to-image models like DALL-E Mini and Stable Diffusion recently, you've likely done that on Hugging Face Spaces.

When it comes to deploying models, you can use Optimum to accelerate inference and deploy anywhere you like: on your own machine, in a container, and so on. Or you can use the Inference API, our managed solution where you just load models (a sample call is sketched below). We have a partnership with AWS on SageMaker, where Hugging Face is a built-in framework with dedicated containers, so you just bring your code. You can also use Hugging Face Endpoints, available on the Azure Marketplace.

The key message is that we're trying to make machine learning simpler, boring even, so that everyone can use it with no drama and no fuss, get the job done, and move on to the next project. We need to avoid making ML complicated just for the sake of complicating it. Always keep an eye on the business goals and KPIs; if we don't have that, we don't have a project. Use transfer learning; don't go and build your PyTorch model from scratch. Most companies start from existing architectures and fine-tune them on their own data. Please avoid reinventing the wheel, for your own sake, and avoid building your own machine learning platform. Some companies may need to do that, but for everyone else, everything you need is probably out there. Just pick the tools you like, the ones that work best for you, and use them. Focus on the business problem, not the plumbing. And if you're in a larger team, work hard on collaboration, exchanging models, and sharing knowledge. Tools like the Hugging Face hub let you put all your models and datasets in the same place, making it easier to collaborate.
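Here is a sketch of the zero-shot classification example from the talk. The exact checkpoint isn't named, only "a Facebook model", so facebook/bart-large-mnli, a common choice for this task, is an assumption.

```python
# Zero-shot classification: classify text against labels the model was
# never explicitly trained on. The checkpoint is an assumption; the talk
# only says "a Facebook model".
from transformers import pipeline

classifier = pipeline("zero-shot-classification",
                      model="facebook/bart-large-mnli")

result = classifier(
    "The quarterly earnings beat analyst expectations.",
    candidate_labels=["finance", "sports", "politics"],
)
print(result["labels"][0], result["scores"][0])  # most likely label and score
```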
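Next, a rough sketch of a Spaces-style demo built with Gradio. The app itself, a sentiment classifier, is an assumption chosen to show the shape of the code; a Space is essentially this script plus a requirements file, so the same code also runs locally.

```python
# A minimal Gradio demo, sketching what a Hugging Face Space app looks like.
# Requires `pip install gradio transformers`.
import gradio as gr
from transformers import pipeline

classifier = pipeline("sentiment-analysis")

def predict(text: str) -> dict:
    # Return a {label: score} dict, which the Label output renders nicely.
    result = classifier(text)[0]
    return {result["label"]: result["score"]}

demo = gr.Interface(fn=predict, inputs="text", outputs="label",
                    title="Sentiment demo")
demo.launch()  # on Spaces, this serves the app automatically
```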
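And finally, a sketch of calling the hosted Inference API over plain HTTP. The model name is an arbitrary example, and HF_TOKEN is assumed to hold a valid Hugging Face API token.

```python
# Calling the hosted Inference API: no deployment, just an HTTP request.
# The model is an example; set HF_TOKEN to a valid Hugging Face API token.
import os
import requests

API_URL = ("https://api-inference.huggingface.co/models/"
           "distilbert-base-uncased-finetuned-sst-2-english")
headers = {"Authorization": f"Bearer {os.environ['HF_TOKEN']}"}

response = requests.post(API_URL, headers=headers,
                         json={"inputs": "I love this talk!"})
print(response.json())  # e.g. [[{'label': 'POSITIVE', 'score': ...}, ...]]
```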
If we have a few minutes left, let's do some questions. I'm also doing a workshop tomorrow where we'll be running code. Not sure if we have seats left, but if you're interested, go and check it out. Thank you very much.

Tags

Deep Learning, Transformers, Machine Learning Workflow, Transfer Learning, Hugging Face