Machine Learning 2.0 with Hugging Face Julien Simon Conf42 Machine Learning 2022
June 21, 2022
As amazing as state-of-the-art machine learning models are, training, optimizing, and deploying them remains a challenging endeavor that requires a significant amount of time, resources, and skills, all the more when different languages are involved. Unfortunately, this complexity prevents most organizations from using these models effectively, if at all. Instead, wouldn’t it be great if we could just start from pre-trained versions and put them to work immediately?
This is the exact challenge that Hugging Face is tackling. Founded in 2016, this startup based in New York and Paris makes it easy to add state-of-the-art Transformer models to your applications. Thanks to its popular open-source libraries (transformers, tokenizers, and datasets), developers can easily work with over 2,900 datasets and over 29,000 pre-trained models in 160+ languages. In fact, with close to 60,000 stars on GitHub and 1 million downloads per month, the transformers library has become the de facto place for developers and data scientists to find state-of-the-art models for natural language processing, computer vision, and audio.
In this session, we'll introduce you to Transformer models and what business problems you can solve with them. Then, we'll show you how you can simplify and accelerate your machine learning projects end-to-end: experimenting, training, optimizing, and deploying. Along the way, we'll run some demos to keep things concrete and exciting!
Other talks at this conference 🚀🪐 https://www.conf42.com/ml2022
—
0:00 Intro
0:22 Talk
Transcript
Hi everybody, my name is Julien and I'm here to introduce you to building natural language processing applications with transformers. A few years ago, deep learning exploded onto the scene, driven by a confluence of factors. The first was the resurgence of neural networks, an old technology brought back and applied to computer vision, natural language processing, and unstructured data in general, where it proved very effective.
What made it possible for companies to use deep learning was the availability of open datasets. Deep learning is very data-hungry, and having freely available datasets like ImageNet for computer vision was a significant boost. Compute is critical in deep learning, and GPUs became available and applicable for tasks beyond 3D gaming. They also became accessible on the cloud, making it easier to obtain the necessary compute power for deep learning problems.
A collection of open-source tools also became available, such as libraries like Theano, Torch, and later TensorFlow. While these tools were initially geared towards experts, they represented a significant step forward. However, building and training models was still complex, and developers without a machine learning background found it challenging.
A few years later, a typical machine learning and deep learning project looks like a waterfall process, with a significant portion of time spent on data preparation and cleaning, often 50% to 80% of the project. Training and evaluating models, managing infrastructure, and deploying models in production, which is the hardest part, are also time-consuming. Unfortunately, over 80% of data science projects do not make it into production, which is a significant issue. A proof of concept (POC) is nice, but business value comes from deploying in production, and very few companies manage to achieve this.
Something different is needed, and this is what I call Deep Learning 2.0. The technology has evolved, with neural network architectures like CNNs and LSTMs being replaced by transformers. The Transformer architecture, introduced by Google in 2017 and popularized by models like BERT in 2018, marked the birth of this new family of models. Transformers are evolving, and practitioners now rely more on transfer learning, which involves starting from pre-trained models and applying the knowledge they have learned to specific business problems, often with minimal retraining.
GPUs are still around, but specialized machine learning hardware is emerging, offering significant benefits. Tools have become more user-friendly, making it easier for developers to train and deploy models without deep machine learning expertise. The learning curve is much flatter now.
Let's look at these four key developments. Transformers are a new model architecture and an open-source library stewarded by Hugging Face, with the help of the community. The Hugging Face Transformers library is one of the fastest-growing open-source projects in history, as evidenced by the GitHub stars. It's growing faster than popular projects like Kubernetes, Node, or PyTorch. We see significant adoption from the community and recognition from analysts and the IT community.
Transformers are not just for NLP; they are expanding into computer vision, speech, audio, and reinforcement learning. The Kaggle report shows that traditional deep learning architectures like RNNs and CNNs are becoming less popular, while transformers are gaining traction. This indicates that transformers are becoming the standard for many machine learning problems.
On the Hugging Face Hub, we see about 1 million model downloads every day, a number that is rising. Transfer learning means starting from a pre-trained model that matches your business problem, rather than building a large dataset from scratch. The Hugging Face Hub offers a variety of task types, including NLP, computer vision, and audio. If you find a model that matches your business problem, you can test it quickly and often it will work out of the box. For tasks like translation or sentiment analysis, this can be sufficient.
Sometimes, you will need to fine-tune the model on your data, which is simpler and faster than training from scratch. Fine-tuning requires less data, is faster to build and train, and is less expensive. With the Transformers library, you can fine-tune models with just a few lines of code.
Here's an example using the Hugging Face library. In one line of code, I can build a translation pipeline, for instance from English to Hungarian. I can then build a second pipeline to classify tokens in the translated text, using an off-the-shelf model. This shows the depth of models available on the Hugging Face Hub. In five lines of code, I can perform entity extraction combined with translation from English to Hungarian.
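The translate-then-tag demo described above can be sketched as follows. The checkpoint names and the helper functions are illustrative assumptions, not the exact code from the talk; the Hugging Face import is deferred inside the builder so the composition logic can be read without downloading any model.

```python
def build_demo():
    """Build the two pipelines. Requires `pip install transformers` and
    downloads the models on first use (checkpoint names are assumptions)."""
    from transformers import pipeline  # third-party, imported lazily
    translator = pipeline("translation", model="Helsinki-NLP/opus-mt-en-hu")
    tagger = pipeline("token-classification", aggregation_strategy="simple")
    return translator, tagger


def translate_and_tag(translator, tagger, text):
    """Compose the pipelines: translate English text to Hungarian,
    then run entity extraction on the translated sentence."""
    translated = translator(text)[0]["translation_text"]
    return translated, tagger(translated)
```

With real pipelines, `translate_and_tag(*build_demo(), "My name is Julien and I live in Paris")` returns the Hungarian sentence plus the entities found in it.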
Let me show you a more complex demo. Using off-the-shelf models, I'll perform voice queries on financial documents. The first model is a speech-to-text model from Facebook with built-in translation. I'll record a sentence in French, which will be translated to English and used to run a semantic search on a document corpus of SEC filings and annual reports from large American companies.
Here's my app. I'll record a sentence in French, and the app will translate it to English and run the query. The speech will be turned into text, translated, and used to find the closest sentences in the document corpus. For example, if I ask, "Who's the CFO at Gap?" the app will return the top matching sentences from Gap's annual report.
This app is built with about 100 lines of code, with half dedicated to the user interface. I load the models, process the speech, and run the semantic search. The entire process is straightforward and requires no training.
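The app described above chains three off-the-shelf pieces: speech-to-text translation, sentence embeddings, and nearest-neighbor search over the corpus. Here is a sketch under stated assumptions: the two checkpoint names are guesses at plausible Hub models, while the ranking step (cosine similarity over pre-computed embeddings) is shown in full with plain Python.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def top_k(query_vec, doc_vecs, k=3):
    """Indices of the k corpus sentences closest to the query embedding."""
    ranked = sorted(range(len(doc_vecs)),
                    key=lambda i: cosine(query_vec, doc_vecs[i]),
                    reverse=True)
    return ranked[:k]


def build_models():
    """Load the heavy parts (assumed checkpoints; downloads on first use).
    Requires `pip install transformers sentence-transformers`."""
    from transformers import pipeline
    from sentence_transformers import SentenceTransformer
    speech = pipeline("automatic-speech-recognition",
                      model="facebook/s2t-medium-mustc-multilingual-st")
    embedder = SentenceTransformer("all-MiniLM-L6-v2")
    return speech, embedder
```

At query time, the recorded audio goes through `speech`, the translated text is embedded with `embedder`, and `top_k` picks the closest sentences from the pre-embedded corpus.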
The next development is machine learning hardware. While GPUs are still useful, specialized chips for training and inference are emerging from companies like Habana, Graphcore, Intel, Qualcomm, AWS, and others. Accelerating both training and inference is crucial. Faster training allows for quicker iteration and better model convergence, while faster inference is essential for low-latency applications. Hugging Face partners with these companies and provides a dedicated library called Optimum to make it easy to work with these chips.
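Whatever hardware you target, the starting point is a latency baseline. Below is a small stdlib timing helper, plus a hedged sketch of exporting a Hub checkpoint through Optimum's ONNX Runtime backend; the `export=True` argument reflects recent versions of the optimum library and is an assumption here.

```python
import time


def mean_latency_ms(predict, payload, runs=20, warmup=3):
    """Average wall-clock latency of predict(payload), in milliseconds."""
    for _ in range(warmup):           # warm caches before timing
        predict(payload)
    start = time.perf_counter()
    for _ in range(runs):
        predict(payload)
    return (time.perf_counter() - start) * 1000.0 / runs


def to_onnx(model_id):
    """Export a Hub checkpoint to ONNX Runtime through Optimum.
    Requires `pip install optimum[onnxruntime]` (API is an assumption)."""
    from optimum.onnxruntime import ORTModelForSequenceClassification
    return ORTModelForSequenceClassification.from_pretrained(model_id,
                                                             export=True)
```

Comparing `mean_latency_ms` before and after the export gives a quick read on whether the accelerated backend pays off for your model and batch size.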
The last development is developer-friendly tools. While experts are still needed for complex problems, developers can build many projects independently. The Hugging Face Hub offers over 4,000 datasets and over 40,000 models. You can test and fine-tune models using various methods, including Auto-Train, the Transformers library, and Optimum. Once you have a model, you can showcase it in Spaces and deploy it using the Inference API or on your infrastructure.
Hugging Face has a deep engineering partnership with AWS, making it easy to run and deploy Hugging Face code on Amazon SageMaker. I'll show you a quick example of fine-tuning a DistilBERT model on a product review dataset from the Hugging Face Hub. After installing dependencies and downloading the dataset, I simplify the problem by mapping sentiment to positive or negative reviews. I tokenize the text, upload the datasets to S3, and run the training script using SageMaker's script mode.
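The two SageMaker-side steps above can be sketched like this. The star-to-sentiment mapping is one plausible reading of "mapping sentiment to positive or negative reviews" (dropping neutral 3-star reviews is an assumption), and the estimator's version pins, instance type, and hyperparameters are illustrative, not the talk's exact values.

```python
def to_binary_label(stars):
    """Collapse a 1-5 star review rating to a binary sentiment label:
    1-2 stars -> 0 (negative), 4-5 stars -> 1 (positive), 3 -> None."""
    if stars <= 2:
        return 0
    if stars >= 4:
        return 1
    return None  # neutral reviews are discarded (an assumption)


def launch_training(role, train_s3, test_s3):
    """Launch the training script in SageMaker script mode on a GPU
    instance. Requires `pip install sagemaker`; versions are assumptions."""
    from sagemaker.huggingface import HuggingFace
    estimator = HuggingFace(
        entry_point="train.py",           # vanilla Transformers script
        instance_type="ml.p3.2xlarge",
        instance_count=1,
        role=role,
        transformers_version="4.17",
        pytorch_version="1.10",
        py_version="py38",
        hyperparameters={"model_name": "distilbert-base-uncased",
                         "epochs": 1},
    )
    estimator.fit({"train": train_s3, "test": test_s3})
    return estimator
```

SageMaker copies the S3 channels passed to `fit` onto the training instance and runs `train.py` against them.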
The training script is vanilla Transformers code, which can be run locally and then moved to SageMaker. I load the datasets from S3, set up the training arguments, and use the Trainer API to train the model. After training, I evaluate the model, save it, and deploy it using the SageMaker SDK. The entire process is straightforward and can be completed in a matter of weeks.
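A minimal sketch of that vanilla-Transformers `train.py` follows. The checkpoint name and training arguments are assumptions; the accuracy helper is pure Python and shown in full, while the heavy imports stay inside the function so the sketch reads without the libraries installed.

```python
def accuracy(logits, labels):
    """Fraction of rows where the arg-max logit matches the label."""
    preds = [row.index(max(row)) for row in logits]
    correct = sum(p == y for p, y in zip(preds, labels))
    return correct / len(labels)


def train(train_dir, test_dir, output_dir):
    """Training script run by SageMaker script mode; SageMaker mounts the
    S3 channels at train_dir/test_dir. Requires `pip install transformers
    datasets torch` (checkpoint and arguments are assumptions)."""
    from datasets import load_from_disk
    from transformers import (AutoModelForSequenceClassification,
                              Trainer, TrainingArguments)
    train_ds = load_from_disk(train_dir)
    test_ds = load_from_disk(test_dir)
    model = AutoModelForSequenceClassification.from_pretrained(
        "distilbert-base-uncased", num_labels=2)
    args = TrainingArguments(output_dir=output_dir,
                             num_train_epochs=1,
                             per_device_train_batch_size=32)
    trainer = Trainer(model=model, args=args,
                      train_dataset=train_ds, eval_dataset=test_ds)
    trainer.train()
    trainer.evaluate()
    trainer.save_model(output_dir)  # SageMaker uploads this to S3
```

Because the script is plain Transformers code, it runs unchanged on a laptop before being handed to SageMaker.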
In summary, machine learning can be simplified by focusing on the right things. Find a pre-trained model that fits your task, identify a business KPI, and measure the model on real-life data. If fine-tuning is needed, it's not complicated. Pay attention to prediction latency and use available tools and infrastructure. You can be in production in a matter of weeks.
If you're new to Transformers, join our community at huggingface.co. Sign up for free, follow the Hugging Face course, and ask questions in the forums. For companies with strong business use cases, consider our expert acceleration program for end-to-end consulting. For those with privacy and security concerns, we can deploy the Hugging Face Hub privately on your infrastructure.
Thank you very much. If you have questions or need help with projects, contact me at this email address. You can find more content on Twitter, Medium, YouTube, and more. Hope this was useful, and thanks for listening. Have a great day!
Julien Simon is the Chief Evangelist at Arcee AI, specializing in Small Language Models and enterprise AI solutions. Recognized as the #1 AI Evangelist globally by AI Magazine in 2021, he brings over 30 years of technology leadership experience to his role.
With 650+ speaking engagements worldwide and 350+ technical blog posts, Julien is a leading voice in practical AI implementation, cost-effective AI solutions, and the democratization of artificial intelligence. His expertise spans open-source AI, Small Language Models, enterprise AI strategy, and edge computing optimization.
Previously serving as Principal Evangelist at Amazon Web Services and Chief Evangelist at Hugging Face, Julien has helped thousands of organizations implement AI solutions that deliver real business value. He is the author of "Learn Amazon SageMaker," the first book ever published on AWS's flagship machine learning service.
Julien's mission is to make AI accessible, understandable, and controllable for enterprises through transparent, open-weights models that organizations can deploy, customize, and trust.