ML Hyperproductivity with Hugging Face, Part 1: AutoML with AutoNLP
October 11, 2021
In this video, I start from a movie review dataset and build sentiment analysis models with AutoNLP, an AutoML product designed by Hugging Face. I then predict with the best model on the Hugging Face website and in a Google Colab notebook.
In part 2 (https://youtu.be/1JVm8yIbEqU), you'll see how to quickly build a small web app to test and demo your model, and how to host it on Hugging Face.
⭐️⭐️⭐️ Don't forget to subscribe to be notified of future videos ⭐️⭐️⭐️
https://huggingface.co/autonlp
https://www.kaggle.com/lakshmi25npathi/imdb-dataset-of-50k-movie-reviews
New to Transformers? Check out the Hugging Face course at https://huggingface.co/course
Transcript
Hi everybody, this is Julien from Arcee. In this first Arcee video, I would like to show you how you can be hyperproductive building, deploying, and testing NLP models using Arcee tools. In fact, we're going to be writing no code at all for the training part and maybe only 10 lines of code to deploy a web app. This is really an easy way to go from your dataset to a model that you can showcase in a web app.
So, what are we going to use for this? We're going to use two Arcee tools. The first is AutoNLP, which, as you can guess, is an AutoML tool for transformer models. We just bring our data, and AutoNLP automatically fine-tunes the best models for that dataset and the task type we select, without us writing a line of code or managing any infrastructure. So, this is a really cool one. The other tool is called Spaces. Spaces lets us write just a few lines of code and deploy a test app to showcase the model for quick demos, POCs, and more. You can use either Streamlit or Gradio to do this. I promise it's very little code, and as everybody knows, I'm an awful UI developer, so if I can do it, trust me, anyone can.
Okay, let's get to work and first talk about the dataset and the problem we want to solve. In this example, I'm going to use the IMDB dataset, which contains positive and negative movie reviews. We're going to use this dataset to train a sentiment analysis model, a really popular use case. Lots of companies want to build text classification models, and you could be using your tweets, emails, or any text you'd like to train the model.
I downloaded this dataset from Kaggle. Let me zoom in a bit. It's not huge, and it's already split into test, train, and validation sets. It's a CSV dataset, so it's really what you would expect. Two columns: the actual movie review, which is text, and the label. Zero means a negative movie review, and one means a positive movie review.
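Before uploading, it's worth a quick sanity check of the files. Here's a minimal sketch with pandas; the file name and the review/sentiment column names are assumptions based on the Kaggle CSV, so adjust them to match your copy.

```python
import pandas as pd

# File and column names are assumptions based on the Kaggle IMDB CSV;
# adjust them to match your local copy.
train_df = pd.read_csv("train.csv")

print(train_df.shape)                        # number of rows and columns
print(train_df.columns.tolist())             # e.g. ['review', 'sentiment']
print(train_df.head(3))                      # peek at a few reviews
print(train_df["sentiment"].value_counts())  # class balance: 0 = negative, 1 = positive
```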
Now that we have a dataset, let's see how we can build models. First, you should log in to the Arcee website, go to huggingface.co/autonlp, join the AutoNLP beta, and then click on "Connect to your dashboard". Let's create a new project. We'll call this one IMDB MoHF. Task types include binary text classification, multi-class text classification, token classification, Q&A, summarization, and text scoring. Sentiment analysis with positive and negative sentiment is a binary text classification task. We could try to pick our own models from the Hugging Face hub, but let's trust AutoNLP to do the right thing and let it pick our models automatically.
The language is, of course, the language of the dataset. These reviews are in English, but you could use quite a few different languages, and AutoNLP would pick models that have been pre-trained on those languages. Finally, how many models do we want to train? We can go from 5 to 100. I'm pretty sure Ludicrous is a Tesla reference, so let's stick with 15 models. That's all the information we need to provide, really. We can create the project.
The only missing bit is the data. Let's select the train and validation files. The train CSV is the training set, and we map its columns to the features the models expect: the text column holds the reviews and the target column holds the labels. This dataset happens to have the right column names already, but they could just as well be called review and sentiment; the mapping step lets us connect the dataset's column names to the feature names used for training. Let's add this to AutoNLP and do the same for the validation file. If we had a single file, we could give that one big file to AutoNLP and it would automatically split it into training and validation sets, but here we can pass the two chunks directly. It's all good, so we can go to trainings.
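If you do start from a single file and want to control the split yourself rather than letting AutoNLP do it, a stratified split keeps the label balance identical in both chunks. A minimal sketch with scikit-learn; the file and column names are assumptions.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# File and column names are assumptions; adjust to your dataset.
df = pd.read_csv("reviews.csv")

# Hold out 10% for validation, preserving the 0/1 label ratio.
train_df, valid_df = train_test_split(
    df, test_size=0.1, stratify=df["sentiment"], random_state=42
)

train_df.to_csv("train.csv", index=False)
valid_df.to_csv("valid.csv", index=False)
```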
We get an estimate of cost, and as you can see, the ballpark estimate is about $10 per model. You can just launch training, and if you need to, you can go back and change the number of jobs. So, let's start training. It's going to process the data, tokenize, etc. We see our 15 jobs being queued and starting to train. They have funny auto-generated names, and once they start running and publishing metrics, we'll see those metrics here.
Let's pause the video, wait for those jobs to start doing something useful, and we'll be right back. After a few minutes, some jobs have been stopped because they're not promising, and you're not going to get billed for those. We'll keep focusing on the other ones. Let's wait a few more minutes, and we should see metrics. Now models are starting to report metrics, and we can see the top model so far, which has a little gold star. If we go to metrics, we'll see the leaderboard with loss, accuracy, precision, recall, AUC, and F1. You can sort them depending on which metric is the most important to you.
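These leaderboard metrics are the standard binary-classification ones, so you can reproduce them locally on your own test set. A toy sketch with scikit-learn, using made-up labels and scores just to show what each number measures:

```python
from sklearn.metrics import (accuracy_score, precision_score, recall_score,
                             f1_score, roc_auc_score)

# Made-up ground truth and predictions, purely illustrative.
y_true  = [0, 0, 1, 1, 1]
y_pred  = [0, 1, 1, 1, 0]
y_score = [0.2, 0.6, 0.9, 0.8, 0.4]  # predicted probability of class 1

print("accuracy :", accuracy_score(y_true, y_pred))
print("precision:", precision_score(y_true, y_pred))
print("recall   :", recall_score(y_true, y_pred))
print("f1       :", f1_score(y_true, y_pred))
print("auc      :", roc_auc_score(y_true, y_score))
```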
Let's wait until this completes and see where we land and what the top model is. Then we'll see how we can predict with it and deploy it in a web app in a few lines of code. We're now about 42 minutes into the training job, and we can see some jobs are complete and some are still going. For now, the best job has reached 94.08% accuracy, which is pretty good. Let's see if the other ones can catch up. The sardine is leading, but the rhino could be catching up. Let's wait a little longer until all those jobs complete, and then we'll call the winner.
After about 2 hours and 16 minutes, training is complete. We can see the leaderboard for our models, and the sardine is the winner; the rhino didn't quite catch up. Once again, we can see the metrics here. We want to know more about this top model, so you can just click on it and view it on the Hugging Face hub, because all models are automatically pushed there, and you can go and inspect them right there. The top one is this one, and we have tags saying it's a RoBERTa model fine-tuned on English. We can quickly test it right there, just like any other model. Let's go and try that. I'll use my favorite worst movie of all time to test it. The model is loaded, and we'll see a prediction for this. I think this is a negative review, so it should have a very high probability for zero. Let's see how we do here. Yes, very negative review, no surprise.
We can quickly test it like this. If we want to deploy it somewhere else, we could use inference on the Hugging Face platform or deploy it on SageMaker. We'll do this in another video. What else can we do? We can see the files, so we see the model, the tokenizer, the model config, and exactly which architecture was used. We can see hyperparameters, and we could start from there manually and try to squeeze a little more accuracy.
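Since the model lives on the Hugging Face hub, one way to call it without deploying anything yourself is the hosted Inference API over plain HTTP. A sketch; the model ID and token are placeholders (AutoNLP repo names look something like username/autonlp-<project>-<id>), not the actual model from the video.

```python
import requests

# Placeholders: substitute your own model repo and API token.
API_URL = "https://api-inference.huggingface.co/models/username/autonlp-imdb-mohf-123456"
headers = {"Authorization": "Bearer hf_your_token_here"}

response = requests.post(API_URL, headers=headers,
                         json={"inputs": "Worst movie I have ever seen."})
print(response.json())  # list of label/score pairs for the review
```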
Let's try and use the model. By default, all these models are private, so I'm going to make this one public and switch to a notebook here on Colab. I just installed transformers, and now I can load this model in a Hugging Face pipeline with one line of code and predict with it. If I want to work with the files directly, I can also grab the model name and clone the repo. Plenty of ways to use the model.
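Here's what that one-liner looks like in the notebook. A minimal sketch; the model ID is hypothetical, so replace it with the name of your winning AutoNLP model (the repo is private by default, so either make it public as above or pass an auth token).

```python
from transformers import pipeline

# Hypothetical model ID; replace with your own AutoNLP model repo.
classifier = pipeline("text-classification",
                      model="username/autonlp-imdb-mohf-123456")

print(classifier("I sat through the whole thing and regretted every minute."))
# e.g. [{'label': '0', 'score': 0.99}]  where 0 = negative, 1 = positive
```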
I'm going to pause the video here. In the next video, I'm going to show you how we can very easily deploy this using Spaces, building a small web app to test the model. Stay tuned and go and check out part two for ML hyperproductivity with Arcee. See you soon.
Julien Simon is the Chief Evangelist at Arcee AI, specializing in Small Language Models and enterprise AI solutions. Recognized as the #1 AI Evangelist globally by AI Magazine in 2021, he brings over 30 years of technology leadership experience to his role.
With 650+ speaking engagements worldwide and 350+ technical blog posts, Julien is a leading voice in practical AI implementation, cost-effective AI solutions, and the democratization of artificial intelligence. His expertise spans open-source AI, Small Language Models, enterprise AI strategy, and edge computing optimization.
Previously serving as Principal Evangelist at Amazon Web Services and Chief Evangelist at Hugging Face, Julien has helped thousands of organizations implement AI solutions that deliver real business value. He is the author of "Learn Amazon SageMaker," the first book ever published on AWS's flagship machine learning service.
Julien's mission is to make AI accessible, understandable, and controllable for enterprises through transparent, open-weights models that organizations can deploy, customize, and trust.