Vision Transformer on SageMaker, Part 2: Training with Hugging Face

November 25, 2021
This video is the second in a series of three, where I focus on training a Vision Transformer model with Amazon SageMaker and the Hugging Face Deep Learning Container. In this video, I start from the image classification dataset that I prepared in the first video (https://youtu.be/jalopOoBL5M). Then, I download a pre-trained Vision Transformer from the Hugging Face hub and fine-tune it on my dataset, using a training script based on the Trainer API in the Transformers library.

⭐️⭐️⭐️ Don't forget to subscribe to be notified of future videos ⭐️⭐️⭐️

Code: https://github.com/juliensimon/huggingface-demos/tree/main/vision-transformer
Original training code by Philipp Schmid: https://github.com/huggingface/notebooks/blob/master/sagemaker/09_image_classification_vision_transformer/scripts/train.py
More Hugging Face on SageMaker notebooks: https://github.com/huggingface/notebooks/tree/master/sagemaker
New to Transformers? Check out the Hugging Face course at https://huggingface.co/course

Transcript

Hi everybody, this is Julien from Hugging Face. This is the second in a series of three videos where I focus on training a Vision Transformer model on Amazon SageMaker. In the first video, I showed you how to build a dataset from images stored in S3, using a SageMaker processing job that fetches the images, pre-processes them, extracts the features, and saves everything to S3 as Hugging Face datasets. Picking up from there, we're going to train using SageMaker and a training script written with the Transformers library and the Trainer API, and we're going to run that script inside the Hugging Face container on SageMaker.

Okay, so let's get started. First, of course, I installed the SageMaker SDK. Make sure you have the latest version so that you can use the latest containers as well. The first step is to define the location of the datasets that we processed with SageMaker Processing. We have a training set, a validation set, and a test set; these are the three inputs to our training job. I have some hyperparameters in my training script, and there are more; we'll take a look in a minute. Here, we're going to train for three epochs, and we want to train with this model. We could use a different Vision Transformer model if we wanted. This is my training script, and again, we'll take a look at it in a minute.

The rest is really SageMaker as usual. We import the HuggingFace estimator and pass the location of the script, the hyperparameters, which version of the Transformers library we want to use, which PyTorch version, and which Python version; I believe these were the most recent versions at the time of recording. We also specify the infrastructure we want to use for training. Here, I'm going to use a cost-efficient g4dn.2xlarge instance. It has a single GPU, but as we're fine-tuning, it's more than enough. Then I just call fit, passing the three datasets that we processed in the previous video.
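The estimator setup described above can be sketched as follows. This is an illustrative outline, not the exact notebook from the video: the S3 paths and hyperparameter names are placeholders, and the container versions shown are my best guess at what was current around the time of recording.

```python
import sagemaker
from sagemaker.huggingface import HuggingFace

role = sagemaker.get_execution_role()  # assumes a SageMaker notebook environment

# Locations of the datasets produced by the processing job in the first video
# (placeholder paths, adapt to your own bucket)
inputs = {
    "training": "s3://my-bucket/vit-demo/train",
    "validation": "s3://my-bucket/vit-demo/validation",
    "test": "s3://my-bucket/vit-demo/test",
}

# Hyperparameters passed to the training script through script mode
hyperparameters = {
    "epochs": 3,
    "model_name": "google/vit-base-patch16-224-in21k",
}

huggingface_estimator = HuggingFace(
    entry_point="train.py",
    source_dir="./scripts",
    instance_type="ml.g4dn.2xlarge",  # single-GPU, cost-efficient
    instance_count=1,
    role=role,
    transformers_version="4.12",      # versions roughly current at recording time
    pytorch_version="1.9",
    py_version="py38",
    hyperparameters=hyperparameters,
)

# Launch the training job; each key in `inputs` becomes a data channel
# mounted under /opt/ml/input/data/<name> inside the container.
huggingface_estimator.fit(inputs)
```

This is cloud-side configuration, so it only runs inside an AWS environment with a valid execution role.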
The training code I'm using here is actually a ready-made example available in one of the Hugging Face repos; Philipp Schmid wrote it, so thanks, Philipp. I made very minor changes, basically renaming hyperparameters so that they would be exactly the same as in my PyTorch Lightning example, which is the third video. I also added support for the test dataset. But generally, there are almost no changes here.

If you're not familiar with the example, here's what we do. This script runs in the Hugging Face container on SageMaker. As usual, we use script mode to pass hyperparameters and the location of the different datasets that we are going to load. We parse the arguments and load the datasets; these are Hugging Face datasets, so we can use `load_from_disk`. We set up our metric, download the model itself, and change the labels so that they match the labels in the dataset, in this case dog and cat. We set up the training arguments in the Trainer API, including batch sizes and so on. We create a Trainer object, passing the model, the training arguments, the metrics, the training set, and the validation set, and we start training. Once training is complete, we run evaluation using the test dataset. Finally, we write the results to a .txt file that will be part of the model artifact, and we save the model as well.

I made very few changes to this, and it saved me a lot of time. That's really why I also wanted to build my own dataset as a Hugging Face dataset: any script that uses dataset APIs like `load_from_disk` is just going to work. This is a really good way to standardize your training jobs using those datasets.

Let's take a look at the training log. There's a lot of stuff in there, but that's the SageMaker stuff. Now we get to the actual script. We see the script being invoked, three epochs with that model, and off it goes. It downloads the model first, initializes the weights, and then it starts training. It trains and trains.
Let's get to the end of that. We reach 98% accuracy, and I guess that's okay; I didn't tweak anything. We save the model and then evaluate. That's it. Pretty cool, right? The job lasted 918 seconds, which is about 15 minutes. I didn't use spot instances here, but you can do that to save a bit of money. Once the model has been uploaded to S3, you can find its location here. You can copy it to your local environment and extract it. It's a PyTorch model, and you see the checkpoints, the evaluation results, etc. The next step would be to deploy, but at the moment, the Hugging Face container doesn't support deploying image classification tasks.

To sum things up, it's super simple. The Trainer API and the Dataset API make it very easy to train on SageMaker. The Trainer code here is just vanilla code: whatever runs on your laptop, you can move here and, using script mode, run inside SageMaker. If you want more examples, we have plenty in our repo, and I'll put a link in the video description. That's the end of the second video. In the third video, I'll show you another training script where, instead of using the Trainer API, we use PyTorch Lightning to train the same model on the same dataset. Keep watching.
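Copying the model artifact from S3 and unpacking it locally can be sketched as follows; the S3 path is a placeholder and the helper function name is mine, not from the video.

```python
import tarfile
from pathlib import Path


def extract_model_artifact(tar_path, out_dir):
    """Extract a SageMaker model.tar.gz into out_dir and return the extracted file names."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    with tarfile.open(tar_path, "r:gz") as tar:
        tar.extractall(path=out)
    return sorted(p.name for p in out.iterdir())


# Typical usage after training (placeholder S3 path):
#   aws s3 cp s3://my-bucket/.../output/model.tar.gz .
#   extract_model_artifact("model.tar.gz", "model")
# You would then see files such as pytorch_model.bin, config.json,
# eval_results.txt, and any checkpoints saved during training.
```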

Tags

Vision Transformer, Amazon SageMaker, Hugging Face, Model Training, SageMaker Processing

About the Author

Julien Simon is the Chief Evangelist at Arcee AI, specializing in Small Language Models and enterprise AI solutions. Recognized as the #1 AI Evangelist globally by AI Magazine in 2021, he brings over 30 years of technology leadership experience to his role.

With 650+ speaking engagements worldwide and 350+ technical blog posts, Julien is a leading voice in practical AI implementation, cost-effective AI solutions, and the democratization of artificial intelligence. His expertise spans open-source AI, Small Language Models, enterprise AI strategy, and edge computing optimization.

Previously serving as Principal Evangelist at Amazon Web Services and Chief Evangelist at Hugging Face, Julien has helped thousands of organizations implement AI solutions that deliver real business value. He is the author of "Learn Amazon SageMaker," the first book ever published on AWS's flagship machine learning service.

Julien's mission is to make AI accessible, understandable, and controllable for enterprises through transparent, open-weights models that organizations can deploy, customize, and trust.