Vision Transformer on SageMaker, part 3: training with PyTorch Lightning

November 25, 2021
This video is the third in a series of three, where I focus on training a Vision Transformer model with Amazon SageMaker and the Hugging Face Deep Learning Container. In this video, I start from the image classification dataset that I prepared in the first video (https://youtu.be/jalopOoBL5M). Then, I download a pre-trained base Vision Transformer from the Hugging Face hub, and I use PyTorch Lightning to append a classification layer to it. Finally, I train the model using the Trainer API in PyTorch Lightning.

⭐️⭐️⭐️ Don't forget to subscribe to be notified of future videos ⭐️⭐️⭐️

Code: https://github.com/juliensimon/huggingface-demos/tree/main/vision-transformer
Original training code by Niels Rogge: https://github.com/NielsRogge/Transformers-Tutorials/blob/master/VisionTransformer/Fine_tuning_the_Vision_Transformer_on_CIFAR_10_with_PyTorch_Lightning.ipynb
More Hugging Face on SageMaker notebooks: https://github.com/huggingface/notebooks/tree/master/sagemaker
New to Transformers? Check out the Hugging Face course at https://huggingface.co/course

Transcript

Hi everybody, this is Julien from Arcee. This is the third of a series of three videos where I focus on training Vision Transformer models on Amazon SageMaker. In the first video, we focused on dataset preparation, and I showed you how you can load images directly from S3 to build a Hugging Face dataset that you can use for training. In the second video, we used this dataset to train a Vision Transformer model for image classification, using the Trainer API in the Transformers library and the Hugging Face container on SageMaker. In this video, the last of the series, we're going to train again, and we'll still use SageMaker and the Hugging Face container, but instead of using the Trainer API, we'll use PyTorch Lightning. I found a really cool example from one of my colleagues and I figured, hey, I haven't seen an example of that, so let's do one. So let's get started.

The SageMaker part is absolutely identical to what we saw in the second video. The only difference is that I am using a different script, but the estimator definition is just the same. We pass the location of the script and the same hyperparameters; I made sure the two training scripts use the same hyperparameters. For the Transformers version, in this case we need to make sure we use version 4.10 or higher: there's a weird bug in earlier versions where you run into an error when you import Transformers after PyTorch Lightning, but it's fixed, so if you use 4.10 or higher, you're good. We use PyTorch 1.9, Python 3.8, and a cost-effective GPU instance. Then I call fit, passing the locations of my training set, validation set, and test set. For reference, if you didn't watch the second video, these come from a SageMaker Processing job that I ran in the first video, building datasets from images stored in S3. So nothing weird. Let's look at the training code, and then we'll take a look at the log.

So here we go. I'm lazy, as everybody knows. In the second video, I used an existing notebook from Philipp. Thanks again. In this one, I'm using code that my colleague Niels implemented. Thank you, Niels. There are lots of really good Transformers tutorials at that location, so please go and check them out. What I did here is I took this example that runs in a notebook and adapted it to run on SageMaker. There was a little bit more work here: I implemented script mode so that this training code would interface with the Hugging Face Deep Learning Container, and I had to install PyTorch Lightning in the container, which is how I found the 4.10-or-higher requirement on Transformers. And that's about it, those are the changes. But again, I've done this many times, and it really shows that you can take machine learning code that you find on GitHub or somewhere else, code that runs in a notebook in a local environment, and easily adapt it to run on SageMaker. The keyword you're looking for is script mode. If you think adapting code for SageMaker is complicated, well, it's not. It's actually the easiest thing. You need to look at this feature called script mode, and we're going to cover it again here. In the entry point of my training script, and this is what script mode is all about, I'm grabbing those hyperparameters as command line arguments, because this is really how the script will be invoked inside the SageMaker container: `python my_script.py` plus command line arguments, hence the name script mode. I can also get the location of the different datasets and a few more things. Then the rest is your code.
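To make this more concrete, here is a hedged sketch of what the estimator definition and the fit call can look like on the notebook side. The script name, instance type, hyperparameter names, S3 paths, and exact container versions are assumptions for illustration; the actual values are in the repository linked above.

```python
import sagemaker
from sagemaker.huggingface import HuggingFace

role = sagemaker.get_execution_role()

# Hyperparameters are passed to the training script as command line arguments
# (names and values are placeholders for illustration)
hyperparameters = {
    "epochs": 3,
    "batch-size": 32,
    "learning-rate": 2e-5,
}

huggingface_estimator = HuggingFace(
    entry_point="vit_lightning.py",   # name of the PyTorch Lightning training script (assumed)
    source_dir=".",                   # also contains a requirements.txt that installs pytorch_lightning
    instance_type="ml.g4dn.xlarge",   # a cost-effective single-GPU instance (assumed)
    instance_count=1,
    role=role,
    transformers_version="4.11",      # must be 4.10 or higher, and must match an available container
    pytorch_version="1.9",
    py_version="py38",
    hyperparameters=hyperparameters,
)

# The three channels point to the datasets built by the processing job in the first video
huggingface_estimator.fit({
    "train": "s3://my-bucket/vit-demo/train",   # S3 paths are placeholders
    "valid": "s3://my-bucket/vit-demo/valid",
    "test":  "s3://my-bucket/vit-demo/test",
})
```

On the training script side, script mode simply means parsing those same hyperparameters back as command line arguments, along with the dataset and model locations that SageMaker exposes as environment variables. A minimal entry point, with argument names assumed, could look like this:

```python
import argparse
import os

if __name__ == "__main__":
    parser = argparse.ArgumentParser()
    # Hyperparameters arrive as command line arguments (names assumed for illustration)
    parser.add_argument("--epochs", type=int, default=3)
    parser.add_argument("--batch-size", type=int, default=32)
    parser.add_argument("--learning-rate", type=float, default=2e-5)
    # Dataset locations and the model output directory come from SageMaker environment variables,
    # named after the channels passed to fit()
    parser.add_argument("--train-dir", type=str, default=os.environ["SM_CHANNEL_TRAIN"])
    parser.add_argument("--valid-dir", type=str, default=os.environ["SM_CHANNEL_VALID"])
    parser.add_argument("--test-dir", type=str, default=os.environ["SM_CHANNEL_TEST"])
    parser.add_argument("--model-dir", type=str, default=os.environ["SM_MODEL_DIR"])
    args, _ = parser.parse_known_args()
    # ... the rest is regular training code, sketched further down
```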
If you compare my SageMaker code to Niels' local code, you'll see it's about the same; I'm just using command line arguments. So we load the datasets. Remember, these are Hugging Face datasets, so I can use `load_from_disk`; they're automatically copied inside the container by SageMaker. I need to make sure they're all in Torch format, and then I build data loaders from those three datasets. Then I instantiate the model, which we'll take a look at in a second. This is the PyTorch Lightning part: I use the Trainer object in PyTorch Lightning, setting very few parameters here, just the number of epochs and how many GPUs I have. I call fit to train and test to run the evaluation on the test set. Then I just save the model as a PyTorch Lightning model. So, a very simple process, and again, exactly what you would be doing in your notebook, except we're using command line arguments that are passed by SageMaker. So let's take a look. PyTorch Lightning is not part of the Hugging Face container, so I have to install it.

The implementation here is very interesting, and again, kudos to Niels for doing this; I just tweaked it, but he came up with it. What we do here is we actually start from a headless Vision Transformer model. We download that and add a classification head at the back: a dropout layer and a linear layer that connects the last hidden state of the transformer to the right number of labels. So that's pretty interesting. This is really different from the examples we saw in the second video, where we downloaded a model for classification that was good to go and fine-tuned it. Here, we just grab the base model and add a classification layer, so it's a good example of customizing a model. And I guess that's why people like PyTorch Lightning.

The rest is really pretty simple. We have the forward function: first, we feed the pixel values, the ones extracted from the images by the feature extractor for that model, which is what we did already in the first video when we prepared the dataset. We use those pixel values to generate outputs, and then we feed that output through dropout and the classification layer. We have the training step function, the validation step function, and the test step function, which all use a common step function that receives a batch of pixel values, feeds them through the model, computes the loss, and reports on predictions, accuracy, and so on. So the only difference here compared to a generic loop is that we use the pixel values feature and the label feature, but the rest is very generic and you could reuse it with other models and other tasks. What else can I say? That's about it: configure the optimizer and return the three data loaders. So, pretty simple code, but pretty clever. I really like the fact that we start from the base model and add layers for classification; I think that's an interesting approach if you want to customize models.
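To make that walkthrough concrete, here is a condensed, hedged sketch of the pattern just described: a headless ViT backbone with a dropout and linear classification head, a shared step for training, validation, and test, and the Lightning Trainer driving everything. The checkpoint name, column names, label count, and hyperparameter values are assumptions for illustration; Niels' notebook and the repository linked above contain the actual code.

```python
import torch
import torch.nn as nn
import pytorch_lightning as pl
from torch.utils.data import DataLoader
from datasets import load_from_disk
from transformers import ViTModel

# Load the Hugging Face datasets copied into the container by SageMaker
# (in the real script, these paths come from the SM_CHANNEL_* arguments parsed above)
train_ds = load_from_disk("/opt/ml/input/data/train")
valid_ds = load_from_disk("/opt/ml/input/data/valid")
test_ds = load_from_disk("/opt/ml/input/data/test")
for ds in (train_ds, valid_ds, test_ds):
    ds.set_format("torch", columns=["pixel_values", "label"])  # column names assumed


class ViTClassifier(pl.LightningModule):
    """Headless ViT backbone plus a dropout + linear classification head."""

    def __init__(self, num_labels, learning_rate=2e-5, batch_size=32):
        super().__init__()
        self.save_hyperparameters()
        self.vit = ViTModel.from_pretrained("google/vit-base-patch16-224-in21k")
        self.dropout = nn.Dropout(0.1)
        self.classifier = nn.Linear(self.vit.config.hidden_size, num_labels)

    def forward(self, pixel_values):
        outputs = self.vit(pixel_values=pixel_values)
        # Use the [CLS] token of the last hidden state as the image representation
        cls_token = outputs.last_hidden_state[:, 0, :]
        return self.classifier(self.dropout(cls_token))

    def common_step(self, batch):
        pixel_values, labels = batch["pixel_values"], batch["label"]
        logits = self(pixel_values)
        loss = nn.functional.cross_entropy(logits, labels)
        accuracy = (logits.argmax(dim=-1) == labels).float().mean()
        return loss, accuracy

    def training_step(self, batch, batch_idx):
        loss, accuracy = self.common_step(batch)
        self.log("train_loss", loss)
        self.log("train_acc", accuracy)
        return loss

    def validation_step(self, batch, batch_idx):
        loss, accuracy = self.common_step(batch)
        self.log("val_loss", loss)
        self.log("val_acc", accuracy)

    def test_step(self, batch, batch_idx):
        loss, accuracy = self.common_step(batch)
        self.log("test_loss", loss)
        self.log("test_acc", accuracy)

    def configure_optimizers(self):
        return torch.optim.AdamW(self.parameters(), lr=self.hparams.learning_rate)

    def train_dataloader(self):
        return DataLoader(train_ds, batch_size=self.hparams.batch_size, shuffle=True)

    def val_dataloader(self):
        return DataLoader(valid_ds, batch_size=self.hparams.batch_size)

    def test_dataloader(self):
        return DataLoader(test_ds, batch_size=self.hparams.batch_size)


model = ViTClassifier(num_labels=2)         # number of labels assumed for illustration
trainer = pl.Trainer(max_epochs=3, gpus=1)  # single GPU, three epochs (Lightning 1.x style arguments)
trainer.fit(model)
trainer.test(model)
trainer.save_checkpoint("/opt/ml/model/model.ckpt")  # saved under SM_MODEL_DIR so SageMaker uploads it to S3
```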
Now let's take a look at the log. We see some verbose SageMaker output, the installation of PyTorch Lightning, and then we see that we loaded our three datasets, built the data loaders, and started training, for three epochs I believe. So, download the model, initialize the weights, and we train for a bit. Let's get to the end of that. Then validation, which is the evaluation on the test set: here we have almost 95%. Not quite as good as before, but I've run different examples, and it's a small test set, right? Only 250 images, so there's a lot of variability here. I guess I would need a little more data, but generally you're going to get good results. We saved the model. And interestingly, this trained a little faster. I don't know if it's significant, but this run took almost 11 minutes, while the previous example with the Trainer API took about 15. So I don't know, maybe it's random, maybe it's not. Go and figure it out. The model is uploaded to S3, and of course we can copy it locally and extract the artifact. We see the PyTorch Lightning model, which we could extract and deploy in whatever way we want. Of course, we can't really use the Hugging Face container for that, because it doesn't support PyTorch Lightning and it doesn't support image classification tasks for now.

So just to recap: in the first video, we prepared the dataset. In the second video, we trained with the Trainer API in the Transformers library. And in this video, we trained using PyTorch Lightning, and we saw how we could add layers on top of a base model. So, some different ways to train. I think the takeaway here is that it's pretty simple to train on SageMaker, and it's pretty simple to scale your data preparation as well. I really like SageMaker Processing; I use it a lot, and I think it's a really cool way to do that. I'll put all the links in the video description. Go and run those examples and start tweaking them. And of course, if you have questions, feel free to ask them in the comments. Thanks for watching. I hope you learned a few things. And until next time, keep learning. Bye-bye.
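As a side note on the artifact handling mentioned above, here is a minimal hedged sketch of copying the trained model from S3 and extracting the Lightning checkpoint locally. The bucket, key, and file names are placeholders, and `ViTClassifier` refers to the sketch shown earlier.

```python
import tarfile
import boto3

# The S3 location is reported by the estimator after training,
# e.g. huggingface_estimator.model_data; bucket and key below are placeholders.
s3 = boto3.client("s3")
s3.download_file("my-bucket", "path/to/output/model.tar.gz", "model.tar.gz")

# The archive contains the PyTorch Lightning checkpoint saved by the training script
with tarfile.open("model.tar.gz") as tar:
    tar.extractall("model")

# It can then be reloaded with the same LightningModule class, for example:
# model = ViTClassifier.load_from_checkpoint("model/model.ckpt")
```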

Tags

Vision Transformer, Amazon SageMaker, PyTorch Lightning, Hugging Face, Model Training

About the Author

Julien Simon is the Chief Evangelist at Arcee AI, specializing in Small Language Models and enterprise AI solutions. Recognized as the #1 AI Evangelist globally by AI Magazine in 2021, he brings over 30 years of technology leadership experience to his role.

With 650+ speaking engagements worldwide and 350+ technical blog posts, Julien is a leading voice in practical AI implementation, cost-effective AI solutions, and the democratization of artificial intelligence. His expertise spans open-source AI, Small Language Models, enterprise AI strategy, and edge computing optimization.

Previously serving as Principal Evangelist at Amazon Web Services and Chief Evangelist at Hugging Face, Julien has helped thousands of organizations implement AI solutions that deliver real business value. He is the author of "Learn Amazon SageMaker," the first book ever published on AWS's flagship machine learning service.

Julien's mission is to make AI accessible, understandable, and controllable for enterprises through transparent, open-weights models that organizations can deploy, customize, and trust.