Hey, everybody. This is Julien from Arcee. In this video, I'd like to show you how to combine SageMaker, to train machine learning models, and Fargate, to deploy them in fully managed containers. It's a pretty cool combination. I'm starting from a simple Keras script I've used before: a CNN trained to classify images from the Fashion-MNIST dataset, a drop-in replacement for MNIST with the same number of classes and samples. The goal is to classify those images correctly into 10 classes.
It's a script-mode script: it reads hyperparameters and dataset locations from command-line arguments, loads the dataset, builds the convolution blocks, trains, and saves the model in TensorFlow Serving format. This is a very common setup, and although I'm using Keras here, the same process works with TensorFlow, PyTorch, and MXNet: it's a very generic way of training on SageMaker and deploying on Fargate.
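Here's a minimal sketch of what such a training script looks like; the file names, hyperparameter defaults, and input/output tensor names are illustrative, not the exact ones from my demo:

```python
# Sketch of a script-mode Keras training script (TF 1.15).
# SageMaker passes hyperparameters on the command line and exposes the
# dataset and output locations through SM_* environment variables.
import argparse, os
import numpy as np
import tensorflow as tf
from tensorflow.keras import backend as K
from tensorflow.keras.layers import Conv2D, Dense, Flatten, MaxPooling2D

if __name__ == '__main__':
    parser = argparse.ArgumentParser()
    parser.add_argument('--epochs', type=int, default=10)
    parser.add_argument('--batch-size', type=int, default=128)
    parser.add_argument('--training', default=os.environ.get('SM_CHANNEL_TRAINING'))
    parser.add_argument('--validation', default=os.environ.get('SM_CHANNEL_VALIDATION'))
    parser.add_argument('--model-dir', default=os.environ.get('SM_MODEL_DIR'))
    args = parser.parse_args()

    # Illustrative file names: (N, 28, 28, 1) images scaled to [0, 1]
    x_train = np.load(os.path.join(args.training, 'x_train.npy'))
    y_train = np.load(os.path.join(args.training, 'y_train.npy'))
    x_val = np.load(os.path.join(args.validation, 'x_val.npy'))
    y_val = np.load(os.path.join(args.validation, 'y_val.npy'))

    # A simple CNN: two convolution blocks, then a dense classifier
    model = tf.keras.Sequential([
        Conv2D(32, 3, activation='relu', input_shape=(28, 28, 1)),
        MaxPooling2D(),
        Conv2D(64, 3, activation='relu'),
        MaxPooling2D(),
        Flatten(),
        Dense(128, activation='relu'),
        Dense(10, activation='softmax')])
    model.compile(optimizer='adam', loss='sparse_categorical_crossentropy',
                  metrics=['accuracy'])
    model.fit(x_train, y_train, batch_size=args.batch_size, epochs=args.epochs,
              validation_data=(x_val, y_val))

    # Export in TensorFlow Serving (SavedModel) format, under a numbered version
    tf.saved_model.simple_save(
        K.get_session(),
        os.path.join(args.model_dir, '1'),
        inputs={'image': model.input},
        outputs={'probabilities': model.output})
```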
First, I download the dataset and upload it to S3. It's already split into training and validation sets, so I get two S3 locations. As usual, I configure a SageMaker estimator, using the TensorFlow one, passing the script location, training on a single ml.p3.2xlarge instance (a GPU instance), with TensorFlow 1.15 and Python 3, in script mode. I pass the locations of the training and validation sets, the job trains for a few minutes, and it reaches 92% accuracy. All good.
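In code, the estimator setup looks roughly like this; the script name and hyperparameters are placeholders, and the parameter names are the SageMaker Python SDK v1 ones that were current for TensorFlow 1.15:

```python
import sagemaker
from sagemaker.tensorflow import TensorFlow

estimator = TensorFlow(
    entry_point='fmnist_cnn.py',           # hypothetical script name
    role=sagemaker.get_execution_role(),
    train_instance_count=1,
    train_instance_type='ml.p3.2xlarge',   # single GPU instance
    framework_version='1.15',
    py_version='py3',
    script_mode=True,
    hyperparameters={'epochs': 10, 'batch-size': 128})

# training_input_path and validation_input_path are the two S3 locations
estimator.fit({'training': training_input_path,
               'validation': validation_input_path})
```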
Now I have a trained model in S3, and I can see its location. I save this location to an environment variable and copy the model artifact to my local machine. I'm using a notebook instance here, but you could use anything else. I extract the model artifact, which is indeed in TensorFlow Serving format, into a directory called test-models. This is one of my repos on GitLab, and it's where I want to push the new model for deployment on my Fargate cluster.
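An equivalent way to do this in Python, rather than in the shell, might look like this (the local paths are illustrative):

```python
import tarfile
import boto3

# estimator.model_data is the s3://bucket/.../model.tar.gz URI of the artifact
bucket, key = estimator.model_data.replace('s3://', '').split('/', 1)
boto3.client('s3').download_file(bucket, key, 'model.tar.gz')

# Unpack the SavedModel into the local clone of the GitLab repo
with tarfile.open('model.tar.gz') as tar:
    tar.extractall(path='test-models')
```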
Next, I add, commit, and push the model to the GitLab repo. Now the model is archived in GitLab, ready to be deployed. The next step is to create the Fargate cluster, which is simple: I call the CreateCluster API, give the cluster a name, and that's it. Fargate is fully managed, unlike ECS and EKS on EC2, where you need to manage the underlying instances yourself. I also use the ECS CLI tool, which you can grab from GitHub, install, and copy to your path. I configure it to use the Fargate demo cluster in eu-west-1, which becomes the default for subsequent commands.
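With boto3, the equivalent call is a one-liner; I'm assuming here that the cluster is named fargate-demo:

```python
import boto3

# One API call: with Fargate, there are no instances or capacity to provision
ecs = boto3.client('ecs', region_name='eu-west-1')
ecs.create_cluster(clusterName='fargate-demo')
```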
The cluster is created, but nothing is running on it yet, and I use the ECS CLI to check that no tasks are running. I need a container, and while I could build my own TensorFlow container, I'll use Deep Learning Containers, which are pre-built for deep learning frameworks. We have containers for TensorFlow, PyTorch, and MXNet, in different versions, available in many regions, with CPU and GPU variants. I trained with TensorFlow 1.15, so I pick the matching CPU version for inference with Python 3.6, as Fargate doesn't support GPUs yet.
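Deep Learning Containers image URIs follow a predictable pattern; the exact tag below is my assumption, so check the published image list for your region and framework version:

```python
# Representative (not guaranteed) DLC image URI for TF 1.15 CPU inference
region = 'eu-west-1'
image = ('763104351884.dkr.ecr.%s.amazonaws.com/'
         'tensorflow-inference:1.15.0-cpu-py36-ubuntu18.04' % region)
```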
I find the container name in ECR, our Docker registry service, and update the region and version details. The container has everything I need. To deploy it and load the model, I write a task definition. The image is the inference one for Python 3.6 I just showed. The entry point creates a local directory, clones my repo with the models, and fires up TensorFlow Serving with the specified ports and model version. I set the memory and CPU allocations, open ports 8500 (gRPC) and 8501 (REST) for TensorFlow Serving, and configure CloudWatch logs for the container.
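Here's a sketch of such a task definition, written as the Python dictionary you'd pass to boto3; the family name, repo URL, role ARN, model name, and sizing are all placeholders:

```python
task_definition = {
    'family': 'tf-serving-fmnist',            # hypothetical name
    'requiresCompatibilities': ['FARGATE'],
    'networkMode': 'awsvpc',                  # required by Fargate
    'cpu': '1024',                            # 1 vCPU
    'memory': '4096',                         # 4 GB
    'executionRoleArn': execution_role_arn,   # role allowed to pull from ECR and log
    'containerDefinitions': [{
        'name': 'tf-serving',
        'image': image,                       # the DLC inference image above
        'entryPoint': ['sh', '-c'],
        'command': [
            'mkdir -p /test-models && '
            'git clone https://gitlab.com/EXAMPLE/test-models.git /test-models && '
            'tensorflow_model_server --port=8500 --rest_api_port=8501 '
            '--model_name=fmnist --model_base_path=/test-models'],
        'portMappings': [
            {'containerPort': 8500, 'protocol': 'tcp'},    # gRPC
            {'containerPort': 8501, 'protocol': 'tcp'}],   # REST
        'logConfiguration': {
            'logDriver': 'awslogs',
            'options': {
                'awslogs-group': '/ecs/tf-serving',
                'awslogs-region': 'eu-west-1',
                'awslogs-stream-prefix': 'tf-serving',
                'awslogs-create-group': 'true'}}}]}
```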
I register the task definition, which stores it in the ECS backend. Every time I call this API to create or update a task definition, the revision number is bumped, and I need both the name and the revision for the next steps. Then I run a task with the RunTask API, specifying the cluster name and the task definition name and revision. I also pass a network configuration, launching the task in one of my subnets with a simple security group.
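In boto3 terms, registering and running look like this; the subnet and security group IDs are placeholders for resources in your own VPC:

```python
# Register (or update) the task definition; each call bumps the revision
response = ecs.register_task_definition(**task_definition)
revision = response['taskDefinition']['revision']

# Launch one task on the Fargate cluster, in a subnet with a security group
# that allows inbound traffic on ports 8500 and 8501
ecs.run_task(
    cluster='fargate-demo',
    launchType='FARGATE',
    taskDefinition='tf-serving-fmnist:%d' % revision,
    count=1,
    networkConfiguration={'awsvpcConfiguration': {
        'subnets': ['subnet-0123456789abcdef0'],
        'securityGroups': ['sg-0123456789abcdef0'],
        'assignPublicIp': 'ENABLED'}})
```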
After a few seconds, the task is running: the deep learning container has been pulled and started, and I can see the task's IP address and ports. The final step is to build the URL of the TensorFlow Serving endpoint from the IP, the port, and TensorFlow Serving's standard REST format. I grab random samples from the validation dataset, build a prediction request in JSON format, and post it to the URL with the requests library. The response is a JSON object with one prediction vector per sample: 10 samples, each with 10 probabilities for the 10 classes. The predictions are mostly correct, with one mistake, which is what you'd expect given the 92% accuracy.
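A minimal sketch of the request, assuming the task's IP address is in task_ip, the validation images are in x_val, and the model was served under the name fmnist:

```python
import json, random
import requests

# Pick 10 random validation images
samples = [x_val[i].tolist() for i in random.sample(range(len(x_val)), 10)]

# TensorFlow Serving's REST predict endpoint: /v1/models/<name>:predict
url = 'http://%s:8501/v1/models/fmnist:predict' % task_ip
response = requests.post(url, data=json.dumps({'instances': samples}))

# One vector of 10 class probabilities per sample
predictions = response.json()['predictions']
```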
In the ECS console, I see the task running, and all the information is available. I hope you enjoyed this and learned a few things. I'll see you soon with another video. Bye-bye.