Hello, my name is Julien. I'm a tech evangelist with AWS and welcome to this session for the Innovate online conference. In this session, we're going to talk about machine learning. Surprise, surprise, that's pretty much what I do. So we're going to focus on Amazon SageMaker and what I'm going to show you today is how to bring existing deep learning code all the way from your laptop to Amazon SageMaker. And I will use a deep learning library called Keras, a popular one, so maybe that's what you're using. If you're using something else, if you're using TensorFlow directly or Apache MXNet or PyTorch, etc., stay with me because everything I'm going to show you works exactly the same with the other libraries. So let's get started.
When we start working on deep learning models, most of us start working on our laptops, right? And we write some code. Here's a piece of code that we're going to work with today. It's using the Keras library and it's a simple image classifier for an image dataset called Fashion MNIST. Let me quickly show you what Fashion MNIST looks like. This is the dataset we're going to work with. Fashion MNIST is a drop-in replacement for the very well-known MNIST dataset with handwritten digits from 0 to 9, black and white pictures, etc. Fashion MNIST was designed by a company called Zalando. Thanks guys for building it. It is a drop-in replacement, so it has the same number of classes (10), the same number of images (60,000 for training, 10,000 for validation), and the same image size (28 by 28 pixels, black and white pictures). The interesting thing about this dataset is that it is much more challenging to train on compared to simple digits. As you can see, we have clothing, shoes, handbags, etc. So it is a more difficult dataset to train with, making it more fun.
What we're trying to do here is train a deep learning model that will successfully classify those images or similar images into the right category, one of those ten categories (shoes, t-shirts, etc.). You would start with something like this. This is not a Keras tutorial, but let's take a look at this code and, of course, we're going to run it. First, we define some hyperparameters. Initially, as we experiment, we would define a bunch of variables for machine learning parameters like the number of epochs, learning rate, and batch size, and some directories: a model directory (where to save the model), a training directory (where the training data is), and a validation directory (where the validation data is). Hopefully, you are not hardcoding these things; at least you have variables for that. If not, that's okay at this stage.
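To make this concrete, here is a minimal sketch of that first step; the variable names and default values are illustrative, not necessarily the exact ones used in the session:

```python
# Hyperparameters for the training job (values are just examples).
epochs = 1
learning_rate = 0.01
batch_size = 128

# Directories: where to save the model, and where the training and
# validation data live (paths are assumptions for a local run).
model_dir = './model'
training_dir = './data'
validation_dir = './data'
```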
Then we load the dataset. I've already downloaded this dataset, and it's in NumPy format, so the images are in NumPy format, compressed. Using `numpy.load`, I can load the training and validation images and labels. These end up in NumPy arrays, which are 28 pixels by 28 pixels. Although I'm using Keras, I have a choice of backend. Keras is a high-level API, very easy to learn, and it sits on top of a backend used for training and prediction. Originally, you could use Theano and TensorFlow, but we also added support for MXNet. Here, I'm going to use TensorFlow, but you could do the exact same thing if you used Keras with MXNet as a backend. I need to make sure that my images are in the right format. TensorFlow expects channels last, meaning the training and prediction data should have the shape (batch size, image height, image width, number of channels). Here, we're working with black and white images, so there is only one channel, but if we had color images, we would have three channels. MXNet needs channels first, so it would be (batch size, channels, height, width). Depending on the backend, we need to ensure our dataset has the right format. So we check whether Keras is configured for channels first and reshape the dataset accordingly: (batch, channels, height, width) in that case, and (batch, height, width, channels) otherwise.
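Continuing the sketch above, the loading and reshaping step looks roughly like this; the archive file names and the 'image'/'label' keys are assumptions about how the dataset was saved:

```python
import os
import numpy as np
from tensorflow.keras import backend as K

# Load the compressed NumPy archives (file names and keys are assumed).
train_data = np.load(os.path.join(training_dir, 'training.npz'))
val_data = np.load(os.path.join(validation_dir, 'validation.npz'))
x_train, y_train = train_data['image'], train_data['label']
x_val, y_val = val_data['image'], val_data['label']

# Add the channel dimension in the order the backend expects.
if K.image_data_format() == 'channels_first':
    x_train = x_train.reshape(x_train.shape[0], 1, 28, 28)
    x_val = x_val.reshape(x_val.shape[0], 1, 28, 28)
    input_shape = (1, 28, 28)
else:
    x_train = x_train.reshape(x_train.shape[0], 28, 28, 1)
    x_val = x_val.reshape(x_val.shape[0], 28, 28, 1)
    input_shape = (28, 28, 1)
```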
We print the shape to make sure it's fine and print the number of samples. There's not a lot of data processing to be done here. We normalize the pixel values, which are between 0 and 255, to values between 0 and 1 by dividing by 255. This is a standard machine learning practice to avoid training issues. We have 10 classes, and the training labels and validation labels are still integers (0 to 9). For training, we need to convert these to one-hot encoding, which means having as many dimensions as there are classes. Here, with 10 classes, we need 10 dimensions: nine of them will be zero, and one will be set, flagging the actual class. Keras has a simple API to do this, converting the integer class numbers to one-hot vectors.
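Here is what those two preprocessing steps look like, continuing the same sketch:

```python
from tensorflow.keras.utils import to_categorical

# Scale pixel values from [0, 255] to [0, 1].
x_train = x_train.astype('float32') / 255
x_val = x_val.astype('float32') / 255

# One-hot encode the integer labels: 10 classes, so 10 dimensions,
# all zero except the one flagging the actual class.
num_classes = 10
y_train = to_categorical(y_train, num_classes)
y_val = to_categorical(y_val, num_classes)
```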
Next, I start designing my model. It's a sequential model with a first block for convolution and pooling, a second block for convolution and pooling, and a third block with a fully connected layer (dense layer) to classify the images processed by the convolution blocks. I add some dropouts to regularize and fight overfitting. The output layer uses softmax because I want to output probabilities. Softmax ensures the outputs add up to one, making them probabilities. We print the model summary and compile the model using the SGD optimizer for the training process. Then, I train the model using the data I loaded for the right number of epochs, score the accuracy against the validation set, display those metrics, and save the model in Keras format. This is pretty much a "hello world" for image classification with Keras. All the code will be on GitLab, and you'll get the link, so you can try it yourself.
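The sketch below shows a model along those lines, continuing the same script; the layer sizes and dropout rates are assumptions, not the exact values from the session (on older Keras versions, pass `lr=` instead of `learning_rate=` to SGD):

```python
import os
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Conv2D, MaxPooling2D, Flatten, Dense, Dropout
from tensorflow.keras.optimizers import SGD

model = Sequential([
    # First convolution + pooling block
    Conv2D(64, (3, 3), activation='relu', input_shape=input_shape),
    MaxPooling2D(pool_size=(2, 2)),
    Dropout(0.25),
    # Second convolution + pooling block
    Conv2D(128, (3, 3), activation='relu'),
    MaxPooling2D(pool_size=(2, 2)),
    Dropout(0.25),
    # Fully connected classifier, softmax output for class probabilities
    Flatten(),
    Dense(256, activation='relu'),
    Dropout(0.5),
    Dense(num_classes, activation='softmax'),
])

model.summary()
model.compile(loss='categorical_crossentropy',
              optimizer=SGD(learning_rate=learning_rate),
              metrics=['accuracy'])

model.fit(x_train, y_train,
          batch_size=batch_size,
          epochs=epochs,
          validation_data=(x_val, y_val))

score = model.evaluate(x_val, y_val, verbose=0)
print('Validation loss:', score[0])
print('Validation accuracy:', score[1])

# Save in Keras format for now; we switch to TensorFlow Serving format later.
model.save(os.path.join(model_dir, 'model.h5'))
```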
Let's run this code. For convenience, I'm running it on the notebook instance because this notebook instance has a GPU, which I don't have on my local machine, making training faster. Here's the exact same code, a simple Keras example. I'm just going to run it with Python to see how it works. We see the shape of the training set, the number of samples, and the model summary, and it trains very fast because I have a GPU and I'm only training for one epoch. I just want to check for correctness, not high performance. It took five seconds, so the code works fine. It would work on any machine with TensorFlow and Keras installed.
Now, let's make a few improvements. The first thing we want to fix is to use command line arguments for the parameters (epochs, learning rate, batch size, etc.). The second thing we need to change is saving the model in TensorFlow Serving format. Keras format is fine, but our goal is to run this on SageMaker, which uses TensorFlow Serving. TensorFlow Serving is a high-performance model server that is part of the TensorFlow ecosystem. We need to change the format so TensorFlow Serving can load the model. These are minor tweaks. Instead of hardcoding values, I use argparse to grab command line arguments and set default values. At the end, instead of saving the model in Keras format, I save it in TensorFlow Serving format. There's a simple API to do this, specifying the model inputs and outputs. You can copy-paste this code for your own examples.
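Here is a hedged sketch of those two tweaks; the argument names are assumptions, and the saving step uses the TF 1.x `simple_save` API that script-mode examples of that era relied on (newer TensorFlow versions would use `tf.saved_model.save` or `model.save(..., save_format='tf')` instead):

```python
import os
import argparse
import tensorflow as tf

# 1. Read hyperparameters and paths from the command line, with defaults.
parser = argparse.ArgumentParser()
parser.add_argument('--epochs', type=int, default=1)
parser.add_argument('--learning-rate', type=float, default=0.01)
parser.add_argument('--batch-size', type=int, default=128)
parser.add_argument('--model-dir', type=str, default='./model')
parser.add_argument('--training', type=str, default='./data')
parser.add_argument('--validation', type=str, default='./data')
args = parser.parse_args()

# ... load the data, build and train the model as before (this defines `model`) ...

# 2. Save in TensorFlow Serving (SavedModel) format instead of Keras .h5.
tf.saved_model.simple_save(
    tf.keras.backend.get_session(),
    os.path.join(args.model_dir, 'model/1'),   # TF Serving expects a numbered version directory
    inputs={'inputs': model.input},
    outputs={t.name: t for t in model.outputs})
```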
Let's check it out. Now we want to run version 2. I'm using default values, so it runs for a little longer. It's quite fast because I have a GPU on this machine. Let's wait for the last epoch to complete. Here we go. We get better accuracy this time, 92.3%. Let's train for five epochs to see if the parameters work. It says epoch one of five, so it looks fine. I can use argparse, which is good. This is a reasonable first step, even if you're still working locally. You have more flexibility in training with different values and can script it. This is an important step in preparing for the move to SageMaker because some of these parameters will be used by SageMaker.
SageMaker is based on Docker containers, so we have a TensorFlow container, an MXNet container, and so on. We load and run the training and prediction scripts inside the container. The interface between your code and SageMaker is simple: SageMaker invokes your code, passing some parameters (where to save the model, where to load the training data, etc.), and you set some hyperparameters for the training job. This is the input interface between SageMaker and your code. Post-training, we just need to save the model in TensorFlow Serving format, which is what SageMaker requires.
The last thing to understand is how SageMaker invokes the code inside the TensorFlow container. It invokes it pretty much like we did on the command line, except it runs inside a container. SageMaker passes some hyperparameters, the location of the training and validation sets, and where to save the model. When SageMaker invokes your code, it passes the hyperparameters as command line arguments and those four parameters (GPU count, model directory, training directory, validation directory) as environment variables. You need to grab them from the environment and use them in your script. This is the only trick to running your code on SageMaker. If you change those four lines, you can run this code on SageMaker without any other change.
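Concretely, the change boils down to using the `SM_*` environment variables set by the SageMaker TensorFlow container as the defaults for those four arguments; here is a sketch with local fallbacks so the script still runs on a laptop:

```python
import os
import argparse

parser = argparse.ArgumentParser()
# Hyperparameters still arrive as plain command line arguments.
parser.add_argument('--epochs', type=int, default=1)
parser.add_argument('--learning-rate', type=float, default=0.01)
parser.add_argument('--batch-size', type=int, default=128)
# These four now default to the environment variables set by SageMaker.
parser.add_argument('--gpu-count', type=int,
                    default=int(os.environ.get('SM_NUM_GPUS', 0)))
parser.add_argument('--model-dir', type=str,
                    default=os.environ.get('SM_MODEL_DIR', './model'))
parser.add_argument('--training', type=str,
                    default=os.environ.get('SM_CHANNEL_TRAINING', './data'))
parser.add_argument('--validation', type=str,
                    default=os.environ.get('SM_CHANNEL_VALIDATION', './data'))
args, _ = parser.parse_known_args()
```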
Now we're using SageMaker. I'm still on the notebook instance with a Jupyter notebook. The first thing I do is import the SageMaker SDK. If you want to install it on your local machine, you can do something like this. If you have virtual environments or conda environments, you know how to do that. You need Keras, TensorFlow, SageMaker, Pandas, a specific version of requests that SageMaker needs, and maybe a few more things depending on your Python environment. I grab the Fashion MNIST dataset using a simple API in Keras and create a data directory on my notebook instance for the training and validation sets. I need to upload it to S3 because SageMaker needs your data to be in S3. I have a training channel and a validation channel.
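In a notebook, that setup looks roughly like this; the S3 prefix and the npz key names are illustrative choices:

```python
import os
import numpy as np
import sagemaker
from tensorflow.keras.datasets import fashion_mnist

session = sagemaker.Session()

# Download Fashion MNIST and save it locally as compressed NumPy archives.
(x_train, y_train), (x_val, y_val) = fashion_mnist.load_data()
os.makedirs('./data', exist_ok=True)
np.savez('./data/training', image=x_train, label=y_train)
np.savez('./data/validation', image=x_val, label=y_val)

# Upload a training channel and a validation channel to S3.
prefix = 'keras-fashion-mnist'
training_input_path = session.upload_data('data/training.npz',
                                          key_prefix=prefix + '/training')
validation_input_path = session.upload_data('data/validation.npz',
                                            key_prefix=prefix + '/validation')
print(training_input_path, validation_input_path)
```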
Now, I want to see if this code runs inside the container. SageMaker has a cool feature called local mode where, instead of creating managed infrastructure, we can train on the local machine. Here, it means training on the notebook instance. If you're running this on your laptop, it means pulling the TensorFlow container from SageMaker and training locally on your machine with that container. This is the first step in validating that your code works fine inside SageMaker without spinning up managed infrastructure, which is nice for experimentation. Let's run it. I configure this and just run it. I'm not creating managed infrastructure; I'm pulling the TensorFlow container to the local machine and injecting our script. I can use local mode by setting the instance type to either local (CPU) or local_gpu (GPU). I need to make sure I use script mode and can pass some hyperparameters if I want to.
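The local mode configuration looks roughly like this, continuing in the same notebook and using the version 1.x SageMaker Python SDK parameter names (the script file name and framework version are assumptions; SDK v2 renames `train_instance_type` to `instance_type`):

```python
from sagemaker.tensorflow import TensorFlow

# Local mode: pull the SageMaker TensorFlow container and train on this machine.
# Use 'local' on a CPU-only machine, 'local_gpu' when a local GPU is available.
tf_estimator = TensorFlow(entry_point='mnist_keras_tf.py',   # our training script (name assumed)
                          role=sagemaker.get_execution_role(),
                          train_instance_count=1,
                          train_instance_type='local_gpu',
                          framework_version='1.12',
                          py_version='py3',
                          script_mode=True,
                          hyperparameters={'epochs': 1})

tf_estimator.fit({'training': training_input_path,
                  'validation': validation_input_path})
```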
It started immediately. Let's look at how SageMaker invokes the code inside the container. It passes the hyperparameters, model directory, and other variables. The number of GPUs is one, and the training and validation sets are at local paths inside the container. SageMaker copies your training and validation data from S3 to these local locations in the container. Once the model is saved, it's saved locally inside the container, and then SageMaker copies it to S3. The rest is pretty much identical. We train for one epoch, and it works fine. Now we know this code trains fine on SageMaker.
Next, we want to train it on a real instance. Initially, you might work with a fraction of your dataset for experimentation, but then you want to train at scale on managed infrastructure. The only thing you need to change is the instance type. Here, we said train on the local GPU, but we change that line to create an ml.p3.2xlarge instance, a fully managed instance with a GPU. We could have several instances for distributed training, but this is a tiny dataset, so there's no need for that. We can pass other parameters and configure the estimator to train. It takes a few minutes, so I'll speed up the process and be back when the training is complete.
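With the same assumptions as the local mode sketch above, the only line that changes is the instance type:

```python
# Same estimator, now on a fully managed GPU instance.
tf_estimator = TensorFlow(entry_point='mnist_keras_tf.py',
                          role=sagemaker.get_execution_role(),
                          train_instance_count=1,
                          train_instance_type='ml.p3.2xlarge',   # the only change
                          framework_version='1.12',
                          py_version='py3',
                          script_mode=True,
                          hyperparameters={'epochs': 10})        # epoch count is illustrative

tf_estimator.fit({'training': training_input_path,
                  'validation': validation_input_path})
```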
Training is complete. It took four or five minutes. This time, we created a managed instance, pulled the TensorFlow container, loaded our script, injected parameters, etc. This takes a few minutes, but you can do training at scale and distributed training. Local mode is nice for experimentation, but for production or large-scale training, you need managed infrastructure. Let's look at how the code was invoked. It's called the same way, just with the instance type changed. We paid for just under two minutes of the ml.p3.2xlarge instance and got an accuracy of 92.5%. We could tweak that a little more.
Now, we'd like to see this model in action, so I'm going to deploy it. I could deploy it on a GPU instance, but I'm going to use Elastic Inference, which lets you attach fractional GPU acceleration to any EC2 instance. I'm deploying to an HTTPS endpoint hosted by a c5.large instance, accelerated with a medium-size accelerator. This combination gives us almost the same performance as a p2.xlarge instance but at an 80% discount. If you're deploying to GPU instances today, please take a look at Elastic Inference, run some benchmarks, and you might realize significant cost savings.
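As a sketch, the deployment call looks like this; the exact accelerator type name is an assumption, so check the Elastic Inference documentation for the available sizes:

```python
# Deploy to an HTTPS endpoint on a CPU instance with an Elastic Inference
# accelerator attached.
predictor = tf_estimator.deploy(initial_instance_count=1,
                                instance_type='ml.c5.large',
                                accelerator_type='ml.eia1.medium')
```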
Let's run this. It will take a few minutes, so I'll speed it up. Once the endpoint is live, we can try predicting with the model. After a few minutes, our model is deployed, and we have a SageMaker endpoint with an HTTPS URL to post data to for predictions. We can do this in any language or using tools like curl or Postman, but the SageMaker SDK has a nice predict function. I'm grabbing five random samples from the validation dataset, normalizing the pixel values, displaying the pictures, and calling the predict API to send those five images to my endpoint. I'll display the predicted labels and compare them to the real labels.
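A minimal prediction sketch, assuming the validation images are still in memory from the data preparation step and that the endpoint returns the usual TensorFlow Serving response format with a 'predictions' key:

```python
import numpy as np

# Pick five random validation images, reshape and normalize them like the training data.
indices = np.random.choice(x_val.shape[0], 5, replace=False)
images = x_val[indices].reshape(5, 28, 28, 1) / 255.0

# Send them to the endpoint and compare predicted labels with the real ones.
response = predictor.predict(images)
predicted = np.argmax(np.array(response['predictions']), axis=1)
print('Predicted labels:', predicted)
print('Real labels     :', y_val[indices])
```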
The first invocation is a little slower because it's loading the model, but it should be faster the next time. Predicted labels: 0, 1, 9, 7, 3. These are the real labels: 0, 1, 9, 7, 3. We did a good job. Let's try one more: 8, 2, 7, 4, 7. Yes, we have a mistake now. These two dresses apparently belong to different classes, which illustrates what I told you before: Fashion MNIST is more challenging to train on than MNIST, so the models have to work a little harder. We're posting those images to the endpoint URL and receiving predictions back. When we're done, we can clean up and take the endpoint down.
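Cleaning up is a one-liner, assuming the `predictor` object from the deployment step:

```python
# Delete the endpoint so we stop paying for it.
predictor.delete_endpoint()
```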
Quick recap: We took a vanilla Keras training script, moved to command line arguments, saved the model in TensorFlow Serving format, integrated with SageMaker by reading values from environment variables, and made sure we saved the model in the right place. We trained in local mode for experimentation and early tweaking, then trained on a GPU instance by changing the instance type. We deployed the model and predicted with it. If you're curious about more advanced topics like automatic model tuning, yes, this could also work. I'm writing a blog post on this, so stay tuned. If you're curious about script mode, this is the URL to go to. We'll put all the URLs and the link to my code in the notes for the session. This is what you want to look for: sagemaker.readthedocs.io, which is the documentation for the SageMaker SDK, and there is a specific section on script mode and the same for MXNet and other libraries. This is what to look for if you want to get started. I hope you learned something today. Keep an eye out for those blog posts. You can follow me on Twitter @JulienSimon and on Medium. If you have questions, ping me on Twitter or LinkedIn. Happy to help you out. Now it's your turn, so go build some cool stuff and share it with me. I'm happy to share it with the community. Have a great day. Bye-bye.