Published: 2019-06-21 | Originally published at AWS Blog
Keras is a popular and well-documented open source library for deep learning, while Amazon SageMaker provides you with easy tools to train and optimize machine learning models. Until now, you had to build a custom container to use both, but Keras is now part of the built-in TensorFlow environments for TensorFlow and Apache MXNet. Not only does this simplify the development process, it also allows you to use standard Amazon SageMaker features like script mode or automatic model tuning .
Keras’s excellent documentation, numerous examples, and active community make it a great choice for beginners and experienced practitioners alike. The library provides a high-level API that makes it easy to build all kind of deep learning architectures, with the option to use different backends for training and prediction: TensorFlow , Apache MXNet , and Theano.
In this post, I show you how to train and deploy Keras 2.x models on Amazon SageMaker, using the built-in TensorFlow environments for TensorFlow and Apache MXNet. In the process, you also learn the following:
This example demonstrates training a simple convolutional neural network on the Fashion MNIST dataset. This dataset replaces the well-known MNIST dataset. It has the same number of classes (10), samples (60,000 for training, 10,000 for validation), and image properties (28×28 pixels, black and white). But it’s also much harder to learn, which makes for a more interesting challenge.
First, set up TensorFlow as your Keras backend (and switch to Apache MXNet later on). For more information, see the mnist_keras_tf_local.py script.
The process is straightforward:
Positioning your image channels can be tricky. Black and white images have a single channel (black), while color images have three channels (red, green, and blue). The library expects data to have a well-defined shape when training a model, describing the batch size, the height and width of images, and the number of channels. TensorFlow specifically requires the input shape formatted as (batch size, width, height, channels ), with channels last. Meanwhile, MXNet expects (batch size, channels, width, height ), with channels first. To avoid training issues created by using the wrong shape, I add a few lines of code to identify the active setting and reshape the dataset to compensate.
Now check that this code works by running it on a local machine, without using Amazon SageMaker.
You must make a few minimal changes, but script mode does most of the work for you. Before invoking your code inside the TensorFlow environment, Amazon SageMaker sets four environment variables
You can use these values in your training code with just a simple modification:
What about hyperparameters? No work needed there. Amazon SageMaker passes them as command line arguments to your code.
For more information, see the updated script, mnist_keras_tf.py .
After deploying your Keras model, you can begin training on Amazon SageMaker. For more information, see the Fashion MNIST-SageMaker.ipynb notebook.
The process is straightforward:
In the training log, you can see how Amazon SageMaker sets the environment variables and how it invokes the script with the three hyper parameters defined in the estimator:
Because you saved your model in TensorFlow Serving format, Amazon SageMaker can deploy it just like any other TensorFlow model by calling the deploy () API on the estimator. Finally, you can grab some random images from the dataset and predict them with the model you just deployed.
Script mode makes it easy to train and deploy existing TensorFlow code on Amazon SageMaker. Just grab those environment variables, add command line arguments for your hyperparameters, save the model in the right place, and voilà!
As mentioned earlier, Keras also supports MXNet as a backend. Many customers find that it trains faster than TensorFlow, so you may want to give it a shot.
Everything discussed above still applies (script mode, etc.). You only make two changes:
For more information, see the mnist_keras_mxnet.py training code for MXNet.
You can find the Amazon SageMaker steps in the notebook. Apache MXNet uses virtually the same process I just reviewed, aside from using the MXNet estimator.
Automatic model tuning is a technique that helps you find the optimal hyperparameters for your training job, that is, the hyperparameters that maximize validation accuracy.
You have access to this feature by default because you’re using the built-in estimators for TensorFlow and MXNet. For the sake of brevity, I only show you how to use it with Keras-TensorFlow, but the process is identical for Keras-MXNet.
First, define the hyperparameters you’d like to tune, and their ranges. How about all of them? Thanks to script mode, your parameters are passed as command line arguments, allowing you to tune anything.
When configuring automatic model tuning, define which metric to optimize on. Amazon SageMaker supports predefined metrics that it can read automatically from the training log for built-in algorithms (XGBoost, etc.) and frameworks (TensorFlow, MXNet, etc.). That’s not the case for Keras. Instead, you must tell Amazon SageMaker how to grab your metric from the log with a simple regular expression:
Then, you define your tuning job, run it, and deploy the best model. No difference here.
Advanced users may insist on using early stopping to avoid overfitting, and they would be right. You can implement this in Keras using a built-in callback ( keras.callbacks.EarlyStopping ). However, this also creates difficulty in automatic model tuning.
You need Amazon SageMaker to grab the metric for the best epoch, not the last epoch. To overcome this, define a custom callback to log the best validation accuracy. Modify the regular expression accordingly so that Amazon SageMaker can find it in the training log.
For more information, see the 02-fashion-mnist notebook.
I covered a lot of ground in this post. You now know how to:
Thank you very much for reading. I hope this was useful. I always appreciate comments and feedback, either here or more directly on Twitter .
Julien is the Artificial Intelligence & Machine Learning Evangelist for EMEA . He focuses on helping developers and enterprises bring their ideas to life. In his spare time, he reads the works of JRR Tolkien again and again.