In this video, I show you how to train on your local machine using SageMaker APIs. I use Jupyter, and this would also work with your preferred IDE (PyCharm, etc.). This is a friendly and fast way to write and debug your code before running it at scale on managed instances. This technique also saves you money, as you’re not using notebook instances or SageMaker Studio.
⭐️⭐️⭐️ Don't forget to subscribe to be notified of future videos ⭐️⭐️⭐️
Code: https://github.com/juliensimon/reinvent-workshops/tree/main/aim410/blob/main/local_training.ipynb
For more content, follow me on :
* Medium: https://medium.com/@julsimon
* Twitter: https://twitter.com/juliensimon
Transcript
Hi, this is Julien from Arcee. In this video, I would like to show you how to call SageMaker APIs from a Jupyter notebook running on your local machine. This means you can work strictly on your local machine without using SageMaker Studio or notebook instances. That's a convenient way to use your local environment, whether it's Jupyter, your local IDE, or whatever you prefer. Let's get started.
The first step is to ensure you have a local environment with all the dependencies installed. The best way to do this is to use Conda. I've tried using various setups on my Mac, including virtual environments, and I've always encountered dependency issues, which waste a lot of time. Conda is really the way to go. Here's how you set it up:
First, install the Anaconda distribution. You can quickly check that it's set up correctly and that you're running Anaconda's Jupyter; if you aren't, check your PATH environment variable. Then, create a local Conda environment. Here, I'm creating an environment called `local-SM` with Python 3.7. I activate it and install the necessary packages: `pip`, `pandas`, `tensorflow`, and `keras`, which I'll use in my notebook. You need to install the SageMaker SDK with `pip` because it's not part of Conda. Then, create your own Jupyter kernel: install `ipykernel` and register your Conda environment as a Jupyter kernel. Here, I'm registering it as a `local-SM` kernel, but you can use another name if you prefer.
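The setup steps above can be sketched as the following shell commands (the environment name `local-SM` and Python 3.7 follow the video; adjust package versions as needed for your machine):

```shell
# Create and activate a dedicated Conda environment (after installing Anaconda)
conda create -y -n local-SM python=3.7
conda activate local-SM

# Install the notebook's dependencies; the SageMaker SDK comes from pip
conda install -y pip pandas tensorflow keras
pip install sagemaker

# Register the environment as a Jupyter kernel named local-SM
conda install -y ipykernel
python -m ipykernel install --user --name local-SM --display-name "local-SM"
```

After restarting Jupyter, the `local-SM` kernel should appear in the kernel list.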
Of course, start Jupyter and ensure Docker is running on your local machine because we will pull Docker containers from AWS. Let's open Jupyter now. I'll add a link to this notebook in the video description.
This is a simple Keras notebook. I want to focus on the few things you need to update in your notebook to run it locally. The first thing is to set the region. When you run on a notebook instance or SageMaker Studio, you have a default region. Here, you have two options: use the usual SageMaker session, which will use the AWS region configured with the AWS CLI, or pass a region name to the session if you want to use a different region. This saves you from changing your CLI configuration.
The second thing is to handle the role. When working in Studio or on a notebook instance, you call the `get_execution_role()` API to get the IAM role associated with the notebook instance or Studio. However, your local machine doesn't have an IAM role attached, so calling this API will fail. The solution is to pass the ARN of the IAM role you use with SageMaker. You can find it by running `aws iam list-roles` and looking for a SageMaker execution role; if you used the SageMaker wizards, it's likely called `AmazonSageMakerExecutionRole`. Pass the full ARN.
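Here is a minimal sketch of that change; the account ID and role name below are placeholders, not real values:

```python
# On a notebook instance or in Studio you would call:
#   from sagemaker import get_execution_role
#   role = get_execution_role()
# On a local machine that call fails, so pass the role ARN directly.
# The account ID and role name are placeholders -- find yours with:
#   aws iam list-roles
role = "arn:aws:iam::123456789012:role/AmazonSageMakerExecutionRole"
print(role)
```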
Let's run this cell; it should be okay. The next cell is about fetching the dataset and saving the training and validation sets locally. I'm using the Fashion MNIST dataset. If you want to work locally, you need to save your data on the local machine. Define the input path for both the training and validation sets. If your data is in S3, you can use S3 URIs, but if you want to work strictly locally, use a file path.
Next, define the location where you'll save the trained model. You can pass an S3 URI, but here I'm using a local path. So the third change is pointing to a local dataset and a local location to save the model.
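A sketch of this step is below. To keep it self-contained, small stand-in arrays replace the real Fashion MNIST download (the notebook loads the actual data with `tf.keras.datasets.fashion_mnist.load_data()`); the directory names and `file://` paths are assumptions:

```python
import os
import numpy as np

# Stand-in arrays with Fashion MNIST shapes (28x28 grayscale images);
# 100 samples keep this sketch small.
x_train = np.zeros((100, 28, 28), dtype=np.uint8)
y_train = np.zeros((100,), dtype=np.uint8)
x_val = np.zeros((20, 28, 28), dtype=np.uint8)
y_val = np.zeros((20,), dtype=np.uint8)

# Save the training and validation sets on the local machine
os.makedirs("data/training", exist_ok=True)
os.makedirs("data/validation", exist_ok=True)
np.savez("data/training/training.npz", image=x_train, label=y_train)
np.savez("data/validation/validation.npz", image=x_val, label=y_val)

# file:// URIs point the estimator at local data; S3 URIs would also work
training_input_path = "file://data/training"
validation_input_path = "file://data/validation"
output_path = "file:///tmp/model"  # local location for the trained model
```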
Now we're ready to train. We'll use the SageMaker estimator for TensorFlow, passing the script, etc. The only thing to take care of is the instance type. Since we want to train locally, not on a fully managed instance, you can pass the instance type `local` for CPU training or `local_gpu` if you have an NVIDIA GPU. This is all we need to do. Call `fit` to start training, passing the location of the training and validation sets. Remember, these are local, so ensure Docker is running. Under the hood, SageMaker will pull the TensorFlow container to your local machine. Training will start immediately.
Training starts, and this is business as usual. We're using script mode, so the script is invoked, passing the epochs hyperparameter and the local location to save the model. Training takes a minute or two, so I'll pause the video and wait for it to complete.
Training is now complete, and I can find my model in `/tmp/model`. I have the training artifact, and if I extract it, it's saved in TensorFlow Serving format, the standard format for TensorFlow models.
To summarize, you need Conda, an explicit region, an IAM role ARN, local data, a local instance type, and Docker. As you can see, it's very easy to train strictly on your local machine. You save money by not using notebook instances or Studio, and you can use your favorite tools while still calling AWS APIs. This works with all open-source containers, including TensorFlow, MXNet, PyTorch, scikit-learn, XGBoost, and your own custom containers. The only case where it won't work is built-in SageMaker algorithms, as those containers are not open source.
That's it. I hope this was useful. Feel free to ask questions, and I'll see you soon with another video. Bye-bye.