Amazon SageMaker Studio AutoML with Amazon SageMaker Autopilot, part 1

December 06, 2019
In this video, I show how to fire up an AutoML job with Amazon SageMaker Autopilot. ⭐️⭐️⭐️ Don't forget to subscribe and to enable notifications ⭐️⭐️⭐️

Notebook: https://gitlab.com/juliensimon/amazon-studio-demos/blob/master/dataset.ipynb

Blog posts:
* https://aws.amazon.com/blogs/aws/amazon-sagemaker-studio-the-first-fully-integrated-development-environment-for-machine-learning/
* https://aws.amazon.com/blogs/aws/amazon-sagemaker-autopilot-fully-managed-automatic-machine-learning/

Follow me on:
* Medium: https://medium.com/@julsimon
* Twitter: https://twitter.com/julsimon

Transcript

Hi everybody, this is Julien. I'm still in Las Vegas for AWS re:Invent. In this video, I would like to give you a quick tour of SageMaker Studio and show you some of the capabilities we introduced: in particular, how to automatically create models with Amazon SageMaker Autopilot, how to deploy those models, and how to set up monitoring with Amazon SageMaker Model Monitor. I'll try to show you a few things along the way.

Amazon SageMaker Studio is currently available in the us-east-2 (Ohio) region only, so that's where you'll find it. Just go to the SageMaker console, and there's a getting-started link showing you how to create a user for SageMaker Studio. You can create IAM users or SSO users. These are simple steps: just follow the instructions, and you'll be set up in no time.

I've already opened Studio for the sake of time. We find ourselves directly in the IDE, which is based on JupyterLab, so this should be familiar. We have our file view, running terminals, running kernels, a Git client, settings, experiments, endpoints (none for now), and open tabs. Some parts of Studio are in preview, as mentioned in the corner, so over time some features might change, be removed, or new ones added. Hopefully, this will give you a good introduction. You can launch all kinds of things, including different containers: there's a base Python container, MXNet, TensorFlow, and more to come. You can create notebooks, which are familiar if you've used JupyterLab before.

Let's use a notebook to get started. First, we want to download the dataset and extract it. Then we can use Pandas to visualize it. If you've worked with SageMaker, you've likely seen this dataset before; it's used in many notebooks. It's a simple dataset with about 42,000 lines and various customer features. The last column, called 'y', indicates whether a customer has accepted a marketing offer. It's a binary classification problem, and the dataset is quite unbalanced, with many more 'no's than 'yes's.
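The inspection step above boils down to loading the CSV with Pandas and counting the target classes. Here is a minimal sketch using a tiny synthetic stand-in for the marketing dataset (the column names and values are assumptions for illustration; the real file has about 42,000 rows):

```python
import pandas as pd

# Tiny synthetic stand-in for the direct marketing dataset.
# In the video, the real CSV is downloaded, extracted, and loaded
# with pd.read_csv() instead.
df = pd.DataFrame({
    "age": [34, 45, 29, 52, 41, 38],
    "job": ["admin.", "technician", "services", "admin.", "management", "services"],
    "y":   ["no", "no", "no", "yes", "no", "no"],
})

# The target column 'y' is heavily imbalanced: far more 'no' than 'yes'.
counts = df["y"].value_counts()
print(counts)
```

On the real dataset, the same `value_counts()` call is what reveals the class imbalance mentioned in the video.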
We see numerical and categorical features, which is typical. Instead of training models manually with classification algorithms like XGBoost, let me show you how to use SageMaker Autopilot to automate the process. The only thing I'll do is split the dataset, keeping 5% as a test set for scoring the model later. The other 95% goes to SageMaker Autopilot, which will internally split it into training and validation sets. I see the two files, and I'll upload them to S3. We print the S3 location of the training set we'll pass to Autopilot.

Next, I'll go to the experiments tab and click on "Create Experiment." Let's start with a name: "Marketing AutoML Demo." The input data location is the path we just printed in the notebook. The target attribute name is the column we want to predict, which is 'y': whether a customer will accept an offer. The output data location is where the model and additional training artifacts will be stored. We can select the type of problem to solve (classification or regression) or let Autopilot figure it out; let's do the latter. Finally, we decide whether to run a full experiment, which trains and optimizes models, or just suggest candidates. For now, let's run a full experiment and click "Create Experiment."

The AutoML job will go through four stages: analyzing data, preprocessing, feature engineering, and tuning. Analyzing data runs statistics and automatically figures out the problem type, which here is binary classification. Preprocessing generates candidate scripts, feature engineering applies transformations and selects algorithms, and tuning uses hyperparameter optimization to maximize model accuracy. Some steps take time, so I'll edit out the time spent running them and see you in a few minutes to discuss the rest of the process.
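The split-and-upload steps (and the experiment creation done in the Studio UI) can also be sketched in code. This is a minimal sketch, not taken from the video's notebook: the synthetic DataFrame, file names, bucket, and role ARN are all placeholders, and the upload and job-creation calls are shown commented out since they need real AWS credentials.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the marketing dataset (the real one has
# ~42,000 rows and a binary target column 'y').
df = pd.DataFrame({
    "age": list(range(100)),
    "y": ["yes" if i % 10 == 0 else "no" for i in range(100)],
})

# Keep 5% aside as a held-out test set for scoring later; the other
# 95% goes to Autopilot, which splits it into training and validation.
train_df, test_df = train_test_split(
    df, test_size=0.05, random_state=42, stratify=df["y"]
)
train_df.to_csv("automl-train.csv", index=False)
test_df.to_csv("automl-test.csv", index=False)

# Upload the training file to S3 and launch the job through the API
# instead of the Studio UI (bucket, role, and names are placeholders):
#
# import boto3, sagemaker
# sess = sagemaker.Session()
# train_uri = sess.upload_data("automl-train.csv", key_prefix="automl-demo/input")
# boto3.client("sagemaker").create_auto_ml_job(
#     AutoMLJobName="marketing-automl-demo",
#     InputDataConfig=[{
#         "DataSource": {"S3DataSource": {"S3DataType": "S3Prefix",
#                                         "S3Uri": train_uri}},
#         "TargetAttributeName": "y",
#     }],
#     OutputDataConfig={"S3OutputPath": "s3://my-bucket/automl-demo/output"},
#     RoleArn="arn:aws:iam::123456789012:role/MySageMakerRole",
# )
```

The `create_auto_ml_job` call mirrors the Studio form: job name, input location, target attribute, and output location. Leaving out `ProblemType` lets Autopilot infer it, just as choosing "let Autopilot figure it out" does in the UI.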

Tags

AWS re:Invent, Amazon SageMaker Studio, SageMaker Autopilot, Machine Learning, Model Monitoring