Amazon SageMaker Studio AutoML with Amazon SageMaker AutoPilot part 3

December 06, 2019
In this video, the feature engineering step is complete, and we see the hyperparameter optimization step kicking in!

⭐️⭐️⭐️ Don't forget to subscribe and to enable notifications ⭐️⭐️⭐️

Blog posts:
* https://aws.amazon.com/blogs/aws/amazon-sagemaker-studio-the-first-fully-integrated-development-environment-for-machine-learning/
* https://aws.amazon.com/blogs/aws/amazon-sagemaker-autopilot-fully-managed-automatic-machine-learning/

Follow me on:
- Medium: https://medium.com/@julsimon
- Twitter: https://twitter.com/julsimon

Transcript

Okay, so feature engineering is complete, meaning we transformed the original dataset according to the candidates seen in the generated notebook. Candidates are combinations of a preprocessing script and an algorithm; here, we mostly have XGBoost and Linear Learner. Now, based on the 10 candidates and the 10 transformed datasets, SageMaker Autopilot is firing up a whole bunch of hyperparameter optimization jobs, as described in the generated notebook. It's going to try to find the optimal hyperparameter combinations. This is multi-algorithm HPO, so it's optimizing for both XGBoost and Linear Learner. As we saw in the notebook, it's going to run about 250 jobs, parallelizing them so that they run efficiently. These jobs are pretty short; most take about a minute, but it's still going to take a while.

If a 250-job count feels a little scary, don't worry; it's something you can set. Here, we just use the default setting, but you could use a lower number, although I'm not sure I would recommend it. We're running 250 jobs to tune 10 models, so in a way, it's only 25 attempts per model. You don't want to go too low, because that decreases the number of opportunities for model tuning.

We see the jobs here. If we look at this experiment view, we can also see the individual jobs. The cool thing about this is that it refreshes in real time, so you can see what's going on: the completed jobs and the running jobs. Of course, we could deploy models, but it's a little early for that; we want to wait for the models to be completed. We'll explore those different jobs later on. There's quite a bit going on. Here's one; I don't know if it's complete or not, but you can zoom in on those tuning jobs and see the actual steps that happened, including metrics. Full visibility into what's going on. It keeps going, so let's watch this. It's actually fun to watch all those jobs running, especially since these are pretty short jobs due to the small dataset.
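If you launch Autopilot through the API rather than Studio, the 250-job default mentioned above corresponds to the `MaxCandidates` completion criterion of `create_auto_ml_job`. A minimal sketch of building such a request; the job name, S3 paths, role ARN, and target column below are placeholders, and the actual boto3 call is shown only as a comment:

```python
def build_automl_request(job_name, input_s3, output_s3, role_arn,
                         target="y", max_candidates=250):
    """Build a create_auto_ml_job request dict. MaxCandidates caps how
    many candidates Autopilot trains and tunes (the default is 250)."""
    return {
        "AutoMLJobName": job_name,
        "InputDataConfig": [{
            "DataSource": {"S3DataSource": {
                "S3DataType": "S3Prefix",
                "S3Uri": input_s3,
            }},
            "TargetAttributeName": target,  # column to predict
        }],
        "OutputDataConfig": {"S3OutputPath": output_s3},
        "AutoMLJobConfig": {
            "CompletionCriteria": {"MaxCandidates": max_candidates},
        },
        "RoleArn": role_arn,
    }

# With real AWS credentials (placeholders throughout):
#   import boto3
#   request = build_automl_request(
#       "my-automl-job",
#       "s3://my-bucket/input/",
#       "s3://my-bucket/output/",
#       "arn:aws:iam::123456789012:role/MySageMakerRole")
#   boto3.client("sagemaker").create_auto_ml_job(**request)
```

Lowering `max_candidates` shortens the run, but as noted above, it also gives the tuner fewer attempts per model.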
The cool thing here, if you've experimented with hyperparameter tuning on SageMaker before, is that we pay no attention to infrastructure requirements. SageMaker Autopilot takes care of this completely. We don't pick instance types or instance counts; SageMaker Autopilot uses heuristics to size the infrastructure appropriately. Sure, if you go into the notebook, you'll see additional details and can tweak some more, but if you're happy to let Autopilot make the decisions, that's fine. Just fire up those jobs and wait for the best one. It's going to keep running for a bit, because we're running 250 jobs. Let's wait for tuning to be completed, and then we'll see what's what.
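Once tuning is complete, the winning candidate can also be retrieved programmatically from `describe_auto_ml_job`, which returns a `BestCandidate` entry. A hedged sketch: the helper below assumes the standard response shape and is demonstrated against a mocked response (the candidate name and metric value are illustrative, not real results):

```python
def best_candidate_summary(describe_response):
    """Extract the winning candidate's name and final objective metric
    from a describe_auto_ml_job-style response dict."""
    best = describe_response["BestCandidate"]
    metric = best["FinalAutoMLJobObjectiveMetric"]
    return best["CandidateName"], metric["MetricName"], metric["Value"]

# With the real API (job name is a placeholder):
#   import boto3
#   sm = boto3.client("sagemaker")
#   resp = sm.describe_auto_ml_job(AutoMLJobName="my-automl-job")
#   name, metric_name, value = best_candidate_summary(resp)

# Mocked response for illustration:
mock_response = {"BestCandidate": {
    "CandidateName": "tuning-job-1-abc123",
    "FinalAutoMLJobObjectiveMetric": {
        "MetricName": "validation:f1", "Value": 0.92},
}}
print(best_candidate_summary(mock_response))
# ('tuning-job-1-abc123', 'validation:f1', 0.92)
```

You can also page through every trained model with `list_candidates_for_auto_ml_job`, which is handy for comparing the XGBoost and Linear Learner candidates side by side.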

Tags

Feature Engineering, Hyperparameter Optimization, SageMaker Autopilot, XGBoost, Linear Learner