Hi. In this video, I'm going to show you how to use AWS Step Functions to automate your SageMaker workflows. Step Functions is an AWS service that lets you define state machines with steps and transitions, including sequential, parallel, and conditional execution. Step Functions is integrated with a wide range of AWS services, and SageMaker is one of them. In this notebook, I'm going to use the AWS Step Functions Data Science SDK, which makes it really easy to combine Step Functions and SageMaker. So let's get to work.
The first step is to make sure you have the latest SDK. Then we need an IAM role so that Step Functions is allowed to invoke the SageMaker APIs. There's an example policy here: you can copy and paste it, create a service role for Step Functions, and use that. For testing it's more than enough, but it's quite permissive, so for production please restrict it as much as possible. Once we have the role (I've already created it), we can start importing the libraries we need, such as Boto3 and the Step Functions SDK. We also need an S3 bucket to store the data for SageMaker, and we're using the default bucket here.
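Here's a rough sketch of what that setup looks like in code; the workflow role ARN is just a placeholder for the role you created from the example policy.

```python
# Upgrade the Step Functions Data Science SDK, then restart the kernel before importing.
# !pip install --upgrade stepfunctions

import boto3
import sagemaker
import stepfunctions

# SageMaker execution role (used by training jobs) and the Step Functions workflow role.
# The workflow role ARN below is a placeholder; use the role you created.
sagemaker_session = sagemaker.Session()
sagemaker_role = sagemaker.get_execution_role()
workflow_role = "arn:aws:iam::123456789012:role/StepFunctionsWorkflowExecutionRole"

# Default S3 bucket where we'll store the dataset and model artifacts
bucket = sagemaker_session.default_bucket()
print(bucket)
```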
The next step is to grab a dataset, and here we're going to use a toy dataset called the Abalone dataset. It describes a regression problem where we're trying to predict the age of abalone shellfish from eight features representing their dimensions and weights. The dataset is in libSVM format, with feature indices and feature values. Let's grab it and load it with a scikit-learn API. Then I split it into three parts, training, validation, and test, using a NumPy API, and save the three files: 70% for training, 15% for validation, and 15% for test. A simple way to do it.
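A minimal sketch of the loading and splitting, assuming the dataset has already been downloaded locally as abalone.libsvm (the file name is illustrative):

```python
import numpy as np
from sklearn.datasets import load_svmlight_file, dump_svmlight_file

# Load the Abalone dataset in libSVM format (local file name is illustrative)
X, y = load_svmlight_file("abalone.libsvm")

# Shuffle, then split 70% / 15% / 15% into train, validation, and test indices
indices = np.random.permutation(X.shape[0])
train_end = int(0.7 * len(indices))
val_end = int(0.85 * len(indices))
train_idx, val_idx, test_idx = np.split(indices, [train_end, val_end])

# Save each split back to libSVM format for the XGBoost built-in algorithm
dump_svmlight_file(X[train_idx], y[train_idx], "abalone.train")
dump_svmlight_file(X[val_idx], y[val_idx], "abalone.validation")
dump_svmlight_file(X[test_idx], y[test_idx], "abalone.test")
```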
Next, I need to upload those three files to S3. I'm defining some file names and uploading them to my bucket. Then I'm redefining those names as S3 URIs, which I will pass to the different steps in the workflow.
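Something along these lines, with an illustrative prefix name; the returned values are the S3 URIs we'll pass to the workflow steps:

```python
# Upload the three files to the default bucket under a common prefix (prefix name is illustrative)
prefix = "sagemaker/abalone-stepfunctions"

train_s3 = sagemaker_session.upload_data("abalone.train", bucket=bucket, key_prefix=prefix + "/train")
validation_s3 = sagemaker_session.upload_data("abalone.validation", bucket=bucket, key_prefix=prefix + "/validation")
test_s3 = sagemaker_session.upload_data("abalone.test", bucket=bucket, key_prefix=prefix + "/test")

print(train_s3, validation_s3, test_s3)
```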
Next, we configure the SageMaker estimator. We're going to use XGBoost here, and I'm using the latest version. XGBoost is available as a built-in algorithm or as a built-in framework where you can pass your own script; I'm using the algorithm here. We're going to train on an M4 instance. Then I set the hyperparameters, and the most important one is the objective: we want to train a regression model. We're passing a few other hyperparameters; if you're curious about those, you can read about them in the XGBoost documentation.
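Here's roughly what the estimator configuration looks like; the XGBoost version and the hyperparameter values are just examples, not the exact ones from the notebook:

```python
from sagemaker.estimator import Estimator
from sagemaker.inputs import TrainingInput
from sagemaker import image_uris

# Built-in XGBoost algorithm container for the current region (version is an example)
region = boto3.Session().region_name
container = image_uris.retrieve("xgboost", region, version="1.2-1")

xgb = Estimator(
    image_uri=container,
    role=sagemaker_role,
    instance_count=1,
    instance_type="ml.m4.xlarge",
    output_path="s3://{}/{}/output".format(bucket, prefix),
    sagemaker_session=sagemaker_session,
)

# Regression objective plus a few typical hyperparameters (values are illustrative)
xgb.set_hyperparameters(
    objective="reg:squarederror",
    num_round=100,
    max_depth=5,
    eta=0.2,
    subsample=0.8,
)

# Training and validation channels, pointing at the libSVM files in S3
train_input = TrainingInput(train_s3, content_type="text/libsvm")
validation_input = TrainingInput(validation_s3, content_type="text/libsvm")
```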
Now we can start building the workflow. It will look like this: a training step; a model step, where we register the model we trained with SageMaker; a batch transform step, where we predict the test dataset; a step creating an endpoint configuration; and a step creating the endpoint, to deploy the model for real-time prediction. These steps cover the basic SageMaker workflow. As we'll see, this Step Functions workflow can be edited, and we could add extra steps, such as invoking a Lambda function before deploying the endpoint to run extra checks on the model. Here, we're keeping it simple.
We need to define the execution input, which is the set of parameters passed to the workflow at execution time. I'm passing three strings: the job name, the model name, and the endpoint name, so I get unique names for those three things every time I run the workflow. It's good practice to make these names unique to avoid name clashes.
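In the Data Science SDK that looks something like this; the schema key names are the ones I'll reuse below, but they're just labels:

```python
from stepfunctions.inputs import ExecutionInput

# Placeholders for the values we'll pass at execution time, so each run gets unique names
execution_input = ExecutionInput(
    schema={
        "JobName": str,
        "ModelName": str,
        "EndpointName": str,
    }
)
```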
The first step is the training step, equivalent to calling the fit API with SageMaker. I'm passing the estimator we defined above and the two channels for training and validation. Next, we create the model using the create model API in the SageMaker SDK. It's more accurately a register model API, as it registers the trained model as a SageMaker model using the S3 artifact.
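A sketch of those two steps; the step titles are just labels:

```python
from stepfunctions import steps

# Training step: equivalent to calling estimator.fit(), with the job name taken from the execution input
training_step = steps.TrainingStep(
    "Train XGBoost model",
    estimator=xgb,
    data={"train": train_input, "validation": validation_input},
    job_name=execution_input["JobName"],
)

# Model step: registers the trained artifact as a SageMaker model
model_step = steps.ModelStep(
    "Save model",
    model=training_step.get_expected_model(),
    model_name=execution_input["ModelName"],
)
```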
Next, we transform the test set using the transformer object on an M5 large instance, passing the location of the test set in S3. Then we create the endpoint configuration, specifying the model and instance type. Finally, we create the endpoint itself, using the configuration above.
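And the remaining three steps, along the same lines; instance types and step titles are illustrative:

```python
# Batch transform step: run the trained model on the test set on an m5.large instance
transform_step = steps.TransformStep(
    "Transform test data",
    transformer=xgb.transformer(instance_count=1, instance_type="ml.m5.large"),
    job_name=execution_input["JobName"],
    model_name=execution_input["ModelName"],
    data=test_s3,
    content_type="text/libsvm",
)

# Endpoint configuration step: which model to deploy, and on what instance type
endpoint_config_step = steps.EndpointConfigStep(
    "Create endpoint configuration",
    endpoint_config_name=execution_input["ModelName"],
    model_name=execution_input["ModelName"],
    initial_instance_count=1,
    instance_type="ml.m5.large",
)

# Endpoint step: deploy the model behind a real-time endpoint
endpoint_step = steps.EndpointStep(
    "Create endpoint",
    endpoint_name=execution_input["EndpointName"],
    endpoint_config_name=execution_input["ModelName"],
)
```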
Now we've defined each individual step and need to bring them together and chain them. The order matters: training, model creation, batch transform, endpoint configuration, and endpoint. Using this definition, we can create the workflow itself. We give it a unique name with a timestamp, pass the definition of the steps, and the execution input with the strings for job name, model name, and endpoint name. We can visualize it to ensure everything is in the right order.
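Chaining and workflow creation look something like this; the workflow name is illustrative:

```python
import time
from stepfunctions.workflow import Workflow

# Chain the steps in order and build the workflow definition
workflow_definition = steps.Chain(
    [training_step, model_step, transform_step, endpoint_config_step, endpoint_step]
)

workflow = Workflow(
    name="MyTrainTransformDeploy-{}".format(int(time.time())),  # unique name with a timestamp
    definition=workflow_definition,
    role=workflow_role,
    execution_input=execution_input,
)

# Visualize the state machine to check the ordering
workflow.render_graph()
```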
We can create the workflow, which registers it to Step Functions, and execute it. The execute API gets everything going, and we pass the inputs with the three unique names for job, model, and endpoint. If we go to the Step Functions console, we can see the workflow running, with information about each step. We can also edit it if we want, using the Step Functions language, which shows JSON for each step and its parameters. The data science SDK makes it easy to write this without writing JSON code, but you can still do that if you prefer.
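A sketch of the create-and-execute part, with made-up unique names for the job, model, and endpoint:

```python
import uuid

# Register the state machine with Step Functions, then start an execution
workflow.create()

execution = workflow.execute(
    inputs={
        "JobName": "xgboost-abalone-{}".format(uuid.uuid4().hex[:8]),
        "ModelName": "xgboost-abalone-{}".format(uuid.uuid4().hex[:8]),
        "EndpointName": "xgboost-abalone-{}".format(uuid.uuid4().hex[:8]),
    }
)

# Render the live progress of this execution inside the notebook
execution.render_progress()
```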
If we wanted to insert a call to a Lambda function, we could do this easily by selecting a function from our Lambdas and configuring the payload. We can see the equivalent JSON code. This shows that the high-level workflow we build with the data science SDK is a Step Functions workflow, and we can add extra states if needed.
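If you'd rather do it in code than in the console, the SDK also has a LambdaStep. Here's a purely hypothetical sketch, with a made-up function name and payload, that you could chain in before the endpoint configuration step:

```python
# Hypothetical example: a model-validation Lambda inserted before deployment.
# The function name and payload are placeholders, not from the original notebook.
lambda_step = steps.LambdaStep(
    "Check model metrics",
    parameters={
        "FunctionName": "my-model-validation-function",
        "Payload": {"ModelName": execution_input["ModelName"]},
    },
)

# It could then be chained between the transform and endpoint-configuration steps, e.g.:
# steps.Chain([training_step, model_step, transform_step, lambda_step,
#              endpoint_config_step, endpoint_step])
```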
This workflow will run for a few minutes. I have the exact same one that I ran just a few minutes ago, and we can see it has gone through all the steps successfully. We can zoom into every transition and look at the steps and their parameters. Once the workflow is debugged, we can run it repeatedly: just start a new execution of the state machine, and it will run for as long as needed and complete.
We don't have to use the console; we can use APIs for all of this. We can see the progress, list events, and list executions. We can also list all workflows and generate a CloudFormation template. I'm writing this to a file, and voilà. If you want to use CloudFormation to replicate this, you can automatically do this with the template. For different regions, make sure to adapt the template, such as using the correct region-based XGBoost container. This will replicate the exact same workflow, and you get all the CloudFormation benefits, like defining change sets.
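These calls, continuing from the workflow and execution objects above, cover what I just described; the output file name is arbitrary:

```python
# Inspect progress and history programmatically
execution.list_events()      # events for this execution
workflow.list_executions()   # all executions of this state machine
Workflow.list_workflows()    # all workflows in the account and region

# Generate a CloudFormation template for the workflow and write it to a file
template = workflow.get_cloudformation_template()
with open("workflow.yaml", "w") as f:
    f.write(template)
```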
That's pretty much what I wanted to show you. I will put a link to this notebook, which is a modified version of one of the SageMaker examples. I will also put a link to a re:Invent session with one of our customers who built a fancy automation workflow using Step Functions and SageMaker; that should inspire you. Well, that's it for today. Hope you liked it. See you soon. Bye-bye.