Hi everybody, this is Julien from Arcee. In this video on Amazon SageMaker Studio, I'd like to show you how to easily deploy a machine learning model. I'm going to start from the SageMaker Autopilot experiment that I ran in a previous series of videos, but this would apply to any model that you've trained on SageMaker Studio.
Starting from the experiments window, where you can see all the models you've trained, I select my SageMaker Autopilot experiment and see a whole bunch of different models because, as you probably know, SageMaker Autopilot runs hyperparameter optimization and fires up many candidate models. I'm going to pick the top-performing one, but again, this would work with any model you've trained. I simply click on Deploy Model. The first thing I need to provide is an endpoint name; let's call this 'Marketing AutoML Best Model'.
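By the way, if you'd rather find that top candidate in code instead of in the Studio UI, here's a small sketch with boto3. The job name below is a hypothetical placeholder; use the name of your own Autopilot job.

```python
import boto3

sm = boto3.client("sagemaker")

# Hypothetical job name; substitute the name of your Autopilot job.
job = sm.describe_auto_ml_job(AutoMLJobName="marketing-automl-job")

# The response includes the best candidate and its objective metric.
best = job["BestCandidate"]
print(best["CandidateName"], best["FinalAutoMLJobObjectiveMetric"])
```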
Next, I select an instance type, from really tiny ones to pretty large ones; you'll find all the instance types SageMaker supports. You can even pick the new Inferentia instances, but that's for another video. Let's go with an ml.t2.xlarge, and we'll stick to a single instance because we're not going to send any serious traffic to this endpoint. I can also enable data capture, which is based on a capability called SageMaker Model Monitor. Model Monitor helps you capture the data sent to your endpoint as well as the prediction responses. Let's enable both here. I just need to specify an S3 location and the percentage of traffic I want to capture, so let's go for 100%. This will capture requests and responses and store them in that location.
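If you'd rather script this deployment, here's roughly the equivalent with the SageMaker Python SDK. It's a minimal sketch: the container URI, model artifact, role, and S3 capture path are placeholders, and the endpoint name is just a URL-safe rendering of the one above.

```python
from sagemaker.model import Model
from sagemaker.model_monitor import DataCaptureConfig

# Placeholder model definition; substitute your own container and artifact.
model = Model(
    image_uri="<inference-container-uri>",
    model_data="s3://<bucket>/<prefix>/model.tar.gz",
    role="<sagemaker-execution-role-arn>",
)

# Capture 100% of traffic; by default both requests and responses
# are captured, matching what we enabled in the Studio form.
capture_config = DataCaptureConfig(
    enable_capture=True,
    sampling_percentage=100,
    destination_s3_uri="s3://<bucket>/datacapture",
)

predictor = model.deploy(
    initial_instance_count=1,
    instance_type="ml.t2.xlarge",
    endpoint_name="marketing-automl-best-model",
    data_capture_config=capture_config,
)
```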
Later on, with Model Monitor, I'll be able to build a baseline from the training set and compare incoming prediction requests to that baseline. Model Monitor will look for deviations in data quality, such as missing features or features with the wrong type; for example, a feature that is supposed to be an integer but is now a float. It will also detect drift in the statistical properties of incoming traffic compared to the training set. For now, all we need to do is capture traffic. In the advanced settings, we could pass encryption keys and deploy the endpoint in a VPC, but let's not do that. We just click on Deploy, and off it goes. This is really equivalent to what you would do with the SageMaker SDK, but as you can see, it's easy to do right here in SageMaker Studio. The endpoint will be visible here; for now it's creating, so let's wait a few minutes, and I'll see you when the endpoint is ready.
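Since we'll come back to Model Monitor in another video, here's just a quick preview of what building that baseline could look like with the SageMaker Python SDK. The role, instance type, and S3 paths are assumptions to adapt to your own setup.

```python
from sagemaker.model_monitor import DefaultModelMonitor
from sagemaker.model_monitor.dataset_format import DatasetFormat

monitor = DefaultModelMonitor(
    role="<sagemaker-execution-role-arn>",
    instance_count=1,
    instance_type="ml.m5.xlarge",
    volume_size_in_gb=20,
    max_runtime_in_seconds=3600,
)

# Compute statistics and constraints from the training set;
# incoming prediction traffic will later be compared to this baseline.
monitor.suggest_baseline(
    baseline_dataset="s3://<bucket>/<prefix>/training-dataset.csv",
    dataset_format=DatasetFormat.csv(header=True),
    output_s3_uri="s3://<bucket>/<prefix>/baseline",
)
```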
After a few minutes, I see my endpoint listed as 'InService', and I can see its settings here. This was a simple deployment, with a single production variant hosting my model on an ml.t2.xlarge, so that's all good. Monitoring is not in place yet; we'll set that up later. Let's quickly check that this endpoint works. I'll go back to the notebook I used for the SageMaker Autopilot job; you'll find the link to this notebook in the video description.
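If you'd rather poll the status from the notebook than refresh the console, here's a small sketch with boto3, assuming the placeholder endpoint name used earlier.

```python
import boto3

sm = boto3.client("sagemaker")

# Status is 'Creating' while the endpoint spins up, then 'InService'.
desc = sm.describe_endpoint(EndpointName="marketing-automl-best-model")
print(desc["EndpointStatus"])

# Or block until the endpoint is ready to serve traffic.
sm.get_waiter("endpoint_in_service").wait(
    EndpointName="marketing-automl-best-model"
)
```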
Remember what we did: we downloaded a simple dataset, a CSV file with direct marketing information. Each line describes a customer, and the last column, called 'y', says yes or no, indicating whether that customer accepted a marketing offer. This is the model we trained: I gave 95% of the dataset to SageMaker Autopilot to do its thing and saved 5% for scoring the model. That's the CSV file I have here, and I can predict it. I just need to set the endpoint name, import boto3, and grab the SageMaker runtime client. This bit of code iterates over the test set: it reads each line, drops the label (the last column), and sends the remaining features to the endpoint as text/csv. It reads the response and, for fun, computes a few statistics: true positives (samples labeled yes and predicted yes), false negatives (samples labeled yes but predicted no), and the other two (true negatives and false positives). Let's run this cell. It's going to predict about 2,000 samples, if I remember correctly, sending them one by one. We could batch them and send multiple samples at a time, which would make the cell run faster but also make the code a little more complicated; I just want to show you how simple it is to invoke the endpoint. You need the endpoint name, the content type, and the payload, a CSV line in this case.
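To make that concrete, here's a sketch of the prediction loop and the follow-up metrics. The endpoint name and test file name are placeholders, the test file is assumed to be headerless CSV with the 'yes'/'no' label in the last column, and the response parsing assumes the endpoint returns the predicted label as plain text.

```python
import boto3

ep_name = "marketing-automl-best-model"  # placeholder endpoint name
smrt = boto3.client("sagemaker-runtime")

tp = fp = tn = fn = 0
with open("automl-test.csv") as f:  # placeholder test file
    for line in f:
        # Drop the label (last column) before sending the sample.
        features, label = line.strip().rsplit(",", 1)
        response = smrt.invoke_endpoint(
            EndpointName=ep_name,
            ContentType="text/csv",
            Body=features,
        )
        # Depending on your model, you may need to parse this differently.
        prediction = response["Body"].read().decode().strip()
        if label == "yes":
            if prediction == "yes":
                tp += 1
            else:
                fn += 1
        else:
            if prediction == "yes":
                fp += 1
            else:
                tn += 1

# Handmade confusion matrix: rows are actual, columns are predicted.
print(f"{tn:6d} {fp:6d}")
print(f"{fn:6d} {tp:6d}")

accuracy = (tp + tn) / (tp + tn + fp + fn)
precision = tp / (tp + fp)
recall = tp / (tp + fn)
f1 = 2 * precision * recall / (precision + recall)
print(f"accuracy={accuracy:.4f} precision={precision:.4f} "
      f"recall={recall:.4f} f1={f1:.4f}")
```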
We're done, and now we can compute additional metrics. I can print a handmade version of the confusion matrix. On the diagonal, we see true negatives and true positives, which we'd like to be as high as possible; we'd like the other diagonal to be zero, but as you can see, we have a few false positives and quite a few false negatives, telling us we need to work on the model some more. We can also compute accuracy, precision, recall, and F1, which are important metrics for classifiers. I won't go into them in detail here, but they're worth reading up on. So there you go: my endpoint works, and its predictions look reasonable. Next, I guess we'd like to look at data capture and model monitoring, so let's do that in a different video.