Hi, this is Julien from Arcee. In this video, I'm going to show you how to invoke a SageMaker endpoint from a web service. As you probably know, when we train and deploy SageMaker models, SageMaker creates an endpoint, and we can see this in the SageMaker console. An endpoint is basically an HTTP API that applications can invoke with an HTTP POST to get predictions. So here, I trained an image classification model and deployed it, and I can see its URL here in the console. But chances are you don't want to invoke this directly because maybe you need to do pre-processing on the data that is passed to the model. Maybe you need to normalize data, add extra data, or pull some data from a cache or some kind of back-end because you want to throw that into the mix. Similarly, maybe you want to do post-processing on the prediction. For example, this image classification model has 256 classes, right? It's been trained on the Caltech 256 data set, and that's one of the sample notebooks you'll find in the collection of SageMaker notebooks on GitHub. Caltech 256 has 256 categories, which means if we predict an image through this endpoint, we will get 256 probabilities sent back to us. That's not what we want. Maybe we want the top three or the top five probabilities for the top three or the top five classes, but we don't really care about those very low probabilities. There are plenty of reasons why you need to have something standing between the SageMaker endpoint and the apps that invoke it. Security, authentication, and throttling would also be interesting things to consider.
So let's see how we can do this. All you would have to do to replicate this is run this notebook all the way and deploy the model. We need to build a web service, and I'm going to use an open-source project by AWS called Chalice. Chalice is a really cool project. It lets you deploy serverless Python-based web services, and if you're a Python developer who has written code using the Flask framework, you will be right at home because Chalice is really Flask for serverless. Let's look at this code right now. This is on GitHub, and I will post the link in the description for the video.
Let's import some stuff. Debugging is important, especially for me, but you might want to switch this off for production. As I said, this is extremely similar to Flask. We're going to define a route, an API, which will be the /API, and we can post to this API. This will be the only API in my web service. What this will do is make sure we have some data in the body of the request. The data needs to be a base64 encoded image. Remember, we're trying to invoke an image classification model here, so no image, no prediction. Then we need to know which endpoint to invoke, and it's convenient to store this in an environment variable. This will be an environment variable for the Lambda function that Chalice is going to create to host this code. We can easily define the endpoint name in an environment variable, and that's convenient because if we need to update it, it's very easy to update an environment variable for a Lambda function. There's an API for that, and we wouldn't have to redeploy the service completely.
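The request checks described here can be sketched as a small standalone function. This is only an illustration under stated assumptions, not the code shown in the video: the body field name `data`, the environment variable name `ENDPOINT_NAME`, and the `BadRequest` class are all hypothetical. In a Chalice route handler the body would typically come from `app.current_request.json_body`, and the error would be Chalice's own `BadRequestError`.

```python
import base64
import os


class BadRequest(ValueError):
    """Hypothetical stand-in for the error a Chalice app would raise."""


def parse_request(body, environ=os.environ):
    """Validate the request body and return (image_bytes, endpoint_name).

    'data' and 'ENDPOINT_NAME' are assumed names for this sketch.
    """
    # No image, no prediction: the body must carry base64-encoded image data.
    if not body or "data" not in body:
        raise BadRequest("Missing image data")
    # The endpoint name is read from the function's environment.
    if "ENDPOINT_NAME" not in environ:
        raise BadRequest("Missing endpoint name")
    image = base64.b64decode(body["data"])
    return image, environ["ENDPOINT_NAME"]


if __name__ == "__main__":
    sample = {"data": base64.b64encode(b"fake image bytes").decode("ascii")}
    print(parse_request(sample, {"ENDPOINT_NAME": "my-endpoint"}))
```

Passing the environment in as a parameter keeps the function easy to exercise locally without touching the real Lambda configuration.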
So, we're going to base64 decode the image, read the environment variable for the endpoint name, and there's an extra variable in the body. It's an optional variable, the number of top categories that we want to get back from this web service. If we don't provide it, we'll get the maximum, which is 257, namely 256 classes plus an extra class, which is just a catch-all class in Caltech. But this is probably not what we want. So we'll add an extra variable in the request body called top K, and this is the number of categories that we will return. Then I need to grab a Boto3 client for SageMaker. I'm going to invoke the endpoint using the name that I read from the environment variable and the image that I extracted from the body. Then I'm going to extract those probabilities. At this point, I will have 257 probabilities, and it is actually a string, a byte array at this point. That's not convenient because I want a Python array, which I can sort and filter. I do not want to see that full vector of probabilities; I just want to see the top K.
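Here is a minimal sketch of the invocation step, under a few assumptions. With Boto3, `invoke_endpoint` belongs to the `sagemaker-runtime` client, so a real handler would create it with `boto3.client("sagemaker-runtime")`. The body field name `top_k`, the content type `application/x-image`, and the `StubRuntimeClient` class are assumptions for illustration; the stub only exists so the sketch runs without AWS credentials.

```python
import io

# 256 Caltech classes plus the catch-all class, as described in the video.
MAX_CATEGORIES = 257


def requested_top_k(body):
    """Return the optional top-K count from the body, defaulting to all categories."""
    return int(body.get("top_k", MAX_CATEGORIES))


def invoke(client, endpoint_name, image):
    """Send the raw image bytes to the endpoint and return the raw response bytes."""
    response = client.invoke_endpoint(
        EndpointName=endpoint_name,
        ContentType="application/x-image",  # assumed content type for raw images
        Body=image,
    )
    # The response body is a stream; read() yields bytes such as b"[0.1, 0.7, ...]".
    return response["Body"].read()


class StubRuntimeClient:
    """Hypothetical local stand-in for boto3.client('sagemaker-runtime')."""

    def invoke_endpoint(self, EndpointName, ContentType, Body):
        return {"Body": io.BytesIO(b"[0.1, 0.7, 0.2]")}


if __name__ == "__main__":
    print(invoke(StubRuntimeClient(), "my-endpoint", b"fake image bytes"))
    print(requested_top_k({}))
```

Taking the client as a parameter is a design choice for testability; the handler could just as well create the client once at module level.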
Fortunately, there is a way to evaluate a byte array into a Python expression using the `literal_eval` function in the AST module. It will take my byte array, which looks exactly like a Python array in text form, and turn it into a proper Python array. I'm turning it into a NumPy array because NumPy has exactly the functions I want to find the top K probabilities. First, I'm going to build a list of category indexes in ascending order of probabilities. Then I can just take the top K in descending order and extract the category numbers from that to build an answer where I will have the category numbers and the probability for that category, in descending order. The top one will be the most likely, and then I will have a few more as specified by top K. This is my answer.
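The post-processing described above can be sketched as follows. Two things differ from the narration and are assumptions of this sketch: the bytes are decoded to a string before calling `ast.literal_eval`, and plain `sorted` is used in place of NumPy so the example needs only the standard library (NumPy's `argsort` would produce the same ascending index order). The function name `top_k_categories` is hypothetical.

```python
import ast


def top_k_categories(payload, k):
    """Turn the endpoint's byte response into the k most likely (category, probability) pairs."""
    # The payload looks like a Python list in text form, e.g. b"[0.1, 0.7, 0.2]".
    probabilities = ast.literal_eval(payload.decode("utf-8"))
    # Category indexes in ascending order of probability.
    ascending = sorted(range(len(probabilities)), key=lambda i: probabilities[i])
    # Reverse to descending order and keep the first k.
    top = ascending[::-1][:k]
    return [(index, probabilities[index]) for index in top]


if __name__ == "__main__":
    print(top_k_categories(b"[0.1, 0.7, 0.2]", 2))
```

The most likely category comes first, followed by as many runners-up as the caller asked for.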
In real life, you would want to do much more. You would need to do data normalization, pull some data from a cache, maybe user context matched against the key, and inject that into the prediction request that you send here. Afterwards, you may need to do some post-processing, formatting, and filtering, but that's the general outline. We're going to deploy this using Chalice. Chalice can build the IAM policy for the Lambda function just by looking at the Boto3 calls that are done in the code, which is very convenient. If you want to specify it yourself, there is a configuration directory with a config file. It's a very small file, so here I said, "Please don't generate the policy for me; I'm going to give you one," and I'm just defining that environment variable which names the endpoint that I want to invoke. My policy is quite simple. I need permission to create a log for the Lambda function and permission to invoke a SageMaker endpoint. That's it. Chalice could completely generate this, so we could do away with this custom policy, but I wanted to show you how you could build your custom policy if you wanted to. In development mode, it's cool if Chalice generates everything, but for production, you want to tighten things and be much more restrictive.
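To make the configuration concrete, here is a rough sketch that builds the two JSON documents as Python dictionaries and prints them. It assumes a Chalice stage named `dev`, in which case the config would live in `.chalice/config.json` and the hand-written policy in `.chalice/policy-dev.json`. The app name, the `ENDPOINT_NAME` variable, the endpoint name, and the resource scopes are placeholders, not values from the video.

```python
import json

# Sketch of .chalice/config.json: disable policy generation and set the env variable.
config = {
    "version": "2.0",
    "app_name": "predictor",  # assumed app name
    "stages": {
        "dev": {
            "api_gateway_stage": "api",
            "autogen_policy": False,  # "don't generate the policy for me"
            "environment_variables": {
                "ENDPOINT_NAME": "my-image-classifier"  # placeholder endpoint name
            },
        }
    },
}

# Sketch of .chalice/policy-dev.json: logging plus endpoint invocation only.
policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "logs:CreateLogGroup",
                "logs:CreateLogStream",
                "logs:PutLogEvents",
            ],
            "Resource": "arn:aws:logs:*:*:*",
        },
        {
            "Effect": "Allow",
            "Action": ["sagemaker:InvokeEndpoint"],
            "Resource": "*",  # for production, narrow this to the endpoint's ARN
        },
    ],
}

if __name__ == "__main__":
    print(json.dumps(config, indent=2))
    print(json.dumps(policy, indent=2))
```

Because the variable lives in the stage configuration, pointing the service at a different endpoint is a configuration change rather than a code change.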
Let's deploy this. We'll just call `chalice deploy`. So, creating the package, creating the role for the function, and creating the Lambda function and the API. We should see our function. There we go. We have the API and everything. How much time did that take? 15 seconds. I told you Chalice is awesome. We have the URL here, and we could also ask it just like this. We have a Lambda function, and let's quickly check the predictor. There you go. We can see the API Gateway is the event source for this, and the predictor is allowed to create a log in CloudWatch Logs, which is nice if we need to debug. It's also able to call SageMaker, which is cool because this is exactly the whole purpose. We see the configuration variable here.
All right, so now all we have to do is invoke this thing. Here's a small example. I'm grabbing the URL for Chalice, and as you can see, we could also deploy the function locally, which is very nice when you are testing. Chalice supports local deployment and local invocation. I'm going to grab a picture of a floppy disk. Some of you might not know what this is, but those of you who are old enough know what a floppy disk is. This is one of the images from Caltech, and I'm going to build an HTTP POST request with a body containing the base64 encoded data and top K set to three. So I should see the top three categories for this image. I post this to the URL. Here I'm posting to this API, and I see three probabilities in descending order, which is exactly what I wanted. As we can see, the top category here is category 75 with a probability of a little over 85%. We can quickly check that category 75 is actually a floppy disk. Let's just go to the data set. 75 is a floppy disk. So there you go; it works.
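A client-side request along these lines could be built as shown below. This is a sketch with assumed names: the JSON fields `data` and `top_k`, the example URL, and the helper `build_request` are all hypothetical. The request is only constructed here, not sent; sending it would be a call to `urllib.request.urlopen(request)` against the URL that the deployment prints.

```python
import base64
import json
import urllib.request


def build_request(url, image_bytes, top_k):
    """Build an HTTP POST carrying the base64-encoded image and the top-K count."""
    body = {
        "data": base64.b64encode(image_bytes).decode("ascii"),  # assumed field name
        "top_k": top_k,  # assumed field name
    }
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


if __name__ == "__main__":
    # Placeholder URL and bytes; a real call would read an image file from disk.
    request = build_request("https://example.com/api", b"fake image bytes", 3)
    print(request.get_method(), request.full_url)
```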
Let's try another image real quick. We have a hamburger. So let's just change this to hamburger. Hamburger, yes sir. Let's invoke this again. Now we have category 95 with a high probability of 92%. If I go back here, 95 is indeed the hamburger. So this works. There you go, very simple code, just a few lines, and it's very easy to build a public API that is receiving traffic and pre-processing and post-processing traffic for your SageMaker endpoint. Chalice makes it so easy to deploy that it would be a crime not to use it, unless you really did not want to use Python, but I would ask you why in that case.
All right, enough clowning around. I hope you liked this. I sure did. Let's delete our API. That was quick. And again, I will post the GitHub link in the video description. That's it for this time. Talk to you next time. Bye-bye.