Hey, good morning, everyone, or good afternoon, or maybe good evening. Welcome to this new episode of SageMaker Fridays. This is actually the last episode for a little while, so it's the mid-season finale. But no cliffhanger, no one dies that I'm aware of, so don't be scared. And I guess you already know, but my name is Julien and I'm a dev advocate focusing on AI and machine learning. As always, please meet my co-presenter. Hi, everyone, my name is Ségolène and I'm a senior data scientist working with the AWS Machine Learning Solutions Lab. My role is to help customers get their ML projects on the right track.
Thank you for joining us. So once again, we're live, 100% discussion and demo, no slides, except of course the final slide with resources, but that's about it. If you've got questions, please ask all your questions in the chat, and we'll do our best to answer as many as we can. Don't be shy.
So today is a special episode, not only because it's the mid-season finale, but also because today I'm having my revenge. For a number of episodes, Ségolène has been the machine learning expert, and I've been asking lots of silly questions, and she was very kind and answered. But today is a more technical episode, right? And Ségolène told me, "Oh, wait a minute, there's so much I don't understand here." So now you feel what it's like to be me. So I'm going to ask the silly questions. Yeah, well, you can ask all the silly questions, and I'll try to answer as well as you usually do. Okay.
So actually this week, we're going to focus on something I don't think we've ever discussed either this season or the previous season, and it is importing and exporting models. So far, we've been training and deploying all the way, so to speak, doing the full lifecycle on SageMaker, but sometimes it's not what we need. So tell us a little more.
So what we see is that SageMaker covers the complete ML cycle: preparing data, experimenting, modeling, training, optimizing, deploying, and monitoring. Yes, and so many additional things. Exactly. But sometimes we want to be free and not have to go end-to-end on SageMaker. The good news, Julien, is that you can cherry-pick the features that make the most sense for your ML project right now. So flexibility and modularity are super important. So give us some examples.
For instance, you don't have to train and deploy on SageMaker. That's big news. Yes, because if you listen to us for a while, it's like, "Oh, build, train, deploy," and you're all tired of that. So you can do one or the other. Okay, so why would we train on SageMaker and not deploy on SageMaker?
Because, for instance, you can have some business constraints. You are not allowed to deploy in the cloud. Okay. So you can try. Some use cases are strictly on-premises. Exactly. That works. You have a company-wide technology choice. For instance, your deployment is standardized on Docker. Or you've got an already existing CI/CD workflow. Okay. All right. So deploying on SageMaker is not an option for you, technically. Exactly. Otherwise, another reason for not deploying on SageMaker is you have some strong compliance requirements and need to build and deploy on your own AMI. Okay, so managed endpoints are not an option. These are good reasons, and I'm sure there are more, but these are the ones we usually hear from customers. And that's totally okay. That's fine.
So now, what about the other side? Why would we not train on SageMaker? Because it's so easy, so scalable. Why would you not do that and still deploy on SageMaker? So right now, we don't train, but we deploy. Yes. Why not train? Again, business constraints. You are not allowed to store your data in the cloud. Okay, that could be, yes. And it's totally up for discussion, but for some organizations, the current status quo is, "Nope, don't do it." Fair enough. Another one is a company-wide technology choice: you need to amortize your on-prem infrastructure. Ah, you spent lots of money on servers, and the CFO doesn't understand why you need a cloud. Ah, okay, it brings back memories. Yeah, yeah, okay. So it's not a really good reason, but CFOs usually win those discussions. Exactly. Okay, so you have to adapt. All right, I get it. Or finally, you have no training needs. Your current models just work fine. Yeah, it's actually a good one. So you already trained your models, or you use existing models, maybe from partners. You just don't train and want to deploy in the cloud. So just deploy, no training required. And there's another one that you could hear from time to time, which is, "I don't know how to train or I don't know how to deploy." Come on, my friends, that's not valid; it's not a good reason at all. You can go and watch all the previous episodes, so no, "I don't know how to do it." Nope, sorry, I'll be firm on that one; I won't accept it. Okay, sorry.
So again, when we do the whole end-to-end workflow, you know, we use the estimator to train, we use the predictor to deploy, and we don't even see what the model looks like, right? We know it's in S3, okay? So it's copied to S3 once training is complete, and we assume this is where the predictor takes it and deploys it, but we've never really looked at what the model is. And it's important because if you want to import, you need to know what the artifact should be, right? What do you put in S3 to deploy? And if you want to export, well, it's the other side of the story: what should you look for in S3, and what do you do with it in order to use it in a different context, okay? So, a SageMaker model. Can we show? Yes, we can take a look. Let's show my screen. Let's switch to SageMaker Studio. And let's take any job here. And we can just literally take any of them... I wanted to show XGBoost, maybe, if I can find one. It's hidden somewhere. All right. No worries. Let's take a TensorFlow model instead. So you can just take any model and right-click on this thing. Oh, yeah, yeah, yeah. You're right. It's here. Yeah, sorry, sorry. It's Friday. I'm not getting younger. Okay. Sorry. See why I need Ségolène? She keeps me on track. Okay. So right-click and select Open in trial details. Okay, and this opens a new window with all the information you need to know. And of course, there's this thing called artifacts. And here we can see the model artifacts, right? And the first thing we see is that we have a long S3 URI, but we have a file, or an object I should say, called model.tar.gz.
Why don't we take one of those? Do I have a terminal open here? Yes. Okay, so why don't we just copy one of those things? And what's in there? What did you expect? All right, it's a file called xgb.model. And obviously, if we look at a TensorFlow model, and we'll look at TensorFlow and PyTorch and a few more things today, we'll see different files inside the artifact, but this is the model that was saved by your training job. Okay. So remember inside your training script, you save the model in a well-known place, which is usually /opt/ml/model inside the container. And then SageMaker takes that and builds that model.tar.gz file and copies it to S3. All of this is automatic. But this is really the model that's saved by your script. So in this case, it's simply a pickled XGBoost model. What does it mean, pickled? Yeah. Oh yeah, you're asking the silly questions. Okay, great. So pickle is a serialization format for Python. So it's basically, as we'll see, we'll load one in a few minutes. So we can just load the serialized objects straight into our XGBoost code and use it right away. So it's about the same thing. For different algos, for PyTorch, TensorFlow, MXNet, and so on, what's inside the artifact is going to be different. But the process is always the same. So that's really all you need to know. Go and find the model artifact at the output location, copy it, extract it, and again, depending on the framework, this is where you find the model. And then, of course, you should know what to do with it. Okay, so simple as that. These are SageMaker models. So I can close that. What about TensorFlow models?
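If you want to try this yourself, here's a minimal sketch of that copy-and-extract step in Python. The bucket, key, and file names are placeholders, so adapt them to your own training job's output location.

```python
import pickle
import tarfile

import boto3

# Placeholder S3 location: replace with your own training job's output path.
bucket = "my-sagemaker-bucket"
key = "my-training-job/output/model.tar.gz"

# Download and extract the model artifact produced by the training job.
boto3.client("s3").download_file(bucket, key, "model.tar.gz")
with tarfile.open("model.tar.gz") as tar:
    tar.extractall(path=".")

# For this XGBoost job, the artifact contains a pickled booster
# (the file name depends on how your training script saved it, e.g. "xgb.model").
with open("xgb.model", "rb") as f:
    booster = pickle.load(f)

print(booster)
```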
Yes, so TensorFlow models trained with the TensorFlow container on SageMaker are saved in TensorFlow Serving format. TensorFlow Serving is the model server for TensorFlow, so it's a different format, but we'll see an example. We'll load one on my laptop using Docker. Oh yeah, yeah, I told you it's a technical episode. Not a lot of machine learning today, but lots of cool stuff still, I hope. Docker is very nice. Okay. So should we start with importing models? Yeah. So again, importing models means we have an existing model. We could also call it "bring your own." Maybe it's a model you trained on your laptop because it's a model you already have. It's a model you downloaded from GitHub, right? It could come from a model zoo. That's another scenario. So, long story short, you have a model and you want to deploy it on SageMaker. Okay. So, let's look at an example. We'll start with XGBoost.
So here, I'm basically training an XGBoost model on the direct marketing data set, which we've used quite a few times, and we train this locally. Even though I'm training here in the notebook, I'm not using managed infrastructure. Okay, loading the data set, doing some silly feature engineering, and then training. This is really where this happens, right? Vanilla XGBoost code to train this thing, okay? Training in place. So this code you could absolutely run on your laptop just like that, same code, okay? So the first thing I think you need to check or note is which XGBoost version we're using here, because when you train and deploy on SageMaker, we automatically select the version. If you say, "I want to train on XGBoost 1.2," then of course when we deploy, we use that same version; the built-in containers take care of that. But if you take a pre-trained model and deploy it, then you need to make sure you deploy with the same version because all those libraries move very fast, and the format of the model can actually change. There was a very big change after XGBoost 0.9 where the format of the models completely changed, and it's not guaranteed you can load them with more recent versions. So it's important to understand what version you used for training. Okay. So here I'm using 1.1, which won't be a problem at all. Okay. So it trains, and voila. Okay. So what we have now is we save the model locally, right? So I can see the XGBoost model. Let me get to the right spot. Where is this thing? Oh yeah, sorry, it's in here. Bring your own XGBoost. Yes. Okay. All right. Okay, so this is the model, and fine, right? It's what you would expect.
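For reference, here's a minimal sketch of what that local, vanilla XGBoost training could look like. The data file name and the feature engineering are placeholders, not the exact code from the notebook; the point is simply that nothing here involves SageMaker yet.

```python
import pickle

import pandas as pd
import xgboost as xgb
from sklearn.model_selection import train_test_split

# Placeholder: load an already feature-engineered version of the direct marketing data set.
data = pd.read_csv("direct-marketing-processed.csv")
X, y = data.drop("y", axis=1), data["y"]
X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.1)

# Plain XGBoost training, exactly as you would run it on your laptop.
dtrain = xgb.DMatrix(X_train, label=y_train)
dval = xgb.DMatrix(X_val, label=y_val)
params = {"objective": "binary:logistic", "eval_metric": "auc"}
booster = xgb.train(params, dtrain, num_boost_round=100, evals=[(dval, "validation")])

# Pickle the model locally; this is the file we'll package for SageMaker next.
with open("xgb.model", "wb") as f:
    pickle.dump(booster, f)
```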
Now what we need to do is to package the model for SageMaker. And it's not complicated because now we know what to do. So first, we need to create that model artifact. Just build a model.tar.gz file with that pickled object, that's it. Then, of course, you need to upload it to S3 because this is where SageMaker expects it, and then you deploy. Right, so here we're using an object that I don't think we've used before, which is the model object in the SageMaker SDK. We've used estimators to train, and then the return value from training is a predictor, and we deploy. So here we don't have the estimator, right? So what we need to do is create this XGBoost model with the location of the model in S3 and which version we're using. This is where, to avoid problems, you just try to use the same version. And we need a short script, okay? Let me show you the script, xgbscript.py, because we need to provide a simple model loading function. Basically, creating a booster object and then loading the model from the artifact that we stored in S3. This is going to work 99.9% of the time, so you can keep using this function for XGBoost. I think there are very few cases where this wouldn't work. And again, you know, when you use the built-in estimators and built-in predictor, this is kind of built-in, right? But here we need to provide it. So this XGBoostModel object is what we'll deploy and predict with. So this is really equivalent to training, right? And you can see the next cell is the same. We call .deploy exactly as we've done before. So this is really equivalent to, "Hey, I trained a model on SageMaker and I've got an object that I can deploy." Okay, so we deploy it and then we can predict with it, just like we've done in the past. Same logic. The only difference is you have your local model, put it in a model artifact, put that artifact in S3, create that model object, and then you can call it. Again, nothing fancy. I understand.
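Here's a sketch of the two pieces just described: the tiny xgbscript.py loading function, and the packaging plus deployment with XGBoostModel. The framework_version and instance type are assumptions; pick the container version closest to the XGBoost version you trained with.

```python
# xgbscript.py -- minimal model loading function for the XGBoost inference container (sketch).
import os
import pickle

import xgboost as xgb


def model_fn(model_dir):
    model_file = os.path.join(model_dir, "xgb.model")
    # We pickled the booster above, so we simply unpickle it here.
    with open(model_file, "rb") as f:
        return pickle.load(f)
    # If you saved with booster.save_model() instead, you would do:
    #   booster = xgb.Booster(); booster.load_model(model_file); return booster
```

And the packaging and deployment side, in the notebook:

```python
import tarfile

import sagemaker
from sagemaker.xgboost.model import XGBoostModel

session = sagemaker.Session()
role = sagemaker.get_execution_role()   # inside Studio/notebooks; otherwise pass a role ARN

# 1 - Package the pickled model exactly as SageMaker expects it.
with tarfile.open("model.tar.gz", "w:gz") as tar:
    tar.add("xgb.model")

# 2 - Upload the artifact to S3 (the prefix is arbitrary).
model_path = session.upload_data("model.tar.gz", key_prefix="byo-xgboost")

# 3 - Create a model object from the artifact and deploy it.
xgb_model = XGBoostModel(
    model_data=model_path,
    role=role,
    entry_point="xgbscript.py",
    framework_version="1.2-1",   # assumption: use the container version closest to your training version
)
predictor = xgb_model.deploy(initial_instance_count=1, instance_type="ml.m5.large")
```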
Yeah. Okay. Let's look at TensorFlow, right? Because you want to know about TensorFlow. So same process. We're going to train a local model. Okay. So here I'm reusing my Fashion MNIST image classifier. It's a good toy example. And we train locally. Locally meaning... well, you could say it's in Studio, so it's not really local, yeah, fair point. Locally means I am not using managed infrastructure on SageMaker, okay, so I'm training inside the notebook, so to speak. All right, so once again, we need to be careful in terms of versions because some APIs could change, and you could have breaking changes from one version to the next. So try to stay at least with the same major.minor version. The final digit is usually not a problem. We load the data set, do some preparation on the data set itself, instantiate our model, compile, train, save. And of course, you will get all the notebooks. I'm already hearing you scream, "Can I get that?" Yes, yes, yes. You know you're going to get all of it. This is really the critical step here. There are different formats to save TensorFlow models. And I guess you could save maybe a pickled object, but that's not what you want to do here. Here you really want to save them in TensorFlow Serving format.
If you're not familiar, TensorFlow Serving is a model server that's part of the TensorFlow project. Why is this important? It's important because that's what the containers, the SageMaker containers, use to deploy. So inside the inference container for TensorFlow, we have TensorFlow Serving looking for one of those models. So we need to do that properly. So you need to save it in the right format. Okay, so train, train, train. Okay, and now we have this folder here. Yeah, I call it one because usually it's version one. You could have multiple versions of a model inside an artifact, and you could ask TensorFlow Serving to deploy one or the other. That's how it works. And this is what you should see, right? And you can see here the format is a little more complicated. You could have... these are the actual parameters. You could have multiple checkpoints if you use checkpointing, etc. It's not super complicated, but yeah, and you can see the model is saved in protobuf format, which is another serialization format from Google, widely used. So that's what it should look like. Right, so if you save the model and it doesn't look like this, you didn't use the right format, and don't go any further because you won't be able to deploy it. So that's what we should have now. What do we do with this? Well, we kind of do the same thing. We're going to build a model.tar.gz file. The main mistake that I've made again and again and again is not putting that directory at the root of the tar file. So if your tar file contains just variables and saved_model.pb and assets, you won't be able to deploy. Okay, so that model version, model name, whatever name you want to give to that, needs to be at the root of the tar file. Okay, yeah, very frustrating because you do everything right, you miss this one step, and you deploy, and it fails. It's like, "Come on, the model is in there. Why doesn't TensorFlow Serving see it?" Sorry, fine. Make sure... Yeah, yeah, yeah, yeah. It drove me nuts. I'm a slow learner, and it took me a while to figure this out. So now we upload that to S3, and of course, we create a TensorFlow model from that artifact. Careful with the versions, okay, once again, and we just call deploy. What I like about this is we don't need to have an inference script because TensorFlow Serving will just load the model, and the purpose of a model server is to include all the inference logic. So you don't need to do that unless you're using exotic serialization. If you use something different from, I don't know, NumPy or JSON, the usual suspects, if you have weird serialization, deserialization that TensorFlow Serving cannot manage natively, then you would provide an inference script with the input_fn function and the output_fn function, and it's pretty well documented in our SDK. But in most cases, you can just get away with this, which is fine. So we deploy and then, as usual, we predict. Right, and yeah, let's predict. We should still be up. Yes. This is funny. I could do that. Oh, it's wrong. We got a misprediction. So you can see it's not fake. We don't fake anything ever. All right. So TensorFlow. So you can do this. Yeah. All right. I can do this. You can do this. Everybody can do this. Okay. Let's move on and look at a slightly more complicated example. We're going to look at PyTorch. And I didn't say that PyTorch is generally more complicated. We're actually going to reuse our Hugging Face example. When was that? Was it episode five? Five. Okay, you got a good memory.
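To make the packaging step concrete, here's a sketch covering the save format, the "version directory at the root of the tarball" gotcha, and the deployment with TensorFlowModel. The directory names, framework_version, and instance type are assumptions, and `model` is assumed to be the Keras model trained above.

```python
import tarfile

import sagemaker
from sagemaker.tensorflow import TensorFlowModel

session = sagemaker.Session()
role = sagemaker.get_execution_role()

# Save the Keras model in TensorFlow Serving (SavedModel) format,
# under a numbered version directory: export/1/saved_model.pb, export/1/variables/, ...
model.save("export/1")

# The version directory MUST sit at the root of the archive:
# the tarball should contain '1/saved_model.pb', not 'saved_model.pb'.
with tarfile.open("model.tar.gz", "w:gz") as tar:
    tar.add("export/1", arcname="1")

model_path = session.upload_data("model.tar.gz", key_prefix="byo-tensorflow")

# No inference script needed: TensorFlow Serving handles standard JSON/NumPy inputs.
tf_model = TensorFlowModel(
    model_data=model_path,
    role=role,
    framework_version="2.3.2",   # assumption: match the version used for training
)
predictor = tf_model.deploy(initial_instance_count=1, instance_type="ml.m5.large")
```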
Yeah, so episode five this season, we showed you how to fine-tune Hugging Face models for NLP on SageMaker. Remember that we didn't show you how to deploy them because, and friendly colleagues, just look away for a second, we still do not support deployment of Hugging Face models on SageMaker. We're working on it, but it's not there. So, you know me. I was frustrated. I easily get frustrated. And I decided there's got to be another way. So this is it, right? You did it. Well, I did it. And I have to say I found a super, super useful blog post on the AWS Machine Learning Blog that actually includes a CloudFormation template that deploys a Hugging Face model using the PyTorch container, which is what we're going to look at. So you can easily find it. It's been written by our colleague Todd Escalona. I hope I got your name right. Todd, thank you very much. Super good post. I was missing a single line, and reading your post helped me figure it out. So thank you. Credits and thanks again to Todd for getting it right and helping me figure it out.
Okay, so here what do we do? This time we actually train this sentiment analysis model. It's a Hugging Face model. So here I'm actually training on, but let's ignore that. Let's completely ignore all this. This never happened. You didn't see it. It's not there. Trust me. It's fine. And okay, so I actually have the model, which is really a PyTorch model, right? I'm using Hugging Face as an excuse, but this would work for any PyTorch model. I've got my model artifact in there. Do you want to take a look at it just to see what PyTorch looks like? No. What do you mean, no? No is not an option. Let's take a look at it. Everybody loves PyTorch. It's a big one. How much? 230 megs or something. So this is what you have in your PyTorch artifact: the model, the configuration, and training arguments. So again, if you had trained this on your local machine, your local server, you would get those three files and you would package them in a model artifact and you would upload all that stuff to S3. So here I'm taking a shortcut because we already have that. Okay, so let's pretend I built it and I put it in S3. Now what? Okay. So here I'm once again going to use PyTorchModel to import, so to speak, the model to SageMaker, framework version, blah blah blah, and I need an inference script. And here, the fact that it's actually using Hugging Face is, I think, a good way to understand why we need this. Because yes, the Hugging Face model, this one is implemented in PyTorch. But the PyTorch inference container, which is using TorchServe, right, the vanilla one, doesn't know anything about Hugging Face. Okay. So that's why we need to provide that script where we actually load using the Hugging Face APIs. And yeah, then it's a legit PyTorch model, but we're actually using the Hugging Face API. So that makes sense, right? That makes sense why we need that script. So let's look at the script. It's not terribly complicated. So we have that model_fn function, which is responsible for actually loading the model. So here, loading a Hugging Face model is very simple. So from the artifact, we have the config file, and we have the model itself. So we load the config, and here I trained a DistilBERT model specialized for sequence classification, so I just load the model and its configuration, right? And I return that. And now this thing, this model is really a PyTorch model. Okay. And then fine. Then it's all good. Then the container knows how to invoke it, right? But you see we're using transformer APIs, and of course, the vanilla container wouldn't know how to do that, right? So when we have a proper Hugging Face container for deployment, I'm guessing we won't need to do that because the logic will be in there. Maybe we will have to, well, let's see. I'm not making any promises, but I'm guessing it'll be simpler. And then we need a prediction function, okay, so taking the input data, which is a text sentence. This is a sentiment analysis model, so we pass text customer reviews and we predict them just like that. We take the output, not probabilities because these are really activation values. We didn't apply softmax. We could apply softmax and return probabilities here. We keep it simple. We just look for the top activation value, which indicates the top class, and we return the name of that class, which is negative. Okay. And then those input and output functions, I'm guessing, I'm not sure, but I think there are actually options here because I think JSON would be supported natively by TorchServe.
So not sure, need to check. But again, if you have different serialization, you could be clever here. I don't want to be clever. I'm fine with JSON for once. So again, load the model, predict. You can see these are super simple, easy to adapt. Finally, once we've done that, we can deploy as usual and then we can predict. We can see this is a positive review and it's positive, and this is a negative review and it's negative. Perfect. That's PyTorch. Same logic, just a little bit of customization here because again we're using Hugging Face, but this shows you it's not terribly complicated to do that. Right, you just need to have a working example as always, it's so much easier to adapt. Okay, and don't worry, don't forget to go and look at Todd's post on serving PyTorch models with TorchServe. It's really, really useful.
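Here's a sketch of what such an inference script and the PyTorchModel deployment can look like. It's not the exact script from the episode: the file names, label mapping, tokenizer choice, and framework_version are assumptions, and you may need input_fn/output_fn if your request format isn't handled natively.

```python
# inference.py -- sketch of a Hugging Face model_fn/predict_fn for the PyTorch container.
import os

import torch
from transformers import (
    DistilBertConfig,
    DistilBertForSequenceClassification,
    DistilBertTokenizer,
)

CLASS_NAMES = ["negative", "positive"]   # assumption: binary sentiment labels
# Assumption: the endpoint can reach the Hugging Face Hub to fetch the tokenizer.
tokenizer = DistilBertTokenizer.from_pretrained("distilbert-base-uncased")


def model_fn(model_dir):
    # Load the fine-tuned model and its configuration from the extracted artifact.
    config = DistilBertConfig.from_pretrained(os.path.join(model_dir, "config.json"))
    model = DistilBertForSequenceClassification.from_pretrained(model_dir, config=config)
    model.eval()
    return model


def predict_fn(input_data, model):
    # input_data is assumed to be a text review (already deserialized by the container).
    inputs = tokenizer(input_data, return_tensors="pt", truncation=True, padding=True)
    with torch.no_grad():
        logits = model(**inputs).logits
    # No softmax: we only need the index of the top activation value.
    return CLASS_NAMES[int(logits.argmax(dim=1))]
```

And the deployment side, with a placeholder artifact location:

```python
from sagemaker.pytorch import PyTorchModel

model_artifact = "s3://my-bucket/hf-model/model.tar.gz"   # placeholder S3 URI

pytorch_model = PyTorchModel(
    model_data=model_artifact,
    role=role,
    entry_point="inference.py",
    framework_version="1.6.0",   # assumption: match your training version
    py_version="py3",
)
predictor = pytorch_model.deploy(initial_instance_count=1, instance_type="ml.m5.xlarge")
```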
Okay, what did I forget? I think that's about it. Okay, so we looked at importing, XGBoost, TensorFlow, and PyTorch. So the main ones, the ones that I didn't cover, like scikit-learn, you can absolutely use the same technique as XGBoost. And MXNet is very, very similar to this, right? Build the artifact and use MXNet model and then call deploy. And again, in most cases, I don't think you will ever need an inference script unless you have weird serialization. So you should be covered here. Let's move to exporting. Maybe, yes. Yeah, let's do exporting. Okay. What does it mean, exporting? So exporting in this case means we train on SageMaker, but whatever the reason is, we do not want to deploy on SageMaker. Okay. So now we know what to do. We need to grab the artifact in S3 and export it. Extract it. Copy it to wherever. Extract the model inside and then use whatever we like to load the model and predict with it. Okay. So we're going to look at a few examples. Here is the first one, which is, I think, XGBoost. Okay, so, alright. Training an XGBoost model, as we've done many times. This is still the same direct marketing data set, binary classification. SageMaker business as usual. Thank you. Yes, it's a good example. So we train, nothing particularly different here. Here we train and train and train and let's not look at the log here. Okay, so we call fit, and as we've seen, we can easily retrieve the artifact. You don't even need to go and look at anything; you can do this programmatically. There's a member in the estimator called model_data that actually points to that exact location. So you can automate all of it. And so I can copy this and I can extract it. And we've seen before that this contains that XGBoost pickled object. So now loading it is super simple. Once again, the only gotcha, I guess, is version problems again, API differences across versions, etc. So make sure: here we trained with XGBoost 1.1, so let's load it with 1.1. Although I guess you know if we tried 1.2 or maybe 1.3, it would probably work, but again, it's hard to know in advance. Just use your own version. And if you run into problems, then stick to the same version that you use for training. And it is as easy as this, right? Create an empty booster object and then just load the pickled object. That's it, right? And here, for example, I'm dumping that object to a text file, and this is literally my XGBoost set of trees, right? So then you can go and you can predict with it directly. It's there. Okay, so you know, I get a lot of questions on that. I mean, it's literally all the time people ask me, "I want to use that model, I want to import that model, I want to export that model," and then they get all anxious, and it's not hard, right? Again, if you have working examples to start, it's so much easier, which is why we're doing this today. But yeah, you can see it's easy to go and do those things. So let's do the same with TensorFlow. So Fashion MNIST again, training, retrieving the model artifact.
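Here's a sketch of that export flow, using model_data on the estimator. The estimator variable name and the model file name inside the artifact are assumptions; adjust them to your own job.

```python
import pickle
import tarfile

from sagemaker.s3 import S3Downloader

# The estimator remembers where the artifact was saved, so this is fully programmatic.
print(xgb_estimator.model_data)   # assumption: 'xgb_estimator' is the trained estimator

# Download and extract the artifact locally.
S3Downloader.download(xgb_estimator.model_data, ".")
with tarfile.open("model.tar.gz") as tar:
    tar.extractall(path=".")

# Load the pickled booster with the same XGBoost version used for training.
# The file name may differ depending on the container/script (e.g. "xgboost-model").
with open("xgboost-model", "rb") as f:
    booster = pickle.load(f)

# Dump the trees to a text file to inspect the model, then predict with it directly.
booster.dump_model("trees.txt")
```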
Of course, we could load it here, but let's do it differently. Let's show you something else. On my local machine, and this is really my local machine this time. What did I do? I downloaded and extracted this artifact. So this is really... same thing, copy, extract it to this directory. And we're going to try and... so let me copy that command. We're going to try to load the model locally with TensorFlow Serving in a Docker container. And if this looks complicated, I actually copied and pasted it from the TensorFlow Serving doc and I just used my own path. So it's not hard. Yeah, so this is the path. This is the model we want to serve. And that's it, right? And of course, you need to pull the TensorFlow Serving image first. So I've done that before. Run this. Yes. Success. I love it. So that's it. It's loaded. And then, you know, TensorFlow Serving is exposing its API on my local machine, and I can use it as usual. Okay, so nothing complicated here, right? Nothing complicated here. Just need to know what to look for. Okay? All right. See? Yeah. Like I say all the time, I waste a lot of my own time so that you don't have to. Right? And if you think this worked on the first attempt, it didn't. It never does. That's okay. I don't care. I just hope it's going to work on the first attempt for you. That's the important thing. Again, I'm paid to waste my time so that you don't have to. And I'm so good at wasting my time. I'm really good at it. Okay? So here's another option. And of course, if you wanted, here I ran this container locally, but if you wanted to deploy on ECS or EKS or any Docker platform, you could do the same. I'd recommend it, actually. You could run this on your cluster, and off it goes. Actually, we're going to do an example later on. We're going to look at deploying on a Docker cluster. Okay, let's move on.
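For reference, here's roughly what serving and invoking the extracted model locally looks like. The paths, model name, and the `x_val` variable are assumptions; the docker command is adapted from the TensorFlow Serving documentation.

```python
# Serve the extracted SavedModel locally with the stock TensorFlow Serving image.
# The local 'model' directory is assumed to contain the versioned folder '1/' from the artifact:
#
#   docker run -p 8501:8501 \
#     -v "$PWD/model:/models/fmnist" \
#     -e MODEL_NAME=fmnist \
#     tensorflow/serving
#
# Then invoke the REST API on localhost, no SageMaker involved.
import json

import requests

# Assumption: x_val holds Fashion MNIST images preprocessed the same way as for training.
payload = {"instances": x_val[:5].tolist()}
url = "http://localhost:8501/v1/models/fmnist:predict"

response = requests.post(url, data=json.dumps(payload))
predictions = response.json()["predictions"]
print([p.index(max(p)) for p in predictions])   # predicted class for each image
```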
So what about built-in algorithms? Yeah, we've covered built-in algos, yeah, yeah, in a lot of detail: BlazingText, and I guess we did image classification, yeah, we did, and last season we did a lot of those, NTM, LDA. Yeah, we did a lot. Okay, so why are those different? Those are different because, as you may know, they are implemented with Apache MXNet. And so we need a slightly different logic to work here. So let's see how we could export the image classification model. So, a model trained with the image classification built-in algo. And the reason I picked this one is because this is really a popular algo. I mean, I know lots of customers are using it because it's such an easy way to build computer vision models. Exactly. And object detection and segmentation are the same. These are really complex models to work with. And I think the built-ins for computer vision are really, really simplifying the work for developers. So here I'm going to show you image classification, but it's working the same for all the other built-ins. And for the record, the only one that's not an MXNet implementation is the built-in XGBoost. And BlazingText is a little bit different because BlazingText actually saves models that are compatible with fastText, the Facebook algo. But all the other ones like Linear Learner, KNN, LDA, etc., etc., are MXNet. Okay, so what I've done here is I've run this example from the SageMaker example repositories. So it's a transfer learning example on the Caltech data set. We may have shown this one before, I don't remember. But yeah, go and run this. So just run this notebook as is and just write down the location of the artifact. Again, model.tar.gz, nothing special. And now you know the story. We copy it, we extract it, and we see those things. A little bit different. So we see this .params file, which, as you would expect, is the model itself, the parameters that were learned. This number is not random; it's the number of epochs this ran for. So here it's a fine-tuning, a transfer learning example that runs for two epochs, and so this is why we have two. Okay, good to know as well. Yes, I think so. Then you have a JSON file, which I guess we can open. Not like that. Just open it with the editor. Yes. Who thought it would be nice to look at JSON? All right. So we can see... This is not for humans, don't worry. Okay? We can see basically the layers and all the functions and all the operators. Convolution, so it's literally the model definition. I'll show you in a minute a better way to visualize it. Let's close those files. And the model shapes file is important because, guess what? This is the input shape of the model. So the input tensor for this model is called data. Pretty good name. And it has a shape of... so that's the batch size; this was trained with a batch size of 128, I'm guessing. Number of channels, three, because this is a computer vision model that works on color images, red, green, and blue. And images are 224 by 224. This is really useful because knowing this is important if you want to invoke the model. Take a look at this. This tells you what you need to know to invoke the model correctly.
So we extract the model, and this is the bit that's a little less familiar, because, you know, maybe you're a little less familiar with MXNet in general, so don't worry. There's nothing really weird here, and if you don't understand the details, that's fine. You can just go and read the doc, but I'll try to explain what goes on here. Okay, so first I need to load the model definition as a JSON file. And then I'm using this API in Gluon. Remember Gluon is the imperative API of MXNet. I don't want to be too theoretical here, but MXNet originally was a symbolic framework kind of like early TensorFlow, etc., etc., and then it added an imperative API kind of like PyTorch, and this is called Gluon. So here we can just say basically, "Hey, please instantiate this model." That's really what it does: loading that crazy JSON file that we saw and setting the input tensor as data. This is why you need to know that this is called data. Otherwise, bad things will happen. That's all you need to know. We're instantiating that model based on its definition and setting the name of the input tensor. And this is a really cool one. MXNet has this API called plot_network where, again, passing the model, you can say, "Hey, I want to see it." So that's softmax here. So it's actually the output layer. So you need to scroll all the way down. So that's where it starts. Data, input tensor. I'm guessing this is batch normalization. And then here we go, convolution and batch norm and activation with ReLU and pooling. Right? So if you've got time tonight, you can play the game and walk through all the steps. Is this a skip connection? I think so. It looks like it. Yes. And if you don't know what a skip connection is, you haven't been paying attention. Because Ségolène explained all of it a while ago. So we can see this model in all its glory. And by the way, this is a ResNet model. This is why I'm pretty sure these are skip connections. That's what ResNet does. And you can see all the layers. It's actually a cool way to see how models are built. Definitely better than your JSON. Much better than JSON. So if you're curious about certain models, you can load them in Gluon and use that API to plot them and just check that, oh yeah, you understand the architecture. This is pretty hard to fit in PowerPoint slides, unfortunately. It's really... I mean, it's a really nifty feature.
So now we have the model, but it's empty, it's blank. So we need to load it with the training parameters. And then we need to call initialize to initialize some of the runtime parameters. That's all you need to know. And it complains a lot, with warnings that we can ignore. And then we can predict. I was a little bit lazy here. We could load an actual image. Here I just generated a random tensor. So size one because it's only one image, right? Three channels because it's red, green, blue, and it's 224 by 224. So it's just a random tensor. In MXNet, those are called NDArrays, which are really tensors, very similar to NumPy arrays. And then we just predict. I'm not sure what we're predicting here, but we're predicting. Just call net, get a response, and the shape of that response is a vector of 257 probabilities because the Caltech data set we fine-tuned on has 256 classes plus what they call a clutter class, which is just a bunch of random garbage to make the model learn a little bit harder. Okay, so there you go, not hard, and you can apply that to all the actual MXNet models, if you train proper MXNet models that works, and to all the built-ins that are based on MXNet, you can do the same. Right, the only thing, again, you need to pay attention to is the shape and the name of the input tensor, but that is stored in that JSON file in the artifact. Right? Okay, one more? That's fine. Yes, we have 10 minutes. 10 minutes left. Perfect.
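Here's a sketch along the lines of what was just shown, assuming MXNet and graphviz are installed. The file names follow the built-in image classification artifact (the 0002 suffix matches the two training epochs); the parameter-loading options may need adjusting depending on your MXNet version.

```python
import json

import mxnet as mx
from mxnet import gluon, nd

# Load the model definition (the JSON symbol file) from the extracted artifact.
with open("image-classification-symbol.json") as f:
    sym_json = json.dumps(json.load(f))
sym = mx.sym.load_json(sym_json)

# Instantiate the network from its symbolic definition; the input tensor is called 'data'.
net = gluon.nn.SymbolBlock(outputs=sym, inputs=mx.sym.var("data"))

# Optional: visualize the architecture (here, a ResNet). Requires the graphviz package.
mx.viz.plot_network(sym)

# Load the trained weights, then initialize the remaining runtime parameters
# (this step prints warnings about already-initialized parameters, which we can ignore).
net.load_parameters("image-classification-0002.params",
                    allow_missing=True, ignore_extra=True)
net.initialize()

# Predict on a random tensor shaped like the model input: (batch, channels, height, width).
x = nd.random.uniform(shape=(1, 3, 224, 224))
response = net(x)
print(response.shape)   # (1, 257): 256 Caltech classes plus the clutter class
```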
Okay, so all you container fans out there, you're going to like this. There you go, take a deep breath, it's going to be great. So, let me show you how to deploy a TensorFlow model on a container service. And I've selected Fargate, which is serverless, even better. That's the easiest way to work with containers, as far as I'm concerned. A lot of people will disagree, but I really like Fargate. That's fine. So high-level process. Start from, in this case, a TensorFlow artifact. Extract it, blah, blah, blah, blah. We've seen this before. Now, what I'm going to do is I'm actually taking the model that we extracted here and I'm committing it to a GitHub repo. Because on the container that's going to run on Fargate, I'm just going to pull that repo, extract that model, and fire up TensorFlow Serving inside the container to serve it. So you don't have to use a Git repo, but for demos, it's the easiest way. OK, so we'll see how that works. No time to explain ECS and Fargate in detail. There's a command-line tool which is called ECS CLI that lets us do some operations like creating a cluster, listing the tasks that are running on clusters. So yeah, we need to install it. Creating a cluster is as easy as this, right? And it's instant because it's serverless. So you don't have to wait for anything. And if you go to the Fargate console, you're going to see your cluster, right? It's immediately available. Okay, very nice. And I'm going to set this cluster as my default cluster to work with. And yeah, I can list, I can use ECS CLI to see if anything's running on it, etc., etc. Right, so simple command line to check what the cluster is doing. My containers are going to log stuff, so I need a log group in CloudWatch Logs. You just need to create this once. And I need to register what is called a task definition, which really describes what the task that I'm going to run on the cluster looks like. And you need to register it once. And let's maybe look at the file itself. I don't really enjoy that JSON viewer, okay, so it looks complicated but it's really not. So this is the image I'm using, right? So it's actually the deep learning container for TensorFlow 2.3.2 for CPU. So it's unchanged, right? I use the existing container. So this is the same container that SageMaker uses actually, right? For CPU inference, same one. And the command that I'm gonna run inside that container is: clone the repo, then start TensorFlow Serving, loading the model that we cloned. Okay. So that's it. That's what the container does. Fire up TensorFlow Serving, load the model, kind of like I've done locally a few minutes ago. Okay. And of course, I want to serve predictions from there. So I need to map the two TensorFlow Serving ports, 8500, 8501. One is for the REST API and the other one is for gRPC, if I remember correctly. The logs. Okay, so please log all that stuff to awslogs, blah, blah, blah, whatever I created. And just a couple of network settings, nothing special. Okay. Just for reference, I shouldn't say this, but this is actually a demo I first built in 2018, three years ago, for the Stockholm Summit. Hi, Sweden. And... I resurrected it, and believe it or not, the only thing I changed, I promise it's true, is this. I just updated the container image, and it ran on the first try. So stability. API stability for the win, baby. Yes. Thank you, ECS. Right? Big fan. Big fan. Really, literally, I just updated this, and it ran. Pretty cool. Pretty cool.
Okay, so now that we have a cluster, now that we have a task definition, we need to run the task. Okay, so here I'm using the AWS command line, ecs run-task: which cluster I'm running it on, which task definition I'm using, how many tasks do I want, where do I want to launch this, Fargate. Right, and yeah, this one is a little unfriendly, but it's basically saying, "Please launch this task in this VPC in this subnet with this security group," because I want this to be publicly accessible. I want my container to have a public IP, so of course, I need to run this inside a public subnet in one of my VPCs and with a security group that, as you would think, opens 8500 and 8501. That's fine. You can use any subnet in your default VPC here. Any public subnet. Yes. Okay. All right. So this runs, blah, blah, blah, blah, blah. Now, okay, let's try and run this. Oh yeah, okay, so I can see this task is running, and this is its public IP. So now if I take that IP and if I build a URL with the TensorFlow Serving format, mm-hmm, right, IP address, the port, v1/models, I should be able to invoke stuff. So this is still the Fashion MNIST model. So I'm loading the Fashion MNIST data set, and I'm using a completely standard URL invocation here. I'm not using SageMaker at all. Right, it's not SageMaker; it's an API running inside a container. So right, no more SageMaker here, and I use the requests library from Python, but you could use anything in a different language, and I post, yeah, I post the data, so a few of our images, to the endpoint, right, and let's try that again. Wait, I need to run those two cells because I'm messing around with it. All right. Oh, no errors. It looks fake. I want to see errors. Yes. Okay. One misprediction here. Okay. So I'm probably still going to be doing this demo five years from now because I love it, I have to say. Because it shows that, you know, you can train. This one is a super simple model. But let's say you train, you know, you fine-tune your crazy Hugging Face or crazy PyTorch model on a GPU cluster on SageMaker using managed infrastructure that you would never dream of buying yourself because it's very expensive and you don't need it all the time. So leverage the scalability of SageMaker for training. And then grab the model, put it on your local Docker cluster, even for testing, right? I mean, if you want to, if your dev and test environments are on-premises, fine, go and do that. That's totally okay. And honestly, you see it's not a lot of work. Get the artifact, extract it, and then either put it in a container directly or load it like I've done with the Git clone, which is probably not a good way for production, but for testing, it's fine. And then that's it. The task definition is nothing to worry about. You should already have that if you work with container services on AWS, and that's it. So you can see it really fits nicely together. I really like that. And I really like the fact that I don't have to manage the cluster. That's what the cluster looks like. No instances to manage. How cool is that? Fully managed, serverless. I can see the task. I can see it was running. It's all good. I can see all that stuff here that I used in the task definition. I can see the log. Look, amazing. Yeah, perfect. It just works. Cherry on the cake. Yes, cherry on the cake. Where's my t-shirt? All right, all right. Okay, so yeah. One minute. We are absolutely on time. Have we ever been late? I'm not sure. Maybe a couple of times.
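If you'd rather script that last step than use the CLI, here's a sketch of the same run-task call with boto3. The cluster name, task definition name, subnet, and security group IDs are placeholders: use a public subnet in your VPC and a security group that opens ports 8500 and 8501.

```python
import boto3

ecs = boto3.client("ecs")

# Run one copy of the TensorFlow Serving task on the Fargate cluster.
response = ecs.run_task(
    cluster="fargate-demo",          # placeholder: your cluster name
    taskDefinition="tf-serving",     # placeholder: your registered task definition
    count=1,
    launchType="FARGATE",
    networkConfiguration={
        "awsvpcConfiguration": {
            "subnets": ["subnet-xxxxxxxx"],          # placeholder: a public subnet
            "securityGroups": ["sg-xxxxxxxx"],       # placeholder: opens 8500/8501
            "assignPublicIp": "ENABLED",
        }
    },
)
print(response["tasks"][0]["taskArn"])

# Once the task has a public IP, invoke TensorFlow Serving directly, e.g.:
#   requests.post(f"http://{public_ip}:8501/v1/models/model:predict",
#                 data=json.dumps({"instances": ...}))
```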
Okay, so as always, give me a few minutes. I will commit the notebooks to the usual repository. This is where you will find all the SageMaker Fridays notebooks for season three, and for those of you who have been too busy to watch previous episodes, I'd like to remind you that you can watch all of them on YouTube and, of course, on the channel that you're looking at right now. Okay, but if you're watching them on YouTube, then I get the views. Yeah. Yeah. Please use YouTube. I like to get the views. And you can ask me questions. Right? Yeah. And you respond. Yes. You can leave a comment. And most of the time, I'm going to respond. Right? If I'm not responding, it's because either I'm too busy or I didn't understand the question or I simply don't know. Okay? All right. But yeah. Yeah. YouTube. And for those of you who have posted questions, thank you. I love that. And you can post more. Okay. All right. So I think we're done. Yeah. So I'm quite sure we'll be back at some point. But there are other things we need to work on. And this is it for now. So Ségolène, thank you very much for joining me for those eight crazy episodes. That's great. Yeah, it was good fun. We'll certainly do more in a few months, right, once we've cleared a few things out of the way. No, I didn't say re:Invent. It's going to keep us busy for a long time, but we have other projects and we need to pause for now. So thank you so much. Until then, you know, please connect, get in touch, ask questions on LinkedIn, Twitter, YouTube, anywhere you can find me. On the wheel? Not yet, working on it. I'll be on that rock. And until then, keep rocking with machine learning. Bye-bye. Thank you very much. See you soon.
Julien Simon is the Chief Evangelist at Arcee AI, specializing in Small Language Models and enterprise AI solutions. Recognized as the #1 AI Evangelist globally by AI Magazine in 2021, he brings over 30 years of technology leadership experience to his role.
With 650+ speaking engagements worldwide and 350+ technical blog posts, Julien is a leading voice in practical AI implementation, cost-effective AI solutions, and the democratization of artificial intelligence. His expertise spans open-source AI, Small Language Models, enterprise AI strategy, and edge computing optimization.
Previously serving as Principal Evangelist at Amazon Web Services and Chief Evangelist at Hugging Face, Julien has helped thousands of organizations implement AI solutions that deliver real business value. He is the author of "Learn Amazon SageMaker," the first book ever published on AWS's flagship machine learning service.
Julien's mission is to make AI accessible, understandable, and controllable for enterprises through transparent, open-weights models that organizations can deploy, customize, and trust.