SageMaker JumpStart: deploy Hugging Face models in minutes
October 08, 2023
Experimenting with the latest and greatest models doesn't have to be difficult. With SageMaker JumpStart, you can easily access and experiment with cutting-edge large language models without the hassle of setting up complex infrastructure or writing deployment code. All it takes is a single click. In this particular video, I walk you through the process of deploying and testing the Mistral AI 7B model as an example.
⭐️⭐️⭐️ Don't forget to subscribe to be notified of future videos ⭐️⭐️⭐️
To get started, simply open JumpStart in SageMaker Studio and locate the Mistral AI 7B model. Click on the model to select it, then launch the deployment, which takes care of all the required infrastructure for you. Once the endpoint is up, SageMaker JumpStart provides a sample notebook so you can start testing the model immediately!
If you want to experiment with the latest state-of-the-art models like the Mistral AI 7B model, SageMaker JumpStart provides a hassle-free way to do so. Try it out and explore the possibilities of cutting-edge AI models with just one click!
Amazon SageMaker JumpStart: https://aws.amazon.com/sagemaker/jumpstart/
Follow me on Medium at https://julsimon.medium.com or Substack at https://julsimon.substack.com.
Transcript
Hi everybody, this is Julien from Arcee. Welcome to another hotel-room video, sponsored by horrible hotel coffee, but it keeps me going. I've been on the road for a few weeks now, meeting a ton of customers together with AWS. A question that comes up a lot is: what's the simplest way to experiment with the best models? New state-of-the-art models come out almost every week. How difficult is it to just deploy them, evaluate them, and see if they're a good fit for our project? That's a common question, and it's exactly what we're going to look at in this video. I'll show you how you can literally deploy all the latest and greatest models on AWS with a single click using SageMaker JumpStart, which is part of Amazon SageMaker, the AWS machine learning service. It's super simple. Let me show you how to do this, and you'll be experimenting in minutes. Let's get to work.
Our starting point is the homepage of SageMaker Studio. Go to the AWS console, open SageMaker, and launch Studio. You should see something like this. If we click on JumpStart, we'll see a list of all the good stuff included here: models from different providers, built-in solutions, and more. Let's focus on Hugging Face for now. Click on Frameworks, then Hugging Face, and you'll see a curated list of Hugging Face models that we can literally one-click deploy inside our own AWS account. It's a small number, 280 models right now, whereas the Hub hosts probably 350k models. These are the reference architectures: the baseline versions of each model. You won't find the fine-tuned variants, which live on the Hub, but when you want to experiment, it's generally a good idea to start from the official model that the model builders have shared. You can see all the latest and greatest models here, including Stable Diffusion, Falcon (including the large one), Llama, Bloom, T5, and even the new model from Mistral, a 7B model that outperforms Llama 13B. It came out just a few days ago and is already on JumpStart. If you want to try Mistral, this is probably the simplest way. Let's try this one.
Click on the model, and we can just click on deploy. We'll check the deployment configuration. We can run this on a G5 instance, which I certainly have quota for. Let's just deploy. This will deploy the model on a SageMaker endpoint, a managed endpoint on this G5 instance. All we have to do is wait for up to 10 minutes, and we'll be able to open a sample notebook and start playing with the model. As you can see, you don't need to do anything except click. This is based on the work we do together with AWS, building the Hugging Face deep learning containers and integrating them into the SageMaker SDKs. One click, wait for a few minutes, and you can start experimenting. I'll pause the video and be back when the endpoint is ready.
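For readers who prefer code over clicks, the same one-click deploy can be sketched with the SageMaker Python SDK's JumpStart support. This is a minimal sketch, not the exact code behind the console: the model ID and instance type below are assumptions (look up the exact ID in the JumpStart UI or SDK docs), and running it requires AWS credentials and G5 quota.

```python
# Minimal sketch of a JumpStart deployment via the SageMaker Python SDK.
# MODEL_ID and INSTANCE_TYPE are assumptions; verify them in the JumpStart UI.
MODEL_ID = "huggingface-llm-mistral-7b"
INSTANCE_TYPE = "ml.g5.2xlarge"

def deploy_jumpstart_model(model_id=MODEL_ID, instance_type=INSTANCE_TYPE):
    """Deploy a JumpStart model to a managed real-time SageMaker endpoint."""
    # Imported lazily so this file loads even without the sagemaker package.
    from sagemaker.jumpstart.model import JumpStartModel

    model = JumpStartModel(model_id=model_id)
    # deploy() provisions the endpoint; expect it to take up to ~10 minutes.
    return model.deploy(instance_type=instance_type)
```

The returned predictor object exposes `predict()` for inference and `delete_endpoint()` for cleanup, mirroring the steps shown in the video.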
After a few minutes, the endpoint is in service, as we can see here. Now we can just open the notebook. We need an environment for this, which could take a minute. Let's not wait; I'll just pause. Now we have a kernel to run this notebook, and we can just click through those cells. As you can see, we're grabbing the endpoint name we just created, writing a function to query the endpoint, and then seeing some examples. This is super useful because you can see what the inference format and prompting format look like. You can certainly go and reuse that code in your own notebooks. Then, of course, we can start querying. What's the recipe for mayonnaise? Looks like a good answer to me. I'm going to Paris; what should I see? We're providing some additional context. Let's try this and run the other ones too. The Eiffel Tower, etc. In Bash, how do I list all text files in the current directory? That's a good one. We get some Unix shell commands out of that. More technical question: what's the difference between in-order and pre-order traversal for binary trees? We get some output and a code sample. That's pretty cool. You can obviously ask different questions to this, and the notebook is a great resource.
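The notebook's query function boils down to a JSON request against the endpoint. Here is a hedged sketch of what that code typically looks like: the endpoint name and generation parameters are assumptions (copy the real ones from the sample notebook), and the payload follows the Hugging Face text-generation format the notebook demonstrates.

```python
import json

# Hypothetical endpoint name; substitute the one JumpStart created for you.
ENDPOINT_NAME = "jumpstart-dft-hf-llm-mistral-7b"

def build_payload(prompt, max_new_tokens=256, temperature=0.6):
    """Build the request body used by the Hugging Face LLM container."""
    return {
        "inputs": prompt,
        "parameters": {
            "max_new_tokens": max_new_tokens,
            "temperature": temperature,
            "do_sample": True,
        },
    }

def query_endpoint(prompt):
    """Invoke the SageMaker endpoint with a text prompt and return the JSON reply."""
    import boto3  # imported lazily; needs AWS credentials to actually run

    client = boto3.client("sagemaker-runtime")
    response = client.invoke_endpoint(
        EndpointName=ENDPOINT_NAME,
        ContentType="application/json",
        Body=json.dumps(build_payload(prompt)),
    )
    return json.loads(response["Body"].read())
```

Usage would simply be `query_endpoint("What is the recipe for mayonnaise?")`, the same kind of prompt the notebook runs.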
As you can see, the total time from opening Studio to running commands in the notebook is literally 10 minutes. So experimentation time is really 10 minutes. You can add your own data, and since it's running in AWS, you can pull data from S3 or any other data store you use and start experimenting. This is a huge time saver: no need to manage infrastructure or write code, just start figuring out if this model is a good starting point for your project. Once you're done, please delete the endpoint; otherwise, it will stay up and you'll keep paying for it. Click on delete, then check under Deployments, Endpoints: it's gone. Then shut down Studio, which terminates all remaining resources so you stop paying. Super simple: open Studio, go to JumpStart, find the model you like, one click to deploy, one click to open the notebook, experiment, and figure out if the model is a good place to start. If it is, you can then use the SageMaker SDK to fine-tune it, iterate on it, and deploy it in production. But for experimentation, I think JumpStart is by far the simplest option. That's what I wanted to show you today. I hope that was useful. Much more content coming. Until then, keep rocking.