Developer resources to deploy and test Arcee models on AWS
September 12, 2024
In this video, you will learn about a collection of developer resources that make it easy to deploy Arcee models on Amazon SageMaker.
⭐️⭐️⭐️ Don't forget to subscribe to be notified of future videos. Follow me on Medium at https://julsimon.medium.com or Substack at https://julsimon.substack.com. ⭐️⭐️⭐️
- Notebooks to deploy from the Hugging Face hub
- Notebooks to deploy from AWS Marketplace listings
- CloudFormation templates to deploy from the Hugging Face hub, from your model packages, and from model artifacts in S3, with both the TGI and LMI+vLLM inference servers.
Github repository: https://github.com/arcee-ai/aws-samples
Please follow Arcee.ai on LinkedIn to stay on top of the latest Small Language Model action!
#ai #aws #slm #llm #openai #chatgpt #opensource #huggingface
Transcript
Hi everybody, this is Julien from Arcee. In this video, I would like to share developer resources that I've built to make it very easy to deploy our open source models on AWS. We'll look at some sample notebooks and CloudFormation templates with a variety of configurations, and in just a few clicks, you'll be able to deploy any one of our open source models in your AWS account. Let's get started.
As we've discussed before, Arcee has published a number of open source models on the Hugging Face Hub, and I encourage you to go and check out our page. I will put all the links in the video description, and you can see some models. Of course, you can visit the model page, clone the repo, and use those models with the Transformers library. Most of the time, you will want to deploy this in the cloud, and for a lot of you, that means on AWS. That's why I built some developer resources to make that process as painless and straightforward as possible.
All my code lives in a public GitHub repo called arcee-ai/aws-samples. What will you find in there? Let's start with the simplest way, which is the model notebooks. Here, you'll find a growing list of Jupyter notebooks that let you deploy one of our open source models on Amazon SageMaker. Let's open Arcee Lite, a nice 1.5B model. It's a Jupyter notebook, and if you just run through the cells, you'll be able to deploy the model, downloaded from the Hugging Face Hub onto your instance of choice. I've tried to use the most cost-effective option every single time. For Arcee Lite, that's g5.xlarge, which is really inexpensive. Those configurations are known to work, so you don't need to mess around too much with sequence length and whatnot. These work; I've tested them. Maybe you can push the limits a little bit, but at least they will work out of the box.
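The Hub-based deployment path boils down to a container environment like the sketch below. The exact sequence lengths and instance sizing are assumptions based on the video; check the notebooks in arcee-ai/aws-samples for the tested values.

```python
# Minimal sketch of a TGI container configuration for deploying an Arcee
# model straight from the Hugging Face Hub. Values here are illustrative
# assumptions; the notebooks carry the tested configurations.

def tgi_env_for_hub_model(model_id: str, max_input: int = 4096, max_total: int = 8192) -> dict:
    """Build the container environment for a TGI deployment from the Hub."""
    return {
        "HF_MODEL_ID": model_id,            # model pulled from the Hub at container start
        "SM_NUM_GPUS": "1",                 # g5.xlarge has a single A10G GPU
        "MAX_INPUT_LENGTH": str(max_input),
        "MAX_TOTAL_TOKENS": str(max_total),
        "MESSAGES_API_ENABLED": "true",     # enable the OpenAI Messages API
    }

env = tgi_env_for_hub_model("arcee-ai/arcee-lite")
# This dict would then be passed to sagemaker.huggingface.HuggingFaceModel(env=env, ...)
# and deployed with .deploy(instance_type="ml.g5.xlarge", ...).
```

The point of centralizing the environment in one dict is that the same shape works for every model in the repo; only the model ID and token limits change.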
In all the notebooks, I've also enabled the OpenAI Messages API, which makes it very simple to run inference with the models. Just send requests in the OpenAI format, and if you have existing OpenAI prompts, they should work right away. The rest is really just SageMaker as you've seen it 200 times on this channel: creating an endpoint, etc. You can see the OpenAI format here, and I've got a bunch of different examples. Feel free to add your own. Of course, never forget to delete the endpoint at the end to avoid unnecessary charges. So that's the first, simplest way to deploy models: straight from the Hub using the model ID. That's option number one.
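With the Messages API enabled, the endpoint accepts standard OpenAI-style chat payloads, so existing OpenAI prompts carry over unchanged. A sketch of the request body:

```python
import json

# Standard OpenAI chat-completions payload shape, which the endpoint accepts
# once MESSAGES_API_ENABLED is set. The prompt content is just an example.
payload = {
    "messages": [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Why deploy models on SageMaker?"},
    ],
    "max_tokens": 256,
    "temperature": 0.7,
}

body = json.dumps(payload)
# With the SageMaker SDK, you'd send this with predictor.predict(payload);
# with boto3, via sagemaker-runtime invoke_endpoint(EndpointName=...,
# ContentType="application/json", Body=body).
```

Because this is the same schema the OpenAI API uses, migrating existing prompt collections is usually a matter of changing the endpoint, not the payload.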
The second way to deploy our models is through the AWS Marketplace. These notebooks are based on model packages. A model package is a SageMaker artifact that bundles an inference container, the model artifact, and the model settings, so it's all packaged in one single object, and you can deploy from there. You need to start from the Marketplace page; those notebooks will not work unless you have subscribed to our models on the Marketplace. So just go to the Marketplace page, and you'll see some of our models there. Let's look at Llama Spark; you just need to subscribe to it. Click Subscribe, and that's all there is to it. You will receive an email confirming the subscription. The open source models are free, so no surprises there. You will only pay for the underlying infrastructure.
Once you have subscribed, you could go and deploy with the built-in features in the Marketplace. You could try CloudFormation, deploy in the SageMaker console, or use the AWS CLI, but I would recommend running my notebooks because those have known, proven, tested configurations. Once you've subscribed, just clone the aws-samples repo and open the notebook for the model you just subscribed to. Let's look at Arcee Agent. It's very similar to the previous notebooks, except here you need to use the ARN of the model package that you just subscribed to. You don't need to tweak this: I've entered them all, and you can see they're available in 16 AWS regions. So you don't need to change anything. Run those cells, and the notebook will automatically select the model package for your region, then create the endpoint. The rest is very much the same.
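The region-based lookup the notebooks perform can be sketched as a simple dictionary. The ARNs below are placeholders, not the real Marketplace ARNs; the actual mapping for all 16 supported regions ships in the notebooks.

```python
# Placeholder mapping; real ARNs come from the subscribed Marketplace listing
# and are already filled in inside the notebooks.
MODEL_PACKAGE_ARNS = {
    "us-east-1": "arn:aws:sagemaker:us-east-1:123456789012:model-package/arcee-agent-example",
    "eu-west-1": "arn:aws:sagemaker:eu-west-1:123456789012:model-package/arcee-agent-example",
    # ... one entry per supported region
}

def model_package_arn(region: str) -> str:
    """Return the model package ARN for the given region, or fail clearly."""
    try:
        return MODEL_PACKAGE_ARNS[region]
    except KeyError:
        raise ValueError(f"Model package not available in {region}") from None

arn = model_package_arn("us-east-1")
# The ARN is then used with sagemaker.ModelPackage(model_package_arn=arn, ...)
# instead of a Hugging Face model ID.
```

Failing loudly on an unsupported region is deliberate: deploying a Marketplace package in the wrong region produces a much less readable error from SageMaker.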
Of course, for production and automation, CloudFormation is the best option. You will find CloudFormation templates to deploy models. Because I am an extremist, I guess, I considered three different configurations. First, deploying from the Hugging Face Hub. Then deploying from a model package that you could have created yourself. These are not the Marketplace packages, and we use those scripts internally, so I thought, why not share them too? If you do create your own model package, you can use one of those. You could also deploy from a model artifact stored in an S3 bucket. The end result will always be the same: a SageMaker endpoint running in your account. You could be deploying from the Hugging Face Hub, from a model package, or from an S3 artifact.
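For the CloudFormation route, deploying one of the templates comes down to passing a handful of parameters. The parameter names below ("ModelId", "InstanceType", "EndpointName") are illustrative assumptions; read the Parameters section of the actual template before use.

```python
# Hypothetical parameter set for one of the Hub-based CloudFormation
# templates. Keys are assumptions; check the template's Parameters section.
stack_params = [
    {"ParameterKey": "ModelId", "ParameterValue": "arcee-ai/arcee-lite"},
    {"ParameterKey": "InstanceType", "ParameterValue": "ml.g5.xlarge"},
    {"ParameterKey": "EndpointName", "ParameterValue": "arcee-lite-tgi"},
]
# These would then be passed to CloudFormation, e.g. with boto3:
#   cfn = boto3.client("cloudformation")
#   cfn.create_stack(StackName="arcee-lite-endpoint",
#                    TemplateBody=open("template.yaml").read(),
#                    Parameters=stack_params,
#                    Capabilities=["CAPABILITY_IAM"])
```

The same parameter shape works whether the template sources the model from the Hub, a model package, or S3; only the template file changes.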
Because no single instance size and no single inference server fits everyone, you have templates for Hugging Face TGI and for the LMI container by AWS, which uses vLLM for serving. That should cover all your needs. As always, please read the templates to make sure they work for you before you run them blindly. You shouldn't need to tweak much in there, and that's a good way to deploy models.
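The practical difference between the two container families is mostly in the environment variables. A hedged sketch, assuming the documented variable names for each container (treat them as assumptions and read the templates for the tested values):

```python
# TGI container: model from the Hub, OpenAI Messages API on.
tgi_env = {
    "HF_MODEL_ID": "arcee-ai/arcee-lite",
    "SM_NUM_GPUS": "1",
    "MESSAGES_API_ENABLED": "true",
}

# LMI container: same model, but served by the vLLM backend.
lmi_env = {
    "HF_MODEL_ID": "arcee-ai/arcee-lite",
    "OPTION_ROLLING_BATCH": "vllm",        # select the vLLM engine
    "OPTION_MAX_ROLLING_BATCH_SIZE": "16", # continuous-batching concurrency
}
```

Since both configurations point at the same model, switching servers is a matter of swapping the container image and its environment, not changing anything about the model itself.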
I guess maybe I'll show you some demos in another video, but here I just wanted to highlight this repo and how to easily deploy Arcee models on SageMaker. I will update the list, so keep an eye on this repo. Watch it, and I would appreciate a star, why not? I'll keep adding more models to this. That's pretty much what I wanted to tell you. So as you can see, it's very easy to deploy Arcee models on AWS, on SageMaker, and you can do it in many different ways. Maybe there will be more in the future. Who knows? All right, that's it for me today. Until next time, keep rocking.