Hi everybody, this is Julien from Arcee. I've got really good news for you. As of today, you can use Arcee models in Amazon Bedrock. This feature was launched yesterday at AWS re:Invent. In this video, I'm going to show you what models are available today, and that list will certainly grow over time. We'll quickly discuss what those models are, what they're good for, and then we'll deploy a couple of them, use them in the playground, and last but not least, I will show you how to use them in your applications with the Bedrock API. We've been waiting for this for a long time, and I know some of you have also been waiting a long time, so let's get started.
In a previous video, I showed you how to deploy our models from the AWS Marketplace, and you can still do this. That page is still live, and we'll keep updating the models there. No problem. However, as of yesterday, there's an easier option thanks to a new feature announced in Amazon Bedrock called Marketplace Deployments. If you click on this and then on Model Catalog, you will see models coming from different providers, and Arcee is in that list. At the time of recording, we have five models: four open-source models (Nova, Spark, Supernova Lite, and Arcee-Lite) and a commercial model called Supernova, which I've mentioned quite a few times.
All these models can be deployed in Bedrock, and the key benefit is that the deployment process is extremely simple. On top of that, because the models are accessible through the Bedrock API, you can use them with all the other Bedrock features like knowledge bases, prompt management, guardrails, and so on.
Let's talk quickly about the models. Supernova is our commercial 70B model, based on the Llama 3.1 70B architecture. I'll put all the links in the video description for more details. Let's talk about pricing for a second. Bedrock pricing for marketplace models comes in two dimensions. The first is the type of instance you're running the model on. As you can see, the models are running on SageMaker endpoints, so you get to pick which instance you want to run the model on based on price, expected performance, throughput, and latency. SageMaker costs apply, and if you have commercial agreements or reserved instances, those negotiated prices will apply.
The second dimension, which applies to commercial models only, is a software cost on top of the instance cost. Nova is an open-source model with 72 billion parameters, based on the Qwen architecture. It's free to use, but remember, you will still be paying for the instance. Next, we have two Llama models: Llama Spark and Llama 3.1 Supernova Lite. Both are based on the Llama 3.1 8B architecture. Supernova Lite is the newer one, but I recommend you try both. They have slightly different properties, so if you're looking for a reasonably sized general-purpose model, Spark or Supernova Lite should do the trick.
Lastly, we have Arcee-Lite, our smallest model at 1.5 billion parameters, based on the Qwen architecture. This is good for simpler applications or when you need a lot of throughput at very low cost. You can easily get 100 tokens per second on a very small GPU instance. We'll keep adding models, and we have a bunch of really exciting new models that we've just released. It will take a bit of time for them to find their way into Bedrock, but I'm on it.
Now, let's deploy some models. We'll deploy the Supernova Lite model and the larger commercial Supernova. Let's start with the smaller one. Click on it, and because this is based on the marketplace, we need to subscribe. Click on View Subscription Options. You can see the pricing for the model, which is zero, so no worries there. Let's just go and subscribe. We're subscribed. Good. Now we can go and deploy. Just click on this button. You set the endpoint name and the number of instances; if you know you're going to have serious traffic, you can start with more than one. Then you get to pick the instance type on which you want to run the model. For now, we have G5 and G6, but that list will change over time. I'm still waiting for those G6 instances to be available on SageMaker, so for now let's be reasonable and take a g5.2xlarge, which is very cost-effective. If you want to deploy in your own VPC, you can do that here, and you can set some tags or skip them. Let's just go and click on deploy.
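By the way, if you'd rather script the deployment than click through the console, the same thing can be done programmatically with the Bedrock Marketplace API. Here's a minimal sketch in boto3; the model ARN, execution role, and endpoint name are placeholders you'd replace with your own, and it's worth double-checking the field names against the Bedrock API reference.

```python
# Minimal sketch: deploying a Bedrock Marketplace model with boto3.
# All ARNs and names below are placeholders.
import boto3

bedrock = boto3.client("bedrock", region_name="us-east-1")

response = bedrock.create_marketplace_model_endpoint(
    modelSourceIdentifier="<marketplace-model-arn>",  # from the model catalog page
    endpointName="supernova-lite-endpoint",
    acceptEula=True,  # same EULA you accept when subscribing in the console
    endpointConfig={
        "sageMaker": {
            "initialInstanceCount": 1,
            "instanceType": "ml.g5.2xlarge",
            "executionRole": "<sagemaker-execution-role-arn>",
        }
    },
)
# The endpoint ARN is what you'll pass as the model ID when invoking.
print(response["marketplaceModelEndpoint"]["endpointArn"])
```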
Now, let's deploy Supernova, and I'll show you a little trick here. Supernova says it's available for all those instance types, which we also see on the marketplace page, but if you click on the instance type list, you will only see G5 and G6. This is not a bug; it's because Bedrock only shows you the latest version of the marketplace package for Supernova. If we go back to the marketplace page, we'll see that we have different versions of the package: v1 is for P4 and P5 GPU instances, and v2 is for G5 and G6. To fit on G5 and G6, we use a quantized model. As Bedrock only shows the latest version of the package, you only get to see G5 and G6 in this list. Let's just go and deploy on G5 for now.
If we wanted to deploy Supernova on P5, there's a good solution: you can register SageMaker endpoints in Bedrock. You would subscribe to the model on the marketplace page, deploy it on SageMaker with our sample notebooks, and once that endpoint is running, register it with Bedrock. I'll show you this in another video. Let's pause until our two endpoints are ready, and then we'll play with them in the playground and I'll show you the API.
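For the curious, the registration itself comes down to a single API call. This is a sketch under the assumption that you already have a SageMaker endpoint running one of our marketplace model packages; both ARNs are placeholders, and the parameter names should be verified against the Bedrock API reference.

```python
# Minimal sketch: registering an existing SageMaker endpoint with Bedrock
# so it can be used through the Bedrock API. ARNs are placeholders.
import boto3

bedrock = boto3.client("bedrock", region_name="us-east-1")

response = bedrock.register_marketplace_model_endpoint(
    endpointIdentifier="<sagemaker-endpoint-arn>",    # the endpoint you deployed
    modelSourceIdentifier="<marketplace-model-arn>",  # the model package it runs
)
print(response["marketplaceModelEndpoint"]["status"])
```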
After a few minutes, our models are in service, so we should be able to test them. Let's try Supernova Lite first by opening it in the playground. Select the model, and maybe look at the parameters here: response length, let's do more than that, say 1024. Now we can ask a question. All right, we got our answer. Pretty cool.
How do we use that model for real with an API? We just go back to our deployments, where we find the endpoint ARN. Copy it, then go to a sample notebook like this one and make sure the right ARN is set. We grab the Bedrock boto3 client, prepare an input in the OpenAI format (and you can still pass parameters), and then we just invoke. This time, I'm going to use the streaming API, as in the sketch below. Voila! Pretty good speed for our model, even though we're only using a g5.2xlarge, a single-GPU instance. Very effective.
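Here's roughly what that notebook code boils down to. It's a minimal sketch: the endpoint ARN is a placeholder, and I'm assuming OpenAI-style streaming chunks on the way back, which is what these model containers emit; adjust the parsing if your container's schema differs.

```python
# Minimal sketch: streaming invocation of a Bedrock Marketplace endpoint.
# The endpoint ARN is a placeholder; the request body uses the OpenAI format.
import json
import boto3

bedrock_runtime = boto3.client("bedrock-runtime", region_name="us-east-1")

endpoint_arn = "<bedrock-marketplace-endpoint-arn>"

body = {
    "messages": [
        {"role": "user", "content": "Explain model quantization in simple terms."}
    ],
    "max_tokens": 1024,
    "temperature": 0.7,
}

# The streaming API returns tokens as they are generated.
response = bedrock_runtime.invoke_model_with_response_stream(
    modelId=endpoint_arn,
    body=json.dumps(body),
)

for event in response["body"]:
    if "chunk" not in event:
        continue
    chunk = json.loads(event["chunk"]["bytes"])
    # Assuming OpenAI-style streaming chunks here.
    delta = chunk.get("choices", [{}])[0].get("delta", {})
    print(delta.get("content", ""), end="", flush=True)
```

Switching to a different model is just a matter of swapping in the other endpoint's ARN.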
Let's try our other deployment, Supernova. Let's just go and open it in the playground. Yes, continue. Let's add more tokens to the output and try a slightly different prompt. What did we get here? "Imagine you have a big box full of different toys like cars, dolls, and blocks." Interesting. I asked it to explain like I'm 5, and 5 was maybe a little low. But hey, let's try something else. That's definitely a more complex answer. Let's try this one in the notebook too, so we can see streaming happening. Grab the ARN, update it here, and stream the answer. You can see that even though we are working with a 70B model, it's still very fast on this instance because we quantized it. The trade-off is we probably lose a little bit of model quality in the process, but we're able to run that 70B model on a cost-effective instance.
So that's how you do it. Pretty cool, pretty simple. Of course, when you're done, please don't forget to delete your deployments because you will be charged for as long as they're running. Let's just go and delete this one and then go and delete the other one.
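Cleanup can also be scripted. Here's a minimal sketch that lists the marketplace endpoints in the account and deletes them; as before, treat the exact field names as something to verify against the API reference.

```python
# Minimal sketch: deleting all Bedrock Marketplace endpoints in this region
# to stop the instance charges.
import boto3

bedrock = boto3.client("bedrock", region_name="us-east-1")

for ep in bedrock.list_marketplace_model_endpoints()["marketplaceModelEndpoints"]:
    print("Deleting", ep["endpointArn"])
    bedrock.delete_marketplace_model_endpoint(endpointArn=ep["endpointArn"])
```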
That's pretty much what I wanted to show you in this video. Our models are in Bedrock. You can one-click deploy them, open the playground, use the API, and connect them to all the other Bedrock services available here. I'm going to do more videos to show you how we can register an existing SageMaker endpoint and how to connect some of those Bedrock features to our models. Stay tuned; much more is coming. Until then, my friend, thanks for watching, and you know what to do—keep rocking.