Hi, everybody. This is Julien from Arcee. In a previous video, I showed you how to use AWS Trainium, a custom training accelerator designed by AWS to speed up transformer training. There was quite a bit of setup to do, particularly installing the so-called Neuron SDK to support the Trainium chip, plus additional dependencies. To simplify all of this, we've built a new Amazon Machine Image (AMI) that's freely available on the AWS Marketplace. So in this video, I'm going to show you how to launch a Trainium instance with this new AMI and start training your transformer immediately, without any setup. OK, let's get to work.
Let's launch an instance. Just click here. Give it a name. Find our Arcee AMI in Marketplace AMIs. Yes. Okay. Select. Continue. Okay. Now we get to pick the instance type. So here we have the smaller one. Let's see if we can grab a large one, maybe. Add a key pair. Security group is fine. Storage is fine. Yeah, we can go and launch. All right, so in a minute or two, we should have an instance ready for us. So let's just wait for this and I'll be right back.
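By the way, if you'd rather script the launch than click through the console, here's a minimal boto3 sketch of the same steps. The AMI ID, key pair name, and region below are placeholders, not real values: look up the actual Arcee Neuron AMI ID on the AWS Marketplace for your region.

```python
# Hypothetical boto3 equivalent of the console launch above.
# The AMI ID and key pair name are placeholders, not real values.
import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")  # pick your region

response = ec2.run_instances(
    ImageId="ami-0123456789abcdef0",  # placeholder: the Arcee Neuron AMI ID from the Marketplace
    InstanceType="trn1.32xlarge",     # the large Trainium instance; trn1.2xlarge is the small one
    KeyName="my-key-pair",            # placeholder: your existing EC2 key pair
    MinCount=1,
    MaxCount=1,
)
print(response["Instances"][0]["InstanceId"])
```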
Okay, so I've connected to my instance. Let's clone the repo with my code here. Of course, I'll put all the links in the video description. Let's just go clone this thing. All right. The code is in this directory here. Okay. So first things first, we can check that we see the Neuron devices. neuron-ls is part of the Neuron CLI tools. Yeah, and we see our 16 chips; each one comes with two cores. So this is the same code I used in a previous video on Trainium. I'm fine-tuning BERT base on the Yelp review full dataset, which is quite large: hundreds of thousands of local business reviews. It's a pretty simple example; again, I'll put the link to the previous video. But long story short, we just need to add one line of code to our PyTorch training loop to support Trainium through PyTorch XLA. Very cool.
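To make that concrete, here's a minimal sketch of the idea, not the exact repo script: a plain PyTorch fine-tuning loop where the only Trainium-specific change is getting the device from torch_xla. The `train_dataloader` and the exact BERT checkpoint name are assumptions standing in for the repo's tokenized Yelp setup.

```python
# Minimal single-device sketch, assuming `train_dataloader` yields tokenized Yelp batches.
import torch
import torch_xla.core.xla_model as xm
from transformers import AutoModelForSequenceClassification

device = xm.xla_device()  # the one-line change: an XLA device backed by a Neuron core

model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-cased", num_labels=5  # assumed checkpoint; Yelp review full has 5 star ratings
).to(device)
optimizer = torch.optim.AdamW(model.parameters(), lr=5e-5)

model.train()
for batch in train_dataloader:
    batch = {k: v.to(device) for k, v in batch.items()}
    loss = model(**batch).loss
    loss.backward()
    optimizer.step()
    xm.mark_step()  # flushes the lazily-built XLA graph so the step actually executes
    optimizer.zero_grad()
```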
So that's the standalone version, and we have a distributed version, which obviously is the one we want to run here because we have 16 chips and 32 cores, right? And this is, again, very close to vanilla PyTorch distributed training, again through the XLA interface. Here we're just training on 10,000 samples, which is fine; I just want to highlight the fact that we can start this immediately and it's going to work. We'll run at scale in another video, but more on this later. Okay, so let's just fire this up. Oh, no, not like that. I've got to use the torchrun command. And yes, I do want to disable tokenizer parallelism, because I don't want to run the tokenizer in parallel in every process, right? That would be silly. And here I want to run 32 processes on this node, so I'll just say 32 here. And we should be good to go. No installation whatsoever, which is much better than what I had to do last time around on the Deep Learning AMI, which was all of this. Not complicated, but if you can do away with it, even better, right?
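For reference, here's a hedged sketch of what that distributed pattern looks like, again not the exact repo script. The script name and `train_loader` are illustrative; torchrun starts one Python process per Neuron core, and MpDeviceLoader takes care of moving batches to the device and stepping the XLA graph.

```python
# Rough sketch of the distributed pattern, assuming a launch like:
#   TOKENIZERS_PARALLELISM=false torchrun --nproc_per_node=32 train_distributed.py
import torch
import torch.distributed as dist
import torch_xla.core.xla_model as xm
import torch_xla.distributed.xla_backend  # registers the "xla" process group backend
from torch_xla.distributed.parallel_loader import MpDeviceLoader
from transformers import AutoModelForSequenceClassification

dist.init_process_group("xla")  # one process per core, 32 of them on trn1.32xlarge
device = xm.xla_device()

model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-cased", num_labels=5  # assumed checkpoint, as in the standalone sketch
).to(device)
optimizer = torch.optim.AdamW(model.parameters(), lr=5e-5)

# `train_loader` is assumed: a DataLoader using a DistributedSampler so each
# process sees its own shard of the 10,000 training samples.
device_loader = MpDeviceLoader(train_loader, device)

model.train()
for batch in device_loader:
    loss = model(**batch).loss
    loss.backward()
    xm.optimizer_step(optimizer)  # all-reduces gradients across cores, then steps
    optimizer.zero_grad()
```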
So this will start. This is the first run, so it's initializing all kinds of Python things, and that'll take a minute or two. Okay, I'll be right back when we start seeing output. All right, so once the model has been compiled for the Neuron cores, training can start, and we can see all the cores are happily busy training the model. It should be done in a couple of seconds. So there you go. This is really the simplest way to get started with AWS Trainium. Just grab our Arcee Neuron AMI, fire up a Trainium instance, the small one or the large one, and then you can run your code out of the box. No fuss, nothing to install. I think it's pretty cool. All right, well, that's it for this video. Of course, I'll put all the information in the description. And until next time, keep rocking.
Julien Simon is the Chief Evangelist at Arcee AI, specializing in Small Language Models and enterprise AI solutions. Recognized as the #1 AI Evangelist globally by AI Magazine in 2021, he brings over 30 years of technology leadership experience to his role.
With 650+ speaking engagements worldwide and 350+ technical blog posts, Julien is a leading voice in practical AI implementation, cost-effective AI solutions, and the democratization of artificial intelligence. His expertise spans open-source AI, Small Language Models, enterprise AI strategy, and edge computing optimization.
Previously serving as Principal Evangelist at Amazon Web Services and Chief Evangelist at Hugging Face, Julien has helped thousands of organizations implement AI solutions that deliver real business value. He is the author of "Learn Amazon SageMaker," the first book ever published on AWS's flagship machine learning service.
Julien's mission is to make AI accessible, understandable, and controllable for enterprises through transparent, open-weights models that organizations can deploy, customize, and trust.