SageMaker Fridays Season 3 Episode 1 The complete ML lifecycle with Amazon SageMaker
February 28, 2021
Broadcasted live on 26/02/2021. Join us for more episodes at https://amazonsagemakerfridays.splashthat.com/
This 90-minute special is the perfect starting point for SageMaker beginners and experienced users alike. After a quick introduction to SageMaker, we walk you through the 9 SageMaker launches from AWS re:Invent 2020: what they are, what problems they solve, and a quick demo.
* SageMaker Data Wrangler: data preparation
* SageMaker Clarify: bias detection and model explainability
* SageMaker Feature Store: offline and online storage for your engineered features
* SageMaker JumpStart: one-click deployment for ML solutions and pre-trained models
* SageMaker Data Parallelism: optimize large scale distributed training jobs
* SageMaker Model Parallelism: automatically split and train large models on GPU clusters
* Profiling capability in SageMaker Debugger: collect and visualize training performance metrics with no code change
* SageMaker Pipelines: automate model deployment end-to-end with quality gates
* SageMaker Edge Manager: manage multiple ML models on edge devices
Transcript
Good morning, everybody, and welcome to season three of SageMaker Fridays. My name is Julien, and I'm a principal developer advocate focusing on AI and machine learning. Before we explain what SageMaker Fridays are about, please meet my co-presenter. Hi, everyone. My name is Ségolène, and I'm a senior data scientist working with the AWS Machine Learning Solutions Lab. My role is to help customers get their projects on the right track to create business value as fast as possible. Great. Thank you again for being with us. We'll definitely need your expertise. So this new season of SageMaker Fridays will continue with the hands-on approach that we all enjoyed in season two. Twice a month, we'll focus on real-life machine learning use cases and solve them using Amazon SageMaker and the new capabilities introduced just a few months ago at re:Invent. As always, no slides and lots of discussions and demos. All episodes are live, so please feel free to ask all your questions in the chat, and our friendly moderators will answer them. Okay, remember there are no silly questions, so make sure you learn as much as possible. Okay, let's get started. So Ségolène, what is this episode about?
In this episode, Julien, we are going to give you a grand tour of all the new SageMaker capabilities launched at AWS re:Invent, the premier cloud conference in the world. In the upcoming episodes, we will focus on a particular aspect of the ML lifecycle, such as data preparation, training models, and so on. Okay, so it's kind of a recap today. Starting next week, we'll start diving into particular things, but we need to set the scene first. Okay, so before we do that, before we talk about the new capabilities, we should start with a very quick recap on SageMaker and the story so far. A lot of you are probably new to the topic, so we need to take a few minutes to bring you up to speed. So Amazon SageMaker was launched three years ago. It's a fully managed service that helps developers and data scientists quickly and easily go from experimentation to production using the same set of tools and the same Python SDK. The fully managed part is very important because it means you can focus on the machine learning problem and never have to worry about infrastructure. So you never have to worry about instances and servers, managing them, or scaling them; all that stuff is taken care of automatically. The only thing you have to do is pick the Amazon EC2 instance type that you want to work with—maybe it's a CPU instance, maybe it's a GPU instance. We'll talk about that later as well. So again, all the plumbing is abstracted away, and you can focus on your problem and build a great machine learning model. And as usual with AWS, you only pay for what you use. So it's always pay-as-you-go.
When it comes to the machine learning process, you have full control. You can train and deploy models based on a collection of scalable built-in algorithms, some of which have been invented by Amazon. You can also bring your own code written with open-source libraries like TensorFlow or PyTorch and rely on our optimized versions to get the best performance. Another option is to bring your own custom code, for example, some custom Python. It's really important to understand that all training and deployment activities on SageMaker are based on Docker containers. Don't worry if you're not familiar with Docker; you really don't need to know much about Docker to use SageMaker. As long as your custom code fits in a Docker container, you can run it on SageMaker. If you use TensorFlow or PyTorch, you don't even need to worry about the containers; you just use the ones we provide. But if you want to use R, you can easily build an R container. We have examples, and you can run that on SageMaker. We have more options for modeling. You can also use a really nice capability called SageMaker Autopilot, which is an AutoML capability that builds models for regression and classification tasks. It supports a variety of algorithms, including neural networks now. The really nice thing about Autopilot is that you don't just get an optimized model. You also get auto-generated notebooks that show you how the model was trained, how the model was optimized with hyperparameter optimization, and how data was prepared. So feature engineering and data prep are fully transparent. You can take those notebooks, run them yourself, tweak them, and understand exactly how the model was built.
I really love Autopilot, and it recently added support for deep learning algorithms as well. Yes, that's super important. And there is one more option we haven't mentioned. You can also visit the AWS Marketplace for Machine Learning, which hosts hundreds of algorithms and pre-trained models built by AWS partners, and deploy them on SageMaker in just a few clicks. So what about developing? We talked about building, training, and deploying models. How do we work with SageMaker? How do we write code for SageMaker? From a practical, development perspective, you can use the SageMaker SDK, which is a Python SDK, in your favorite IDE. Exactly. And SageMaker also provides a full-fledged IDE called SageMaker Studio, where you can run your notebooks, visualize your training jobs, manage your prediction endpoints, and of course, work with SageMaker advanced capabilities like automatic model tuning, model monitoring, and so on. Okay, and we'll use Studio today, so you'll get a good look at Studio. So speaking of advanced capabilities, I think it's time to start the new season for real. We need to introduce all the new SageMaker characters, just like in our favorite TV show. So we have nine new characters. Let's quickly name them and then we can start diving into them. The first character is SageMaker Data Wrangler for data preparation. We've got another character, SageMaker Clarify for bias detection and model explainability. We've got SageMaker Feature Store, offline and online storage for your engineered features. We've got SageMaker JumpStart, one-click deployment for ML solutions and pre-trained models. We've got SageMaker Data Parallelism for optimizing large-scale distributed training jobs. We've got SageMaker Model Parallelism to automatically split and train large models on GPU clusters. We've got the profiling capability in SageMaker Debugger, so you can now collect and visualize training performance metrics with no code changes. Yes, that's a really nice one.
We've got SageMaker Pipelines to help you automate model deployment end-to-end with quality gates. And last but not least, we've got the Edge Manager to manage multiple models on edge devices. Okay, so today we're going to try and go through all nine. Right. So that's a bit of a challenge, but we think we can manage that. And starting next week, we'll start zooming in. So it's the start of the new season. I'm afraid I had no time to compose a musical theme. Should I sing anything? No, no. Okay, so I'm not singing anything. All right, maybe next time. So let's get started. And we'll go in logical order. So we should start with data preparation. Data preparation is a huge part of machine learning projects, and we hear from customers that data prep takes 50% to 80% of their time. Why is that, and what are the problems that customers typically face?
You're right, Julien. For data scientists like me, data prep is very time-consuming. Another problem is figuring out if your datasets potentially contain biased data. Bias is a growing concern for social, ethical, and regulatory reasons and cannot be ignored anymore. No one wants to build unfair models and applications. We want machine learning to make a difference and be positive. If we build models that discriminate against certain users or are just generally not good, that's awful. So how does SageMaker help? This is where our first character comes in: SageMaker Clarify. SageMaker Clarify is a new capability that lets you compute pre-training bias metrics on your datasets and post-training bias metrics on your trained models. Everything runs on managed infrastructure, and you can visualize the results. Let's look at an example. I'm using SageMaker Studio here. As you can see, it's a web-based IDE, and I'm opening some notebooks. So what Clarify lets you do is compute bias metrics on the data and models. There's also a model explainability capability, which we won't discuss today but will cover later. It's really simple to use. You can see the code here, and I'm zooming in on the exact snippet that's important. So it's managed infrastructure. We just create a SageMaker Clarify processor object, saying, "Please run my analysis on ml.c4.xlarge, which is a mid-size CPU instance." We could pick bigger ones if we wanted. Then we configure bias detection, which really says, "I need to configure the dataset. Where's the dataset? What's the name of the label for that dataset? What are the names of the columns?" So basically, describe the dataset. And if you want to do a post-training analysis on the model, you just need to say, "Here's the model I want to analyze." We already trained the model previously. "And here's the infrastructure I want you to deploy the model on and run the analysis on." So this is as much infrastructure as you deal with, which is awesome.
Then you need to specify the configuration and say, "I want you to look for potential bias in this dataset with respect to a certain feature." It's called a facet. In this case, I want to look for bias on an attribute called sex, which is basically male or female, and we want to see what's going on. And don't worry, we'll come back to this in detail. But this is really saying, "Here's the sensitive attribute in the dataset that I want to look at." And then you just run the analysis with your processor. And that's it. You say, "Please compute all the pre-training metrics. Please compute all the post-training metrics." And that job runs. It runs on managed infrastructure. We automatically deploy the model, compute all the metrics, and here we see the notebook with the output for all those metrics. We can also look at them in SageMaker Studio in a much more user-friendly way, which we'll do next week. So we get all the metrics, and we can get information on how to interpret them. Of course, you also get a report as a notebook, a PDF file, and an HTML file. They are stored in Amazon S3, our storage service, and you can just go and grab them. And that's it. So you can see it's very little code. Configure your setup and just run your analysis on managed infrastructure. And then interpret the metrics. So that's really, really nice. We'll dive deep into this one next week. Don't miss it.
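To make "pre-training bias metrics" a little more concrete, here is a minimal sketch of two of the simplest ones, computed with plain Python on a facet such as sex: class imbalance (CI), which compares group sizes, and difference in positive proportions in labels (DPL), which compares how often each group has the positive label. This is not the Clarify code shown on screen. The function name, the toy rows, and the choice of "female" as the group under scrutiny are all assumptions made for illustration.

```python
def pretraining_bias_metrics(rows, facet, facet_value, label, positive_label):
    """Compute two simple pre-training bias metrics for one facet.

    Group d holds the rows where row[facet] == facet_value, group a holds the rest.
    CI  (class imbalance)                             = (n_a - n_d) / (n_a + n_d)
    DPL (difference in positive proportions in labels) = q_a - q_d
    """
    group_d = [row for row in rows if row[facet] == facet_value]
    group_a = [row for row in rows if row[facet] != facet_value]
    n_a, n_d = len(group_a), len(group_d)
    if n_a == 0 or n_d == 0:
        raise ValueError("both facet groups must be non-empty")
    # Proportion of positive labels in each group
    q_a = sum(row[label] == positive_label for row in group_a) / n_a
    q_d = sum(row[label] == positive_label for row in group_d) / n_d
    return {"CI": (n_a - n_d) / (n_a + n_d), "DPL": q_a - q_d}


# Made-up toy data (not from the episode): 6 male rows, 4 female rows
rows = (
    [{"sex": "male", "label": 1}] * 3
    + [{"sex": "male", "label": 0}] * 3
    + [{"sex": "female", "label": 1}] * 1
    + [{"sex": "female", "label": 0}] * 3
)

metrics = pretraining_bias_metrics(rows, "sex", "female", "label", 1)
print(metrics)
```

Both metrics are zero for a perfectly balanced dataset; values further from zero suggest the facet deserves a closer look before training.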
So this is what I can tell you about Clarify. To summarize a little bit, Clarify makes it easy to detect potential bias issues early. You can then investigate if these biases are real, what their business impact is, and so on. Then you can take action to do bias correction, such as rebalancing the training sample, using SMOTE algorithms, adjusting labels on the training datasets, adjusting cutoffs, adding more data, and so on. We'll actually do this next week. We'll show you how to do rebalancing and so on. I'm still working on the code. It's almost ready. Okay. So now we get a better understanding of this initial problem. Can we please talk about data preparation now?
Okay, so data preparation. What are the tasks typically involved in data preparation? Now we are going to talk about the second character of the new season, which is SageMaker Data Wrangler. On top of cleaning data, you often need to transform it into a more expressive form that makes it easier for the algorithm to learn. This whole process is called feature engineering and usually involves a lot of manual work with a collection of tools. We all use our own scripts and open-source tools, and they're great, but we're trying to make this whole process a little bit simpler. And I think this is where Data Wrangler fits, right? Trying to unify and standardize your data cleaning and preparation process. Exactly. SageMaker Data Wrangler gives you an intuitive UI in SageMaker Studio. Thanks to that, you can import tabular data and apply over 300 built-in transforms, as well as your own written in Python, PySpark, or SQL. For instance, Deloitte uses Data Wrangler to accelerate the process of machine learning data preparation. This helps their customers take their new products to market much quicker. It makes sense because data prep can take 80% of your time. Any speedup here is a huge improvement in your project, and you can iterate much quicker. Let's take a look at my screen and see Data Wrangler in action. Again, going back to Studio, the first step is to actually import data. Here, I'm going to import from Amazon S3 and grab something from one of my buckets. I'm going to quickly show you an example with the Titanic survivor dataset, which is a toy dataset but a good one. It's a CSV file. When I select it, you can see I get a preview. I can import this dataset and start preparing it or analyze it first. We could build histograms, scatter plots, and a table summary. We can run the typical stats and build the typical histograms and graphs that we like to build. And then we could start adding transforms. Here's the list.
We can use custom code as well: Pandas, PySpark, Python, and SQL. Maybe I want to drop a column. The name column is probably not super important. So I say, "Drop column name," I can preview the change, and add it to the pipeline. Then you could encode a categorical variable, maybe with one-hot encoding on sex, set the output style to one column per value, and give a prefix. We can preview and see those two categories here and add it, etc. We'll look at this in more detail next week. We can see the different steps involved in our pipeline. We could have multiple sources, etc. And then we could just export this pipeline to actual code. We could take those steps and export them to a Python notebook, a SageMaker processing job, etc. There are really lots of ways you can work with this and integrate this code into your own application. You can see the preparation step is really, really simple. It's about manually and visually adding transforms and then exporting the code and using that. So that's what I can tell you about Data Wrangler in a few minutes. As for the export, as we saw, it takes one click. You have four options: export to Python code for direct integration into your project, a SageMaker processing notebook for fully managed batch processing, a SageMaker Pipelines notebook for end-to-end automation, and SageMaker Feature Store for offline and online use.
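The two transforms in the demo (drop the name column, then one-hot encode sex with one column per value and a prefix) can be sketched in a few lines of standard-library Python. This is only an illustration of what the transforms do to the data, not the code Data Wrangler exports. The helper names and the three sample rows are made up, with columns loosely following the Titanic CSV.

```python
import csv
import io


def drop_column(rows, column):
    """Return new rows without the given column."""
    return [{k: v for k, v in row.items() if k != column} for row in rows]


def one_hot_encode(rows, column, prefix):
    """Replace a categorical column with one 0/1 column per distinct value."""
    categories = sorted({row[column] for row in rows})
    encoded = []
    for row in rows:
        new_row = {k: v for k, v in row.items() if k != column}
        for category in categories:
            new_row[f"{prefix}_{category}"] = int(row[column] == category)
        encoded.append(new_row)
    return encoded


# Tiny made-up sample shaped like the Titanic dataset (not real passenger data)
SAMPLE_CSV = """PassengerId,Survived,Name,Sex,Age
1,0,Passenger A,male,22
2,1,Passenger B,female,38
3,1,Passenger C,female,26
"""

rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
prepared = one_hot_encode(drop_column(rows, "Name"), "Sex", prefix="Sex")
for row in prepared:
    print(row)
```

Each step takes rows in and returns new rows, which mirrors how the visual pipeline chains transforms one after another.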
This is a good transition because the next service we want to talk about is SageMaker Feature Store, another important part of data preparation. When we talk about feature engineering, the same feature engineering code is often run again and again, wasting time and compute resources. In large organizations, this can cause an even greater loss of productivity as different teams often run identical jobs or write duplicate feature engineering code because they have no knowledge of prior work. Another problem is that it's imperative to apply the same transformation to data for prediction. This often means rewriting feature engineering code, sometimes in a different language, integrating it into your prediction workflow, and running it at prediction time. This whole process is not only time-consuming but can also introduce inconsistencies, as even the smallest variation in a data transform can have a large impact on prediction. So how does SageMaker Feature Store help solve this? SageMaker Feature Store is a fully managed, centralized repository for your ML features, making it easy to securely store and retrieve features without managing any infrastructure. Features are organized in groups and tagged with metadata, and you can discover which features are already available. For instance, Intuit no longer has to maintain multiple feature repositories across the organization. Instead, their data scientists can use existing features from a shared private store. Once stored, features can be retrieved and used in SageMaker workflows, such as model training, batch transform, and real-time prediction with low latency. The key is that you avoid duplicating work and build consistent workflows that use the same feature store for offline and online use. Run feature engineering once, store the features, and then use them for training and prediction. And share them with all the teams.
Let's see how this works. Let's go back to my screen and put on my demo glasses. While Ségolène was talking, I exported my data preparation pipeline on the Titanic dataset to this notebook. Again, we'll come back to SageMaker Feature Store in future episodes, but I'm giving you a quick sense of the capability here. We're going to create what we call a feature group. A feature group is an abstraction in the feature store where we store rows with the engineered features. Think of it as the equivalent of your SQL table or CSV file, but in engineered format. We give a schema with the right types for all the columns. You can see the features that are actually engineered here. We have this auto-generated for us, but you could also write your own schema if you were not exporting from Data Wrangler. We need to pass the name of a unique identifier for each row, which could be a primary key or something that comes from your dataset and uniquely identifies each row because that's what you're going to use to query the features for a certain row. Then we decide to create an online store. That's the low-latency store we use at prediction time to retrieve features. We also have the offline store, which is based in S3, where we'll get all our features and where we can query to build training datasets. We create the feature group with this API call, and that's about it. Then you can start ingesting your data. You can bulk ingest your data or put individual records. You can retrieve them for online prediction or run queries on the S3 objects in the offline store to build training datasets. We'll show you a demo of that later in the season. But really, it all comes down to creating a feature group, defining a schema for it, deciding if you want online storage, and calling this API to ingest the data, which is one line of code later in the notebook. It is really, really super simple. You can also create a feature group in the Studio UI. 
Just go to this icon, select Feature Store, and create a feature group. You'll find the same thing I just described. But if you're more of a UI person, that's okay because you have a UI. The feature group name, enabling online and offline storage, passing the same parameters, and then passing your schema. That's about it. So it's very, very simple. And then you can query and retrieve features and get to work. Okay, so very useful.
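As a mental model only, the pieces described here (a feature group with a schema, a unique record identifier, an optional online store for low-latency lookups, and an offline store that keeps everything for building training sets) can be mimicked with a small in-memory class. This is a toy sketch and does not use or reproduce the SageMaker Feature Store API; `ToyFeatureGroup`, its methods, and the sample records are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class ToyFeatureGroup:
    """In-memory stand-in for a feature group (illustration only)."""
    name: str
    schema: dict            # feature name -> expected Python type
    record_identifier: str  # feature that uniquely identifies a row
    enable_online_store: bool = True
    offline_store: list = field(default_factory=list)  # append-only history
    online_store: dict = field(default_factory=dict)   # latest record per identifier

    def put_record(self, record):
        """Validate one record against the schema and write it to the stores."""
        if set(record) != set(self.schema):
            raise ValueError(f"record fields must match schema: {sorted(self.schema)}")
        for feature, expected_type in self.schema.items():
            if not isinstance(record[feature], expected_type):
                raise TypeError(f"{feature} must be {expected_type.__name__}")
        self.offline_store.append(dict(record))
        if self.enable_online_store:
            self.online_store[record[self.record_identifier]] = dict(record)

    def ingest(self, records):
        """Bulk ingestion is just repeated single-record writes in this toy."""
        for record in records:
            self.put_record(record)

    def get_record(self, identifier):
        """Online lookup by identifier, as you would do at prediction time."""
        return self.online_store.get(identifier)


group = ToyFeatureGroup(
    name="titanic-features",
    schema={"passenger_id": int, "age": float, "sex_female": int, "sex_male": int},
    record_identifier="passenger_id",
)
group.ingest([
    {"passenger_id": 1, "age": 22.0, "sex_female": 0, "sex_male": 1},
    {"passenger_id": 2, "age": 38.0, "sex_female": 1, "sex_male": 0},
])
# A later write for the same identifier replaces the online copy,
# while the offline store keeps every version.
group.put_record({"passenger_id": 1, "age": 23.0, "sex_female": 0, "sex_male": 1})

print(group.get_record(1))
print(len(group.offline_store))
```

The split matters because the two stores answer different questions: the online copy serves "what are the current features for this row?", while the offline history is what you query to assemble a training dataset.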
Now we're going to training, right? Sorry. Building. Yeah, because I'm so excited about distributed training. That's right. Sorry. We'll talk about training, too. We're talking about building models now, as discussed in detail in season two. You can go and watch those episodes as well. They're online. You have lots of options to build ML models: built-in algorithms, built-in frameworks, your own code in a Docker container, SageMaker Autopilot, and the AWS Marketplace for Machine Learning. So seriously, SageMaker, isn't this enough? Do we really need more? Yes. Okay. All right. Tell me more. Indeed, SageMaker has a lot of options. However, some customers who don't have a lot of ML experience may find it difficult to train and deploy state-of-the-art models on SageMaker. Fair enough. Also, experienced practitioners may want to quickly experiment with different models, train on reference datasets, shortlist a few, and fine-tune them on their own data. If you've ever tried to deploy models like BERT, my favorite, or one of the many variants of BERT, you've certainly wasted some time trying to deploy and predict, and got frustrated. So can you help me here? Yes. Not me, but SageMaker JumpStart. This is exactly the problem that SageMaker JumpStart addresses. In just one click, you can deploy hundreds of state-of-the-art models for computer vision and natural language processing. Once the prediction endpoint is up, you can use the prediction code provided by SageMaker and immediately predict with your own samples. There's no more frustration trying to figure out the data format expected by the endpoints. SageMaker JumpStart also lets you deploy end-to-end solutions that solve specific business problems, like fraud detection in financial transactions or handwriting recognition. Just one click, and AWS CloudFormation will deploy the appropriate AWS resources. Within a few minutes, you can start running the sample notebooks. So I love this. This is exactly what I need.
I keep saying laziness is a virtue, and we're going to look at some examples. Let's go to my screen. This is the home screen for SageMaker JumpStart. You can see we have solutions, text models, NLP models, computer vision models, and more examples for SageMaker algos, sample notebooks, blogs, and video tutorials. We'll focus on the top three here. If you open solutions, you can see there are solutions available. If you select one, such as the detect malicious users and transactions solution, you can see the story, the architecture, and the CloudFormation template. We used this example in the last season, so it's cool to see it's part of SageMaker Studio now. You can go and watch the episode on fraud detection. There's a description of what we do and what the dataset is. One click, and you launch the solution. It runs the CloudFormation template. You wait for a few minutes, and then you get to a screen that says, "Ready to go." Click on "Open Notebook," and you automatically open the notebook that shows you the actual code and what this use case is about and how it solves the problem with SageMaker. This is a great resource to learn. If you're new to machine learning and SageMaker, I cannot recommend enough that you look at all the solutions. It's probably the best way to learn, run the code, and understand how it works. I keep learning stuff when I use this. It's a good starting point. If you have this particular business problem, this is a good place to start. And you can tweak it and add your data, etc. So actually, a lot of the solutions have multiple notebooks: a basic one and a more advanced one. So again, very, very nice resource here.
SageMaker JumpStart also has models. If we go back to JumpStart, we see plenty of NLP models and computer vision models. Let's take a look at that. Here, let's start with an NLP model. I selected a BERT variant trained for question answering. This particular model will find the answer in a piece of text that you provide. You give it a bit of text, ask a question about that text, and it finds the answer. You can deploy them as is, and as you explained, they are pre-trained. Or you can fine-tune them, which is great because fine-tuning those complex state-of-the-art models is not easy if you try to do it manually. You can use a default dataset or enter a location for your own data. I deployed the pre-trained model. Click on this, wait a few minutes, and then you have an endpoint, a SageMaker endpoint, which is in service. Once again, click on "Open Notebook," and guess what? You open a sample notebook again. For a lazy guy like me, this is great because I get some samples that I can try. "What is Southern California often abbreviated as?" And this is the bit of text that has the answer, and "Spectre," which is a long movie. Of course, you get the text, and the answer is somewhere in there. And we get to see the invocation code. This is typically where I waste a lot of time trying to figure out how to invoke the endpoint, what the data format is, and what format the answer comes in. Those models tend to have complex prediction responses, really difficult to understand. Not so much here, because this is all taken care of. I can run this code on my own samples, and I can see here's the question, and the answer is SoCal. So that's the magic of those models. They give you the perfect answer to the question. Who directed "Spectre"? Sam Mendes. So it's a perfect, laser-focused answer. I guess this is why BERT is a popular model these days; it works very well.
The only thing I've done here is click on "Deploy Model," and I can literally take that code, tweak it, and it's fine. Here's another example with a computer vision model, a single-shot detector object detection model. Again, I could deploy it or fine-tune it. There's always a bit of extra explanation here if you want to read. I deployed it. It's in service. Open the notebook. And once again, I get a sample image. Hopefully, you're in a place that's not in lockdown or curfew. If you're in a place like this right now, we're so jealous. We're so jealous. And then we can send the image to the endpoint. Typically, SSDs have complex prediction responses because they have all the bounding boxes, and it's really difficult to write that code yourself. But here, it's already provided. We have the code sample that actually decodes the response and draws the bounding boxes on the test image. You can literally cut and paste that code and use it in your lab. This is a huge time-saver. I'm a big fan of JumpStart. So that's what I wanted to tell you. Honestly, I really like your statement because, as we see, it's really an easy way to deploy and try out complex models and solutions. Now, I think it's time to move on to the next step of training.
So we are going to talk about training, how to train a model. Training is, you know, we tend to think it's kind of a solved problem, but it's only solved a little bit because as datasets and models get ever bigger, training or even fine-tuning them continues to be a challenge. Of course, infrastructure is more and more powerful, and we keep innovating on infrastructure, but it's not enough. You need scalable software to help you train on that cool infrastructure. And of course, SageMaker has had distributed training from day one. For me, it was actually one of the most interesting things because setting up distributed training on your own is not so easy. For SageMaker, it was just one line, and it was so easy. It uses open-source libraries like Horovod and PyTorch Distributed, which I'm sure you work with. A few months ago, we added new libraries to speed up distributed training using an improved technique for data parallelism. If you're not familiar with data parallelism, it's pretty easy to understand. If you have a large dataset and, let's say, 16 GPUs in your cluster, you split your dataset into 16 chunks and send one chunk to each GPU. Each GPU is training on a fraction of the dataset, and things will go faster because you don't train the full dataset on each GPU. There are many ways to implement it, but as you can imagine, if you have a large dataset, it's important that communication between those GPUs is super optimized because it's a huge bottleneck. This is what the new data parallelism library does. It can scale to datasets that are hundreds or even thousands of gigabytes—petabytes. We have customers who do this, trust me. It's available for TensorFlow and PyTorch. In a nutshell, it implements a super-effective way to distribute computation across the training cluster. By optimizing network communication, you can fully utilize our fastest GPU instances like the P3 family and the P4 family. What you want is for the GPUs to be crunching data. 
You don't want the GPUs to be transferring or waiting for data to be transferred; that's a waste. We eliminate a lot of those transfers and focus the GPUs on what they do best, which is crunching. Thanks to this, we get near-linear scaling, regardless of the number of GPUs involved. If you double the number of GPUs in your cluster, you'll pretty much get twice the speed. You don't have to trade off on training costs and training time because any extra hardware you add is going to be used efficiently. It's money well spent. Let me give you a concrete example. Last year at re:Invent, we trained Mask R-CNN, a general framework for object instance segmentation on ImageNet, in 26 minutes on PyTorch and 27 minutes on TensorFlow. With the new library, we recorded the fastest training time to date: 6 minutes and 12 seconds on TensorFlow and 6 minutes and 45 seconds on PyTorch. Six minutes to train Mask R-CNN. Instead of 26. Wow. I did that. So honestly, this is thanks to the new data parallelism library. Let's take a sneak peek at an example using this library. Again, it doesn't require a lot of code change. Let's show my screen and take a look at a simple example. We'll cover this in more detail later. So basically, what you need to do to use this new data parallelism library is to add a parameter to your TensorFlow estimator. The estimator in the SageMaker SDK is the object you use to configure your training. If you've never seen SageMaker before, well, you can just read this, and it makes sense. We're using Python 3.7 and TensorFlow 2.3.1, and we want to train on two pretty large P3 instances. The code that we want to use to train the model is in this script. All you have to do is add this parameter, saying, "Please enable the data parallel library." Even I can do that. Very simple. In your own training code, you need to define a function and annotate it with this annotation. This function is really the function that runs forward propagation and computes the loss. And that's about it.
It computes the gradient and returns the loss value. This is a very generic way of writing your code. The reason you have to do this is that this is the function that actually gets distributed. This is the function that gets distributed to the GPU. So you just need to annotate it to tell the library, "This is where you need to split the data and optimize communication." It's pretty simple. Again, we have sample notebooks, and we'll come back to this. But it's literally a couple of lines you need to add to your script, and you can enjoy this very efficient distributed data parallelism framework. We'll come back to this with a real-life example.
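The estimator configuration described above can be sketched as a plain dictionary of keyword arguments. The framework version, Python version, and instance count come from the demo; the script name, the IAM role ARN, and the exact instance type (`ml.p3.16xlarge`) are placeholders assumed for illustration. The `distribution` value follows the SageMaker Python SDK documentation as best recalled here, so please check it against the current docs before relying on it.

```python
def data_parallel_estimator_args(entry_point, role,
                                 instance_count=2,
                                 instance_type="ml.p3.16xlarge"):
    """Build keyword arguments for a TensorFlow estimator with data parallelism on.

    Only the 'distribution' entry differs from an ordinary training job.
    """
    return {
        "entry_point": entry_point,        # training script (placeholder name)
        "role": role,                      # IAM role ARN (placeholder)
        "framework_version": "2.3.1",      # TensorFlow version from the demo
        "py_version": "py37",              # Python 3.7, as in the demo
        "instance_count": instance_count,  # two instances in the demo
        "instance_type": instance_type,    # assumed P3 size, not stated in the demo
        # The one extra parameter: enable the data parallel library
        "distribution": {"smdistributed": {"dataparallel": {"enabled": True}}},
    }


args = data_parallel_estimator_args(
    entry_point="train.py",
    role="arn:aws:iam::111122223333:role/ExampleSageMakerRole",
)
print(args["distribution"])

# With the SageMaker Python SDK installed and AWS credentials configured,
# these arguments would be passed along the lines of:
#   from sagemaker.tensorflow import TensorFlow
#   estimator = TensorFlow(**args)
#   estimator.fit("s3://<your-bucket>/<your-training-data>")
```

Keeping the arguments in a dictionary makes it easy to run the same job with and without the `distribution` entry and compare training times.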
Now, there is another one, and this one is even crazier. I love this one. What about very large models? When I say large, I mean things like large transformers like T5-3B, which, as the name implies, has 3 billion parameters. Or models like GPT-2 and GPT-3, which can be even bigger. They're so large that they won't fit into GPU memory. Even if you have the largest GPU memory available today, they just don't fit. If you want to train them, you have to split them manually. You have to train one part of the model on one GPU and another part on another GPU. It sounds like madness to do this, like split your model and train different parts on different GPUs. Or there are other techniques like gradient checkpointing, which are pretty costly, saving your gradients to disk and then training another part of the model and saving that to disk again. Do we have a better option in SageMaker? Yes, yes, of course. SageMaker has a model parallelism library. It automatically and efficiently partitions models across several GPUs, eliminating the need for accuracy compromises or complex manual work. Thanks to this scale-out approach to model training, not only can you work with very large models without any memory bottlenecks, but you can also leverage a large number of smaller and more cost-effective GPUs. It's supported for TensorFlow and PyTorch and only requires minimal changes in your code. Let's show my screen and see how you can configure this. This time, it's a PyTorch example. In your estimator, you have to pass an extra parameter to enable model parallelism. We have extra parameters here, which I won't go into, but we'll cover them later in the season. They drive the level of parallelism you want in your training cluster, like how many times you split the model. But these are not so difficult to come up with. This is it. You actually do not need to change anything in your code in this case.
It sounds like magic because how could you think that, oh, automatically, this is going to split my model and run forward propagation for these layers on GPU one and the rest of the layers on GPU two? The way this works, as we'll see later in the season, is when you fire up this training job, there's a profiling step that runs early on, looking at the model, how many layers it has, how big they are, how much memory they require, etc. Then it makes partitioning decisions and starts to allocate certain layers on certain GPUs. That's all you need to do. This is really, to me, completely magical. Here it's a very simple example, but we'll try to show you a really big example later in the season, and then you'll see how this works and how we can train really, really large models. So this is pretty cool.
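The partitioning decision described above can be pictured with a small sketch. This is not how the SageMaker library works internally: it is a simple greedy strategy, and the layer sizes, GPU memory, and function name are hypothetical. Given the memory each layer needs, it fills one GPU with consecutive layers and moves to the next GPU when the next layer would not fit.

```python
# Greedy partitioning sketch. Hypothetical names and sizes; the real library
# profiles the model and makes its own, more sophisticated decisions.

def partition_layers(layer_sizes_gb, gpu_memory_gb, num_gpus):
    """Assign consecutive layers to GPUs without exceeding each GPU's memory."""
    assignment = [[] for _ in range(num_gpus)]
    gpu, used = 0, 0.0
    for index, size in enumerate(layer_sizes_gb):
        if size > gpu_memory_gb:
            raise ValueError(f"layer {index} does not fit on a single GPU")
        if used + size > gpu_memory_gb:
            # Current GPU is full: continue on the next one.
            gpu, used = gpu + 1, 0.0
            if gpu == num_gpus:
                raise ValueError("model does not fit on this cluster")
        assignment[gpu].append(index)
        used += size
    return assignment


layers = [6.0, 6.0, 4.0, 8.0, 3.0, 5.0]  # hypothetical per-layer memory, in GB
plan = partition_layers(layers, gpu_memory_gb=16.0, num_gpus=2)
print(plan)  # [[0, 1, 2], [3, 4, 5]]
```

Here the first three layers use exactly 16 GB on the first GPU, and the remaining three go to the second one. With a single 16 GB GPU, the same call raises an error, which is the situation that makes manual splitting necessary in the first place.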
So with model parallelism and data parallelism, it looks like we have our scaling covered for a little while. But there's another problem. What about understanding training performance problems? Sometimes you run your training job, and it's slow. It's very hard to figure out why. Is there any hope to understand how well or how badly the training job is running and how well or how badly it's using our infrastructure? You're right. It's a very hard problem to understand how well or how badly the training went. But of course, SageMaker sends monitoring information to Amazon CloudWatch, our monitoring service, which is integrated with pretty much every AWS service. So we get to see some infrastructure metrics. But sometimes this is not enough to identify and fix code-level training issues. For this purpose, we added a new profiling capability to Amazon SageMaker Debugger. Debugger was actually launched over a year ago to figure out training issues related to convergence, etc. Now you can also profile performance. And I love it so much because it doesn't require any changes to your code. You can take your existing training code and profile it without changing anything. Profiling is now available for TensorFlow and PyTorch. All you have to do is train with the corresponding built-in frameworks in SageMaker, and distributed training is also supported out of the box. By setting a single parameter in your SageMaker estimator, and without any changes to your training code, you can enable the collection of infrastructure and model metrics such as CPU and GPU, RAM and GPU RAM, network I/O, storage I/O, Python metrics, data loading time, time spent in ML operators running on CPU and GPU, distributed training metrics for Horovod, and many more. It's really much more than the graphs we had in CloudWatch so far. The operator-level information in particular is very cool. This is really where you can see what is going on and fix it.
In addition, you can visualize how much time is spent in different phases such as preprocessing, training loop, and post-processing if needed. You can drill down on each training step and even on each function in your training script. No code changes, no. Let's look at an example. Let's take a look at my screen. This is an example with PyTorch. Just like SageMaker Debugger does when debugging actual training issues, the profiler also uses rules. We have built-in rules like low GPU utilization and a few more. If we're looking for specific problems, we can enable those rules. If they're detected during the training job, we get notifications via Amazon CloudWatch Events, which is the event service associated with CloudWatch. These can be sent to a queue or a Lambda function, and we could act on them, like killing the job if GPU utilization is 1%, saving time and money. So we can configure some rules looking for specific problems. Then we configure the actual profiling job. We set the data capture interval to 500 milliseconds. We can go down to 100 milliseconds, but it will be a lot of data. If you really want to know what's going on, that's useful. We could capture data for specific training steps. Maybe you don't want to capture data for your whole job, just a specific section. Why not? And then, as usual, you configure your estimator and pass that profiler configuration. You start your training job, and that's it. Data is collected and stored in S3 in near real-time. As the training job is running, you get profiling information in S3. You can write your own code to access that data, but you can also see it in Studio. This is the summary. You would only get to see the summary at the end of the job. It gives you information about usage, percentiles, and the most time-consuming, compute-intensive operations. In this computer vision example, convolution is the top operator, taking almost 60-70% of the time.
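To give an idea of what a rule such as low GPU utilization checks, here is a minimal sketch in plain Python. It is not Debugger's implementation or API: the function name, the 70% threshold, and the use of the 95th percentile are assumptions for illustration. The input is a list of GPU utilization samples, for example one every 500 milliseconds.

```python
# Hypothetical low-GPU-utilization check. Not the Debugger built-in rule:
# the threshold and percentile are assumptions chosen for this sketch.

def low_gpu_utilization(samples, threshold=70.0, percentile=95):
    """Return True when the given percentile of GPU utilization is below threshold."""
    if not samples:
        return False
    ordered = sorted(samples)
    # Nearest-rank percentile, computed with integer arithmetic.
    rank = max(1, (percentile * len(ordered) + 99) // 100)
    return ordered[rank - 1] < threshold


# One sample every 500 ms: a mostly idle GPU with a single short spike.
idle_job = [10.0] * 19 + [80.0]
# A GPU that stays busy for the whole window.
busy_job = [90.0] * 20

if low_gpu_utilization(idle_job):
    # In a real setup, a notification could trigger a function that stops the job.
    action = "stop training job"
else:
    action = "keep running"
```

Looking at a high percentile rather than the average means a single short spike does not hide the fact that the GPU is idle most of the time.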
We see insights, and the low GPU utilization rule that we configured was actually triggered quite a few times. We get extra information on what this means and how to fix it. The batch size rule was triggered as well, etc. This really points at very precise performance issues and gives you extra information on how to solve them. If you look at nodes... I'm going to try and reload this, but it won't load. This is where you actually see the real-time information for the nodes: CPU and GPU usage over time, etc. You can download a full report as well. So I think this is what I wanted to tell you about the profiler. We'll try to run something big on this and see how it goes, showing you real-time information. It's pretty cool.
We mentioned Clarify early on. There's a model explainability feature in Clarify. You can use Clarify to understand how your model works. This is based on the well-known library called SHAP. We covered this in season two. It computes SHAP values for your dataset, displays a summary in Studio, and you get individual SHAP values for each sample in the dataset in S3 as well. This is part of understanding how the model works and something you would want to do in training as well. Pretty important. So I think now we're done with training. Quite a lot of stuff about training, right? Remember model parallelism, data parallelism, profiling performance issues, and model explainability. It's time to move to the last part of this presentation, which is model deployment. We are going to talk about our next character, which is SageMaker Pipelines. The next step for any machine learning project after training is, of course, deploying your model in production. The new SageMaker Pipelines is really the cherry on the cake as it brings best-in-class DevOps practices to your ML projects. This new capability makes it easier for data scientists and ML developers to create automated and reliable end-to-end ML pipelines. As usual, all infrastructure is fully managed and doesn't require any work on your side. The DevOps philosophy, mindset, and concepts apply to machine learning. Deployment is probably the biggest pain point for customers today. Training is pretty well understood now, and the tools to run and scale training are pretty well established. Deployment is still painful. So this is what people call MLOps. People love inventing new buzzwords, so we call it MLOps, and that's fine. Thanks to SageMaker Pipelines, data science and MLOps teams can collaborate using familiar tools and processes for continuous integration and continuous delivery (CI/CD). SageMaker Pipelines is made of three main components.
The first one is pipelines, which can include any operation available in Amazon SageMaker, such as data preparation, model training, model deployment to real-time endpoints, and batch transform. The second component is the model registry, which lets you track and catalog your models. The last part of SageMaker Pipelines is MLOps templates, which include a collection of built-in CI/CD templates published via AWS Service Catalog for popular pipelines. This is a super feature. We're going to explain it to you in a few minutes. We'll spend a little more time on this one because it's important. Let's take a look at my screen and see how we can use the templates to quickly provision a build, train, and deploy pipeline. Then we'll look at the actual code that data scientists would write to automate the steps you mentioned, like data prep, training, evaluation, etc. And then we'll simulate a deployment in production. So let's say you're the data scientist. You're working in Studio. You have your workflow. You train your model. You test it, etc. And then you want to deploy it. But I'm the production guy, and I want to make sure it works. So there's a quality gate, and you don't have permission to deploy it. You just train your model, and then I check it and make sure all the boxes have been ticked. Only then will I say, "Yes, this can go on." Here, you can see this in this part of Studio, under projects. If you click on "Create Project," you can see the existing templates, the built-in templates that we provide, but your organization could add different templates with different configurations and use those. I selected the build, train, and deploy template. It has sample code associated with it. I can show you this. Yes. Once the project has been created, we have sample repositories. We actually have two. We have one, which would be the data scientist repo with your code. This is this one here. There's some scaffolding code, and this is the actual code you would write.
I think I have it here already. Yes. This is the code you would write. In this case, it's training a simple XGBoost model on the Iris dataset. We see the different steps. There's a processing step, and I'm guessing there's feature engineering on this dataset. Using this pipelines SDK, we create a processing step based on SageMaker Processing. Then we train the model. We configure the estimator with XGBoost, set hyperparameters, and plug all that stuff into the training step with the location of the training dataset and validation dataset. Then we have an evaluation step to evaluate model accuracy, which is very typical. Then you just register the model. Oh, there's a conditional step. Sorry, I went a little bit too fast. There's a conditional step to say if the accuracy of the model is higher than a certain value, it's a good model. I want to register it. I want to make it eligible for deployment. We see the conditional step and the registration step. Then we build a pipeline. We have all those steps in order: processing, training, evaluation, and then the conditional step, which either registers the model or doesn't. This is the code you would write. You can see it's super simple and logical. It's how you work. You use this SDK for pipelines, and you can see here, `sagemaker.workflow`. You write that stuff, and then there's boilerplate code that just runs everything here. This is all plugged into our developer services and is part of the templates. As a data scientist, you wouldn't have to set this up. Your ops team would set that up. We can see all this is actually plugged into this repository with your code and your pipeline. This is configured as the source step for CodePipeline. Every time you commit to your repository, it's going to trigger this. It's going to build your model, but it's not going to deploy it because you can't. In your project, you have those repos, and you have your models.
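The order of the steps and the role of the conditional step can be sketched in plain Python. This does not use the SageMaker Pipelines SDK: the step functions, the toy "model" (a majority-label predictor), and the thresholds are all hypothetical stand-ins, chosen only to show that a model is registered when its accuracy is higher than a given value and skipped otherwise.

```python
# Plain-Python sketch of the pipeline logic. This is NOT the pipelines SDK:
# every function and value below is a hypothetical stand-in.

def run_pipeline(steps, accuracy_threshold):
    """Run processing, training, and evaluation, then register only a good model."""
    data = steps["process"]()
    model = steps["train"](data["train"])
    accuracy = steps["evaluate"](model, data["validation"])
    if accuracy > accuracy_threshold:  # the conditional step
        return steps["register"](model, accuracy)
    return None  # the model is not registered


registry = []  # stand-in for the model registry


def process():
    """Pretend feature engineering: return train and validation labels."""
    return {"train": [1, 1, 0, 1], "validation": [1, 1, 0, 1, 1]}


def train(labels):
    """Toy model: always predict the most frequent training label."""
    return max(set(labels), key=labels.count)


def evaluate(model, labels):
    """Accuracy of the toy model on the validation labels."""
    return sum(1 for label in labels if label == model) / len(labels)


def register(model, accuracy):
    entry = {"model": model, "accuracy": accuracy}
    registry.append(entry)
    return entry


steps = {"process": process, "train": train, "evaluate": evaluate, "register": register}
good = run_pipeline(steps, accuracy_threshold=0.7)      # accuracy 0.8: registered
skipped = run_pipeline(steps, accuracy_threshold=0.9)   # accuracy 0.8: not registered
```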
Every time you commit to your repo, it's going to trigger that CodePipeline pipeline that you saw. Once your model has been trained, it's registered in this case because it was a good model, but it stays in this state called pending. We set the model state to pending because you could say, "I want to check that this model is a good model." I could actually grab the model, run my own testing, and see what's going on. I could inspect this in detail and say, "OK, it's a good model." Then I update it and say it's approved or maybe rejected. If I look at my pipeline now, the build pipeline, or let's call it the data science pipeline, ran about three hours ago. The fact that I approved the model just now says, "Okay, now I'm deploying." I see that. Yes, so I am actually deploying this. This stage is pending here. So I can see I'm actually deploying. And maybe I actually let you deploy and approve your model. You can run your own tests and say, "Okay, I approve the model. It's good to go." And then it could trigger the rest of the pipeline. But because I'm a very suspicious person, there's a manual approval stage. You could say, "Okay, I'm going to really, really look at this now." It's like, "All right, this is a really good model." Thank you again. Right. Your MLOps is perfect. And this will trigger the actual deployment. In this case, any team can figure out what works for them. Do you want to let your MLOps team deploy in an AWS account or maybe a staging account? It's okay. You can let them deploy and auto-approve their models in Studio. But when you get to prod deployment, which probably means deploying to another AWS account, then you need to go through this quality gate and this manual approval. So I really like pipelines because it works for data scientists. You can actually go and deploy your models on your own in a controlled environment. Then you say, "OK, this is really good. Now I want to actually deploy it for real."
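The quality gate described above boils down to a small state machine: a registered model version waits in a pending state, and only an explicit approval triggers deployment. The sketch below is a plain-Python illustration, not the model registry API. The status strings mirror the approval statuses of the SageMaker model registry as I understand them, while the class, the function, and the `deployments` list are hypothetical.

```python
# Sketch of a manual approval gate. The class and function are hypothetical;
# the status names are assumed to match the model registry's approval statuses.

VALID_STATUSES = {"PendingManualApproval", "Approved", "Rejected"}


class ModelVersion:
    def __init__(self, name):
        self.name = name
        self.status = "PendingManualApproval"  # every new version starts here


def update_status(model, status, deployments):
    """Change the approval status; an approval triggers the deployment stage."""
    if status not in VALID_STATUSES:
        raise ValueError(f"unknown status: {status}")
    model.status = status
    if status == "Approved":
        deployments.append(model.name)  # stand-in for starting the deploy pipeline


deployments = []
candidate = ModelVersion("xgboost-v1")
pending_deployments = list(deployments)  # nothing is deployed while pending

update_status(candidate, "Approved", deployments)

rejected = ModelVersion("xgboost-v2")
update_status(rejected, "Rejected", deployments)
```

A team could auto-approve in a development or staging account and keep the manual step only in front of production, which is the split discussed in the episode.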
But you can still have a quality gate in CloudFormation. So the data scientists can work in Jupyter and Studio with the tools they know, and the ops team can work with CloudFormation and CodePipeline, which are probably the tools they know. And so the two can collaborate and do a good job. And you can see now we're deploying. Again, these are built-in templates. They're very simple, but you can have more elaborate ones where you deploy cross-account, etc. Okay. All right. So this is pipelines. We'll come back to this in great detail and try to run all of this again. It's super cool.
But for some customers, it's not what they need because some customers want to deploy, for example, lots of computer vision applications at the edge. A few years ago, we launched a service with model deployment capabilities at the edge called AWS IoT Greengrass. We also launched a capability called SageMaker Neo, which makes it quite easy to compile models for particular hardware architectures for performance improvement. So that's the state of the edge on AWS. Anything new at re:Invent this year? Guess what? Yes. Why am I asking? That's right. So actually, SageMaker Edge Manager was launched during the last re:Invent. Let me tell you a little bit about SageMaker Edge Manager. Starting from a model that you've trained or imported in Amazon SageMaker, SageMaker Edge Manager first optimizes it for your edge platform using SageMaker Neo to convert it to an efficient format that can be executed on the device by a low-footprint runtime. Then SageMaker Edge Manager packages the model and stores it in S3, where it can be deployed to your device. In fact, you can deploy multiple models on your edge device, and they are managed by an agent that communicates with the AWS cloud for model deployment and with your application for model management. You can integrate this agent with your application so that it may automatically load and unload models according to your prediction requests. This enables a variety of scenarios, such as freeing resources for a large model when needed or working with a collection of smaller models that cohabit in memory. That's pretty nice because sometimes you want different models to do different things, like computer vision, looking for different types of objects. Very specialized models usually work better than one very general model. You can load and unload them automatically. Based on the prediction requests that you receive, the agent will load and unload the models, which would be super painful to do yourself.
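The load and unload behavior can be illustrated with a small sketch. This is not the Edge Manager agent or its API: the class, the memory budget, the model names, and the least-recently-used eviction policy are assumptions made for the example. It shows both scenarios mentioned above, small models sharing memory and a large model pushing the others out.

```python
# Hypothetical model cache for an edge device. Not the Edge Manager agent:
# the eviction policy (least recently used) is an assumption for this sketch.
from collections import OrderedDict


class ModelCache:
    def __init__(self, memory_budget_mb, model_sizes_mb):
        self.budget = memory_budget_mb
        self.sizes = model_sizes_mb
        self.loaded = OrderedDict()  # model name -> size, least recently used first

    def request(self, name):
        """Make sure `name` is loaded; return the models unloaded to make room."""
        size = self.sizes[name]
        if size > self.budget:
            raise ValueError(f"{name} is larger than the device memory budget")
        if name in self.loaded:
            self.loaded.move_to_end(name)  # mark as most recently used
            return []
        evicted = []
        while sum(self.loaded.values()) + size > self.budget:
            victim, _ = self.loaded.popitem(last=False)  # drop the oldest model
            evicted.append(victim)
        self.loaded[name] = size
        return evicted


cache = ModelCache(100, {"faces": 40, "cars": 40, "large": 90})
first = cache.request("faces")    # loaded, nothing unloaded
second = cache.request("cars")    # both small models share memory
third = cache.request("faces")    # already loaded
fourth = cache.request("large")   # both small models must be unloaded
fifth = cache.request("cars")     # the large model is unloaded in turn
```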
Unfortunately, we can't really give an interesting demo in just a few minutes, so we'll try to come back to this later in the season. We're almost out of time, so we have a few more minutes for questions. Please make sure you ask all your questions. Okay. I think it's time to wrap up. So what did we talk about today? You have 30 seconds. I'm going to redo the list. During this first episode of the third season of SageMaker Fridays, we introduced the nine new SageMaker capabilities launched at AWS re:Invent 2020. They are: Data Wrangler for data prep, Clarify for bias detection, Feature Store for online and offline storage of your features, JumpStart for one-click solutions and models, Data Parallelism to scale to super large datasets with very high training efficiency, Model Parallelism to automatically split those crazy large models across your cluster, the profiling capability in SageMaker Debugger to figure out how efficient your training job is and understand what potential problems are in there and how to fix them, SageMaker Pipelines for end-to-end deployment, and Edge Manager for multiple model management on edge devices. There are tons of resources, and we're not going to list them right now because we'll come back to particular things you should read and watch in future episodes. Until the next episode, you can go back to the SageMaker Fridays page to watch the season one and season two episodes. We obviously have a lot of launch blog posts. If you go to the AWS News blog and look for SageMaker posts, you'll find everything we discussed today. We'll come back to particular things you should be reading in the next weeks. Okay. So by the way, next week. Yeah. Next week, not two weeks from now. Next week, we're going to dive into data analysis and preparation. We'll use Data Wrangler and SageMaker Clarify and spend quite a bit of time on both.
We just gave you a taste today, but we're going to go heavy into those two things next week. I hope to see you all there. Ségolène, thank you very much for your insights. It was a pleasure to start this new season again. And thank you, everyone, for watching this. Thanks to all our colleagues involved in setting this up. We really appreciate it. It was an absolute pleasure to spend those 90 minutes with you. Yeah, we're exhausted. I hope you're not. But anyway, it's time for the weekend, right? So have a great weekend, everybody. We'll see you next week. And until then, keep rocking with machine learning.
Julien Simon is the Chief Evangelist at Arcee AI, specializing in Small Language Models and enterprise AI solutions. Recognized as the #1 AI Evangelist globally by AI Magazine in 2021, he brings over 30 years of technology leadership experience to his role.
With 650+ speaking engagements worldwide and 350+ technical blog posts, Julien is a leading voice in practical AI implementation, cost-effective AI solutions, and the democratization of artificial intelligence. His expertise spans open-source AI, Small Language Models, enterprise AI strategy, and edge computing optimization.
Previously serving as Principal Evangelist at Amazon Web Services and Chief Evangelist at Hugging Face, Julien has helped thousands of organizations implement AI solutions that deliver real business value. He is the author of "Learn Amazon SageMaker," the first book ever published on AWS's flagship machine learning service.
Julien's mission is to make AI accessible, understandable, and controllable for enterprises through transparent, open-weights models that organizations can deploy, customize, and trust.