SageMaker Fridays Season 2 Episode 2 Demand Forecasting October 2020
October 19, 2020
Broadcast live on 16/10/2020. Join us for more episodes at https://amazonsagemakerfridays.splashthat.com/
⭐️⭐️⭐️ Don't forget to subscribe to be notified of future videos ⭐️⭐️⭐️
This project provides an end-to-end solution for Demand Forecasting using a new state-of-the-art Deep Learning model LSTNet available in GluonTS and Amazon SageMaker.
Transcript
Hey, good morning, everyone. Welcome to episode two of SageMaker Fridays. My name is Julien. I'm a principal developer advocate working on AI and machine learning. And just like last week, I have a co-presenter. Hello, everyone. My name is Ségolène. I'm a senior data scientist working with the AWS Machine Learning Solutions Lab. My role is to help customers get their data and ML projects on the right track to create business value as fast as possible. All right. Thank you. It's great to have you again. So if you missed last week's episode, you can watch it on Twitch. It's available on demand, and so are all the upcoming episodes; we have four more after this. All episodes are live. We're actually in the Paris office, both of us. And you can ask questions. We have moderators waiting for you; they're ML experts, and they're super friendly. So if you have questions, please go ahead. We're just here to help you learn. That's the purpose. And there are no silly questions. Don't be shy. Make sure you learn as much as possible.
Okay, let's get started with episode two. So, last week we spoke about predictive maintenance, and we trained a handmade model on a dataset, and we used a convolutional LSTM, etc. Time series is actually a very popular use case for machine learning, and it's a very important use case. Time series data is all around us. So something tells me we're going to continue exploring that today. Yeah? Yeah, absolutely. So in this second episode, Julien, we are going to talk about demand forecasting, another very popular use case for machine learning. Demand forecasting deals with predicting customer demand for goods or services to optimize production and supply chain. You don't want to produce too much, which is wasteful, or too little, which can cause a disruption. So you want to produce the right amount. Okay, get it just right. Yeah, of course, I think pretty much every company can use that kind of knowledge to be more efficient and serve their customers better. Manufacturing goods, retail, stocking the right amount of inventory, agriculture, producing the right amount of a certain product. So you could even, if you talk about IT infrastructure, predict incoming traffic to a web platform and scale. You're right, Julien. And in fact, Amazon EC2 has a predictive auto-scaling feature. Oh, yeah, right. Yes, which I encourage you to look at.
But back to our business. Today, our purpose will be to predict electricity consumption for individual consumers. Starting from a multivariate time series dataset built by the University of California, we are going to train a model based on the state-of-the-art, the SOTA, LSTNet architecture to predict a new data sample and visualize results. Okay, so that sounds super nice. Now, state-of-the-art model sounds a little bit scary and complicated. Every time someone says state-of-the-art model, you have to go and read a research paper with lots of math and equations and then try to implement it yourself. So, do we have to do this? No, of course not. The math is very scary when you look at some research papers, but in our case, you won't have to understand how to do the math and code it because Amazon teams have already implemented it in GluonTS, the package. And so GluonTS is an open-source library for time series implemented on top of Apache MXNet. Okay. And you will see later, we will simply download the model and use it directly without all the equations. Just one line of code, get the model. Oh, yes. Exactly, one line of code. Okay, hopefully, we'll still talk about the architecture. I just don't want to get too deep into every single line of code. It gets very crazy and really quick. Okay, so I like that simplicity. And of course, we're going to use SageMaker. So we're going to use some capabilities that we already used last week, like training a model, but we're going to also introduce some new capabilities. For example, we're going to preprocess the dataset. Last week, you explained that time series needs some preprocessing, sometimes very complex. And we'll use SageMaker processing, which is a capability of SageMaker specialized for batch jobs related to machine learning projects, just like preprocessing data. We'll also deploy to a real-time endpoint. Last week, I think we used batch prediction. This time, we'll use real-time prediction. 
And a few more things here and there. This is going to be another pretty intense episode. I'll give you 10 seconds to get some coffee or energy drinks or candy bars, anything you need to bring your blood sugar levels up. Yeah, I need some energy as well. And get started. So while you're doing that, let me show you. I'm going to share my screen for a second. And we can see the repo here. Okay. So, here it is. This is the GitHub repo we're going to use. It's an AWS Labs repo called SageMaker Deep Demand Forecast. Okay. So you can follow along if you want to. You can just clone the repo and then run the notebook. Or, of course, you can go and try that later. Okay. All right.
So now that you have coffee and everything you need, let's talk about the machine learning problem. Again, we'll get to the code. But you need to understand what the problem is, how we're going to solve it, why this state-of-the-art model is a good idea, how it works, etc. Let's go back to basics. What is the problem, the business problem we're trying to solve? So first, I'm going to tell you something. All models are wrong, but some are useful. Oh, I like that. Okay, mine are wrong, for sure. Yours are useful. Yes, exactly. And this quote, "All models are wrong, but some are useful," is not from me, but from one of the time series fathers, George Box. And this is exactly what we are going to do today when we do demand forecasting. We want to find some useful models on top of our historical time series data to help streamline the supply chain and supply-demand decision-making process across businesses. Okay, I like that quote. And we actually will see that when you're trying to predict demand, there is no right or wrong answer. In some cases, you want to over-predict a little bit to avoid disruption, and in some cases, you want to under-predict a bit for other reasons. So we'll give you some examples. So I think that's why time series and demand forecasting are really interesting because you get a lot of options once you've trained the model. You get a lot of options on how you're going to use it and what kind of prediction you're expecting from it.
Okay, so we mentioned this is a popular problem to solve for companies, right? So I'm thinking everyone watching this today can easily find some kind of demand forecasting scenario for their organization. Yes, time series are everywhere. Most of the customers I work with at the ML Solutions Lab want to do some forecasting, because they really want to minimize risk and avoid uncertainty, due to the fact that you never know what the future will be. And we talk about random walks and stochastic processes when we model time series. And we have a lot of examples and applications of time series forecasting, such as product sales, cloud server usage, electricity consumption, customer service demand. And I could give you a ton of examples because, yeah, time series are everywhere. So actually, I remember a couple of years ago, I did a session at re:Invent, our technical conference, which, by the way, is happening again at the end of November, and it's free this year. It's all online and free, so make sure you join. And I had a customer on stage called Advanced Microgrid Solutions. They work in the energy sector and are actually trying to do, at a much larger scale, what we're doing today. They're trying to predict energy consumption. So let me quickly show you. I'm not going to play the video, but on my screen, you can see the use case slide from that talk. And you can easily find the video on YouTube. Just look for Advanced Microgrid Solutions, re:Invent. So the use case described there is predicting supply and demand for the energy market in Australia. And of course, you have energy producers, you have energy consumers, and it's a spot market. So they need to predict the right price for energy every five minutes. And five minutes sounds like a long time for real-time predictions, right? But in fact, there is a lot happening during that five-minute window, and they have implemented a really clever solution.
This is a very, very good session. It's probably one of the deepest ML sessions I've done on stage with a customer, thanks to the customer, not thanks to me. They use TensorFlow for this and go quite deep on the architecture layer. They use convolution networks with very specific settings and have a custom loss function. I mean, what's not to like? So it's a really fun session. Again, if you're into those topics, I really, really recommend it.
So last week we discussed why deep learning was a good fit for those problems, right? Instead of traditional learning. So, without repeating ourselves too much, can you summarize why we should use deep learning for this? Yes, in the case of electricity consumption, of course, you could predict each individual's consumption by using a univariate ARIMA model on each time series. Or you could apply some VAR models, some vector-autoregressive models, on a group of 10 people. That can work. You can use statistical analysis for maybe one or ten people. But if you want to forecast for 100, 1,000, or even 1 million individuals, the linear approach is going to be super painful. Because first, there's the computational cost of ARIMA models: when you build an ARIMA model, you need to follow the Box-Jenkins procedure for each time series. So you need to look at the ACF, the autocorrelation function, and the PACF, the partial autocorrelation function, check the linearity and the significance of the linear coefficients. You need to make some strong linear assumptions about the residuals. And then, if you fit a VAR model, again a vector autoregressive model, on a bunch of time series, you can have a big problem of overfitting, and you want, of course, to avoid overfitting. So this is the reason why, again, when you have a lot of data and when you want to predict a lot, it's a better idea to use deep learning.
Okay. So that's what we're going to do today. Yeah. Okay. And we're going to use that LSTNet algo, but let's get to that in a minute. Let's look at the dataset. Okay. So let me show you on my screen what the data looks like. Are you ready? It's not nice. Okay. It's JSON. As we all know, you cannot kill JSON, and here it is again. So, the dataset that we have is a slightly processed version of the University of California dataset. I think we have 321 time series with a one-hour frequency. Exactly. So we've got 321 individuals that we are going to study. Okay. And we have about, I think, 12,000 data points. So it's about 2.5 years of data. One point per individual per hour. Okay. So it's a lot, right? Yeah, it's good. It's enough, probably. It's enough. Because we have 2.5 years. So we have two full years of electricity consumption in the summer and in the winter, the full pattern. And we can make the assumption that we are going to see some seasonal patterns over the year. So it's interesting to have at least two years. So when it comes to the dataset itself, you can see we have 321 lines like this one, which is really just a bunch of numbers. It's the power consumption during that hour, and every data point is the next hour. Okay. So we start at January 1st, 2012, and then we have pretty much about 12,000 successive values for that customer. Right. And if I scroll some more, we should see, yeah, it's a lot of values, right? But trust me, there are other... Oh, here we are. Okay, here's customer number two. And yeah, we have 321 just like that. Okay? In JSON format that everybody loves so much. So, how much processing do we need here? It looks like it's...
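To make the "one JSON line per customer" description concrete, here is a tiny sketch of the JSON Lines layout described above. The field names (`start`, `target`) follow the common GluonTS dataset convention; the actual file in the repo may carry extra fields, and the values below are made up.

```python
import json

# One JSON object per line, one object per time series (per customer).
series = [
    {"start": "2012-01-01 00:00:00", "target": [14.0, 18.0, 21.0, 20.0]},
    {"start": "2012-01-01 00:00:00", "target": [230.0, 215.0, 240.0, 260.0]},
]

# Write one customer per line, like the file shown on screen.
lines = "\n".join(json.dumps(s) for s in series)

# Read it back: each line is an independent time series.
parsed = [json.loads(line) for line in lines.splitlines()]
print(len(parsed), len(parsed[0]["target"]))  # → 2 4
```

The real dataset has 321 such lines, each with roughly 12,000 hourly values.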
Yeah, but again, this kind of dataset serves research purposes, so the data here is quite clean. But most of the time, when you deal with real-world values, data preprocessing and feature engineering are a super important component of the ML lifecycle. I don't know if you know the CRISP-DM method? Ah, yeah, in my nightmares, yeah. It's like, yeah, the cross-industry... No, seriously, it's important. Cross-Industry Standard Process for Data Mining, and it is an open standard process model. And yes, data preprocessing and feature engineering are crucial, but they can take a lot of time. And we are going to see how SageMaker can help us do this data preprocessing job with SageMaker Processing, so we're not going to do much. Let me show you the script here. In our case, the only thing we're doing here is normalizing the time series. Okay, because as you can see, we have some different scales. You know, maybe this customer is consuming tons of electricity. So consumption values are really high. And actually, they look quite a bit higher than the first customer's. So we want to normalize that. And the decision made here is, for each individual time series, to find the max value and then normalize against it, between zero and one. You want to compare at the same scale. Exactly. To have the same scale. So that's what we're doing here. There's a little bit of code. It's pretty much the only thing it does. It finds the max value in each time series, normalizes, and does that for all time series. One very important thing you have to do most of the time with time series is to look at the missing values and sometimes do some imputation, etc. But I don't think we have missing values here. Yeah, you're right. It's clean and it isn't missing anything. So it's a really simple cleaning script. And the interesting bit is how do we run it, right? How do we process data with it? So we mentioned we would be using SageMaker Processing, and that's what we're doing.
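The per-series max normalization just described can be sketched in a few lines. This is an illustration of the idea, not the repo's actual preprocessing script, which may differ in details:

```python
# Scale each time series into [0, 1] by dividing by its own max value,
# so customers with very different consumption levels become comparable.
def normalize_series(series):
    max_value = max(series)
    if max_value == 0:
        return series[:]  # an all-zero series stays as-is
    return [v / max_value for v in series]

customers = [
    [14.0, 18.0, 21.0, 20.0],      # a low-consumption customer
    [230.0, 215.0, 240.0, 260.0],  # a high-consumption customer
]

normalized = [normalize_series(s) for s in customers]
# Both series now peak at 1.0, so they live on the same scale.
```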
So we'll look at the code in a few minutes after we've talked about the model. But in a nutshell, what SageMaker Processing does is it lets you run batch jobs on fully managed infrastructure. So we saw last week how SageMaker lets you train and deploy on managed infrastructure. SageMaker Processing follows the same rationale. Just bring your script, just like we're doing here: write your script, test it on your local machine, and then when it's ready to be run on the full dataset, run in production, you just need to adapt it a little bit for SageMaker Processing, and that pretty much means: where's the dataset, and where do I write the processed dataset? So you just need to be able to get those paths from parameters that are set by SageMaker Processing, and that's about it. So it's super simple; if you have existing cleaning code, you can very easily adapt it. And it's super important when you've got big data, a lot of data, to be able to automate your preprocessing. And you can run those jobs on demand, on managed infrastructure, a thousand times a day if you need to. And as you will see, it's really one line of code in the notebook. So you can run Python, and of course, you could install any libraries in there if you wanted to, and you can also run PySpark. So we're going to use Python here. But the takeaway here is: if you have existing code to clean, process, etc., the only thing you need to do is add command line arguments to receive the location of the raw dataset and the location of the processed dataset. That's it. You can literally copy-paste those things from one script to the next, and you're done. Okay, so that's really, really cool. Processing is a very nice thing. Now, as you would expect, this runs inside a container, just like everything in SageMaker. And, like I said, we have built-in containers for scikit-learn, Python, of course, and PySpark. Or anything, any code. And here we use something else. We use GluonTS APIs, etc.
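The command-line interface described above usually amounts to a couple of arguments. Here is a minimal sketch; the `/opt/ml/processing/...` defaults are the conventional container mount points for SageMaker Processing, but the exact argument names here are illustrative, not the repo's:

```python
import argparse

def parse_args(argv=None):
    parser = argparse.ArgumentParser()
    parser.add_argument("--input-dir", type=str,
                        default="/opt/ml/processing/input",
                        help="where SageMaker Processing mounts the raw dataset")
    parser.add_argument("--output-dir", type=str,
                        default="/opt/ml/processing/output",
                        help="where to write the processed dataset")
    return parser.parse_args(argv)

# Locally you can override the paths; inside the container the defaults apply.
args = parse_args(["--input-dir", "./raw", "--output-dir", "./clean"])
```

This is really the whole interface between your cleaning code and SageMaker Processing: read from one path, write to the other.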
So we're going to build our own container. And that sounds scary. That doesn't scare me. But yeah, it does sound scary. It's actually very simple. So if you don't know much about Docker, you can learn the basics of Docker in a couple of hours. And the first thing you need to do is write a Dockerfile, right? And this is a very simple one. We start from an existing Python 3.7 image, okay? And we just install some dependencies using pip, right? That file installs pandas, MXNet, and GluonTS, which we use in the script. So anyone can write that, or copy, paste, and modify it. It's not going to scare you. And then we need to build and push the container to a location that SageMaker can use. And that location is an AWS service called Amazon ECR, Elastic Container Registry. You can see it here. I've already done it, right? So, it's your private Docker registry with all the good things you like in AWS: security, high availability, etc. And it's a good idea to keep your containers close to where you need them anyway. And, of course, you do this using standard Docker commands. So, just for the sake of completeness, let's quickly look at the script. It's included in the repo. You can just run the script in the notebook. But come on, we're all engineers here, and we want to understand how things work. So first, I need to figure out my AWS account number because I need it in the name of the ECR image that I'm going to push. So, just a CLI call to STS to get my account. This would be the account number. Then I build a name for the image I'm going to create. So: account number, dot dkr, dot ecr, dot region name, dot amazonaws dot com, slash an image name, which I will pass. This could be anything. And then the latest tag. Then I create a repository in ECR. You just saw it. Okay, that's this one here. If it already exists, obviously, I don't create it. Then I log in to this repository using my AWS credentials. Okay. All right, so by now, I have this repo and I can access it.
And then I use the standard Docker commands to build the container, tag it with the name I created, and push it to the repo. And this is completely vanilla, right? I mean, you'll see this script in plenty of SageMaker examples. It's always the same. It's completely generic. So even if you don't want to learn Docker at all, you can just copy-paste this, right? That's the good news. Okay. So that's what we're going to do for our SageMaker Processing job. Again, if we used scikit-learn or PySpark, we could just ignore this, use the existing container, and just load and run our script. So that's SageMaker Processing. It's a really, really cool capability. You can use it for preprocessing, you can use it for model evaluation post-training if you want to do that, you can use it for plenty of different things and get rid of bespoke infrastructure and tools. Right? Okay. So I like SageMaker Processing quite a lot. Okay. So we talked about the data, and about processing it with SageMaker Processing. Now let's talk about the model, the really cool bit. Okay. So what are we using today? So we are going to use the LSTNet model. Let me display the slide so that we know what we're talking about. This is it. Yes, yes, yes. So for LSTNet, of course, you can read the arXiv paper if you want to know and understand it. I read it, and it's not too bad. If you skip the math a little bit, if you assume it's right, which of course it is, because these great authors are smarter than I am, it's actually quite readable, I think. Yeah, yeah, and when you look at this diagram on the screen here, LSTNet is going to take the best of both worlds, the world of CNNs and the world of RNNs. So again, kind of similar notions to what we used last week. But compared to the vanilla LSTM we saw last week, LSTNet adds two new things.
You're going to have some skip layers, plus an autoregressive bypass.
So if you saw last week's episode, you'll recognize this... So we have the time series as inputs, and we have the convolution layers to extract patterns. And then we have the recurrent layers to find the time element in the time series. And I think one of the really clever things is those skip connections. So skip connections are direct connections between neurons of different layers. And this is widely used for computer vision as well, right? I think ResNet has a lot of skip connections. And if I understand it right, the idea is that because we have long sequences, here we're predicting over long sequences, it's difficult for the LSTM to remember what happened 24 hours ago, 48 hours ago, etc. And the skip layer is going to be able to capture the very long-term dependencies. So it's like a shortcut. It's a shortcut between layers. So if you want to know my electricity consumption in the next hour, the intuition is: look at my electricity consumption yesterday at the same hour, or maybe the day before, right? So by skipping 24 hours back, literally, we can improve the model. I think it's a very, very cool idea. And what about the autoregressive part? Tell us a little bit about that. My first reaction was: but that's typically what we don't want to use, right? Because this is what ARIMA and the other classical algorithms are built on? Yeah, but in reality, what's going to happen is that your output is going to be decomposed: you have the non-linear output from the neural network part. Exactly, and then you add an autoregressive part on top, because sometimes the scale of the series is going to change violently, so you need to keep this autoregressive aspect of the time series. Okay, so this is a linear model, right? Exactly. This is a linear model. So if the scale is very different, it's going to help us figure that out. Exactly. And then we kind of sum everything and output the prediction.
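The decomposition just described can be sketched in a few lines, purely for intuition. This is not the real LSTNet (the GluonTS layers are MXNet blocks); the `tanh` stand-in and all the weights below are made up. The point is that the neural output is bounded while the linear AR bypass tracks the raw scale, and the final forecast is their sum:

```python
import math

def neural_part(history):
    # Stand-in for the CNN + (skip-)RNN stack: a bounded non-linear output.
    return math.tanh(sum(history[-12:]) / 12.0)

def ar_part(history, ar_window=4):
    # Linear combination of the last `ar_window` values (the AR bypass).
    weights = [0.4, 0.3, 0.2, 0.1]  # illustrative AR coefficients
    recent = history[-ar_window:]
    return sum(w * v for w, v in zip(weights, reversed(recent)))

def predict_next(history):
    # The two components are summed to produce the final forecast.
    return neural_part(history) + ar_part(history)

series = [0.2, 0.3, 0.5, 0.6, 0.5, 0.4, 0.3, 0.2, 0.3, 0.5, 0.7, 0.8]
forecast = predict_next(series)
```

Because `tanh` saturates near 1, a series with much larger values would mostly flow through the linear part, which is exactly the scale robustness the AR bypass provides.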
Okay, so again, pretty cool paper. I encourage you to read it. I think it's very well written and understandable, even for people like me. All right, so now, how about the code, right? This is scary. This is scary to implement. It's beautiful. Oh, yes, I'm sure it is. Plenty of colors. So let's look at the training script. So you would expect lots of complicated things here, but what we're really doing in the training script... So we have a training function that takes a long list of hyperparameters. We'll talk about those. And the training code is actually just this: build a hyperparameter dictionary, grab the LSTNet model from GluonTS passing those hyperparameters, and then train. So this is completely underwhelming, which is the way I like it. I think it's great. So this is it. This is the training script, and the only thing we're doing in the main function is calling the train function, saving the model, then evaluating the test set and saving some metrics, which we will look at. Maybe we can zoom in a bit more. Yeah, let me zoom in a bit. So it's a super simple training script. So for those of you who are interested and want to know more about LSTNet, of course, if you go to GluonTS, you'll find the LSTNet implementation, right? And okay, now you need some aspirin. Okay. All right. Yes. If you want to see how it's implemented, and if you really need a headache, then read the research paper and look at the code alongside it. But yeah, kudos to the brave souls who implemented it. Me, I'm very happy to just use the model zoo and call it. Okay, so let's talk just quickly about the hyperparameters, and then we'll look at the SageMaker script. So we see plenty of hyperparameters, and we have some values we're going to use in the notebook. How do we pick those things?
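In rough outline, that training function boils down to something like the sketch below. The GluonTS import paths and `LSTNetEstimator` parameter names are assumptions based on the MXNet-era GluonTS API, and the `train` function is shown but not run here (it needs `gluonts`, `mxnet`, and a prepared dataset); check the repo's actual script for the authoritative version:

```python
# Hyperparameter dictionary, as built by the training script. Values for
# skip_size, ar_window, and channels are illustrative, not the repo's.
hyperparameters = {
    "context_length": 12,     # sliding window of past data points
    "prediction_length": 6,   # how many future points to predict
    "skip_size": 24,          # skip connection stride: one day at 1h frequency
    "ar_window": 24,          # window of the linear autoregressive bypass
    "channels": 72,           # convolutional channels
    "epochs": 5,
}

def train(hp):
    # Assumed API: not executed here, requires gluonts + mxnet installed.
    from gluonts.model.lstnet import LSTNetEstimator
    from gluonts.mx.trainer import Trainer

    estimator = LSTNetEstimator(
        freq="1H",
        prediction_length=hp["prediction_length"],
        context_length=hp["context_length"],
        num_series=321,                      # one per customer
        skip_size=hp["skip_size"],
        ar_window=hp["ar_window"],
        channels=hp["channels"],
        trainer=Trainer(epochs=hp["epochs"]),
    )
    # estimator.train(training_data) would return a trained predictor
    return estimator
```

That one `LSTNetEstimator(...)` call is the "one line of code, get the model" mentioned earlier: all the equations from the paper live behind it.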
So here, of course, two very important hyperparameters in time series are the prediction length and the context length. So the context length is the sliding window for the training, and the prediction length is how far ahead we predict. Okay, so we're using context-length data points to predict prediction-length data points. Exactly. And then you've got the other hyperparameters: the autoregressive window, the number of channels, the scaling, if you want to scale the data or not, the output activation. So in our case, I think it's a sigmoid, but you can also use no activation, or a tanh. Yes. And then you can choose the number of epochs, the batch size, the learning rate, the decay. The usual stuff. And I think the research paper has some values that they already optimized. I think they mentioned using grid search. Yeah. And so this implementation here will use the hyperparameters from the research paper. But you could tweak it, right? Exactly. Automatic model tuning. And with your own data, yes, you would need to change them. Of course. But using what the authors recommend is probably a good idea. I don't know any better. Just maybe a quick word. We see some, of course, test set evaluation and metrics. Can you tell us just a little bit about some of the metrics we will look at in the script? So you're going to see we've got a big table with all the different metrics, but in time series, what you're going to look at a lot is the root mean square error, and then you're going to have a lot of flavors of it. In this case, we are going to compare the MAPE, the mean absolute percentage error, and the symmetric MAPE, which are very common metrics widely used by our customers and practitioners in time series, because the MAPE is something that can be really easily understood by non-technical people. Okay, and we are going to use the MSE and some quantiles. So yes, we're going to talk about that.
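To make those metrics concrete, here are minimal, dependency-free versions of RMSE, MAPE, and sMAPE, written out to show exactly what they compare (actual values versus forecasts). The numbers are made up for illustration; the definitions match the standard textbook formulas, though libraries sometimes multiply by 100 to report percentages:

```python
import math

def rmse(actual, forecast):
    # Root mean square error: penalizes large errors quadratically.
    return math.sqrt(sum((a - f) ** 2 for a, f in zip(actual, forecast)) / len(actual))

def mape(actual, forecast):
    # Mean absolute percentage error; undefined when an actual value is 0.
    return sum(abs((a - f) / a) for a, f in zip(actual, forecast)) / len(actual)

def smape(actual, forecast):
    # Symmetric MAPE: normalizes by the average magnitude of both values.
    return sum(2 * abs(f - a) / (abs(a) + abs(f))
               for a, f in zip(actual, forecast)) / len(actual)

actual = [100.0, 120.0, 80.0]
forecast = [110.0, 114.0, 88.0]
print(round(rmse(actual, forecast), 2), round(mape(actual, forecast), 3))  # → 8.16 0.083
```

A MAPE of 0.083 reads directly as "we're off by about 8% on average," which is why it's so easy to communicate to non-technical stakeholders.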
I think actually the best part in the notebook is the final step when we look at metrics and discuss quantiles. So get more coffee, guys, right, and it's gonna get crazy in a minute. Okay, before we start running the demo, so we mentioned we're going to use a real-time endpoint just to show you something different, but we could use batch prediction as well, right? Yeah, exactly, just to show you how to deploy an endpoint and how that works. Very, very simple. Okay, so now it's time for the demo again. This is the repo we're using, right? In case you didn't catch that earlier. And we have a nice notebook. And let's go through it. Yeah. Okay. So, of course, we need the SageMaker SDK. We need the Altair library for visualization. It's a good one, yeah? Yeah, it's very nice. And we need MXNet and GluonTS and Pandas to do that machine learning thing. Then, of course, import the SDK. I mentioned last week, if you guys are already in SageMaker, and in case you missed it, there was a major release during the summer, SDK v2. So make sure you use this. There are some changes, but not too bad. A few breaking things, but really renaming parameters mostly. Okay. Now we're going to grab the dataset. Okay. So it's actually hosted already in S3. So we can grab that dataset. It's that JSON file you looked at, right? You want to see it again? It's so nice. Yeah. Okay. So grab that. And of course, we need an S3 bucket for our SageMaker job. And the easiest way is to use the default bucket. But you could use any S3 bucket that you have access to, of course. So we're going to grab that data and copy it to our S3 bucket. Define some standard locations for the training data, the training output, and a few more things. So now we're ready to go. So the first thing is preprocessing. So, let's quickly cover those Docker steps again. Okay. So, we're running that script that you saw. Okay. This one. Let's put all those pieces together. Okay. 
Creating an ECR repo, building the image, pushing the image. Okay. So, that's what happens now. Okay, and we can see Docker building the image, pushing it. So by now, I have my Docker image here. It's all nice. I'm ready to process. So now, I can just run my processing job using this container. It is super simple. So we create a ScriptProcessor from the SageMaker SDK, passing the name of the container image we just built, and selecting the infrastructure that we want. So here I'm using a simple container with a simple dataset, so one instance is fine. But if I wanted to do bigger things, maybe I would use Spark, and we could do distributed processing, and the only thing we would say is: hey, fire up eight instances. Just say eight, and that's it, right? Okay, and then we just run that preprocessing code, using the container that we built, passing the location of the raw data and the location of the output, which are, remember, those parameters we saw in the script. Let me show them again really quick. Right here. So that's the only thing, that's the interface between SageMaker Processing and your code, literally. All right, so now, of course, we're firing up that C4 instance, pulling the container to the instance, running the code, right? And a few minutes later, I have my processed data. That's great. Right? And it looks like this, which is even more exciting. More JSON, except this time with normalized values, right? Okay, right. So let's look at that; at least we know the script worked. And we can download that processed data, etc. So now we have processed data. Now it's time to train. So these are the hyperparameters you were mentioning. Context length, 12. So we're using 12 hours' worth of data to predict the next six. And then we have some crazy hyperparameters. And again, we use the values from the research paper. And if you want to train longer, you could increase the number of epochs.
And if you want to try something else, just go and tweak. I tweaked a little bit and found, because I didn't know what I was doing, that there are some relationships between those parameters. So I think the number of channels needs to be divisible by the AR window. So it's not just any value, right? So, you know, I opened the wrong door and... You close it! Yes. Quickly close it. Just so you know. Being very honest with you. Okay? But I can't help but tweak, yeah? Okay. And now we're going to use our estimator. So, we use the MXNet estimator because we want to have MXNet and GluonTS pre-installed. We pass the location of the training script, the easy one, the one that just pulls the model from the model zoo and trains. So that's what we're running here. And again, you can see the SageMaker part is really the easy bit. Look how much time we spent talking about the real problem, the dataset, processing it, and LSTNet. Once you have all those parts figured out, writing the notebook is easy, right? It should be, right? And the SageMaker SDK should be easy to use, and I really think it is. This time we train on a GPU instance because deep learning is accelerated by that. And we pass the hyperparameters and a couple more things, like where to save the model, I mean, nothing fancy. And just like last week, we see the training logs: SageMaker automatically fires up that P3 instance, pulls the container, copies the data. That's so great, honestly. Just go and have more coffee while this is running. It runs for about 10 or 11 minutes with five epochs. And when it's done, well, you have your model and you have the metrics that we saw. So here you actually see lots of different attempts, but normally you would only see one model artifact and one output. Yeah, exactly. I kept piling up different attempts under the same prefix. So that's why you see more files. OK? Yes, a lot. All right. But yeah, you would just see a small number. Yeah.
So we grab that output tar file, which contains the metrics. OK. And in case you forgot, this comes from the training script, which also evaluates the test set, computes lots of metrics, and saves them to a file. Okay. So now let's look at the metrics, because at the end of the day, we want to know if this is a good model or not, and whether we should train for longer. So we extract the metrics, we load them with pandas, and this is what we see. Okay, so we have... 321 lines, because we have 321 time series. So each line here in the data frame is one specific time series. And we have lots of metrics. So let's start at the beginning, with those, and then we'll talk about the quantile craziness. Okay, my favorite part. Okay, Ségolène, please help us understand this. Thank you. So all the metrics here, in a nutshell, compare the predicted versus the actual values. And then, if you have some scaling problem or something like that, you're going to use a different type of metric. But from my daily life as a data scientist working with time series, it's really the MAPE I use a lot, this one, and then I use this notion of quantile loss and the quantile function. Again, yes, absolutely, you want to minimize the error of your model, comparing the predicted versus the actual, in order to minimize the risk. Okay, so if you want to know exactly what those things are and how they're computed, you'll find that on Wikipedia, and the research paper does a good job of explaining them. But these are basically variants of root mean square error. Exactly. Better, improved, more generic versions of RMSE. So now we have to talk about quantiles. Let me show you what we're talking about here. This is a time series. It's an ugly one, but it's a time series.
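The grab-and-extract step above can be sketched with the standard library. The archive layout and the `item_metrics.json` file name are assumptions for illustration; adapt them to whatever your training script actually saves. The example builds a tiny fake archive in memory so it is self-contained.

```python
# Minimal sketch of pulling per-series metrics out of the job's output archive.
import io
import json
import tarfile

def load_metrics(tar_bytes, member="item_metrics.json"):
    """Extract and parse a JSON metrics file from an in-memory .tar.gz archive."""
    with tarfile.open(fileobj=io.BytesIO(tar_bytes), mode="r:gz") as tar:
        with tar.extractfile(member) as f:
            return json.load(f)

# Build a tiny fake archive standing in for the real output.tar.gz.
payload = json.dumps(
    [{"item_id": "MT_001", "MAPE": 0.18, "wQuantileLoss[0.5]": 0.3}]
).encode()
buf = io.BytesIO()
with tarfile.open(fileobj=buf, mode="w:gz") as tar:
    info = tarfile.TarInfo("item_metrics.json")
    info.size = len(payload)
    tar.addfile(info, io.BytesIO(payload))

metrics = load_metrics(buf.getvalue())
print(metrics[0]["MAPE"])  # one row of metrics per time series
```

In the notebook, the parsed records would then go straight into a pandas DataFrame, one row per time series, exactly as described above.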
And let's say that purple line is what we predict. Okay? So, we can see in some cases we're overpredicting quite a lot. Yes? So, you could say, well, that's a problem, because if we're trying to predict how many pairs of shoes we need to stock in the inventory, we're going to stock way too many, right? And here we're underpredicting, so we're not going to stock enough. So, in one case, maybe we're wasting our money building inventory that's not needed, and in the other we're going to run out, and there will be lots of disappointed customers. But in some cases, you actually want to lean one way or the other. Sometimes you want to be very conservative and stock too much, because the business impact of running out would be really terrible; and sometimes you don't want to have too much, because maybe you don't have enough resources anyway, so you want to under-allocate. Or let's say you're stocking perishable food; it's probably better to run out than to have to throw everything away because you didn't sell it, right? So there are reasons why you would actually want some safety margin, so to speak. Okay? The problem is, if your model predicts one single value, it's very hard to decide. It's like, okay, you should stock 920 pairs of shoes. How do you agree on that number? So, instead of predicting single values, a good solution is to predict intervals, which is what quantiles are for. So if you knew with, let's say, 80% confidence that you're going to sell between 200 and 300 pairs of shoes, then you can manage the risk. If you're a very conservative company, you could stock a lot; if not, you could understock a bit. But at least you have a range of values, and you can tell your business owners, okay, here's the trade-off, instead of giving them a single value. So let's define what a quantile is. So they're called P-something, and that something is the actual quantile.
So the PXX quantile is a value that tells you what percentage of your real values will be lower than it. So, for example, if you ask your algo to generate P95 values, then 95% of real values will be lower than the prediction. So it's a very conservative, high prediction. Risk-averse. If you take the P5 (or P05) quantile, then it means only 5% of values will be lower than it, and 95% will be higher. So if you want to be sure that you never over-allocate, then P05 is a good option. But the real use is when you combine them, right? So if you ask your algorithm to generate P5 and P95, then you have a channel covering 90% of possible outcomes. So you can go to your business stakeholders and say, well, the decision you should make is pick a value between the low value and the high value, and these cover 90% of outcomes. So depending on how conservative or how risk-taking you are, you can go and say, okay, let's use P70. Exactly. Right? So we'll predict and base our decisions on P70. Okay? So that's what a quantile is. Right? I hope it makes sense. It's a very important concept. There's another value that's important: the quantile loss. Right? So explain quantile loss to us. The quantile loss is the average error percentage between the true value and the quantile value. Okay. So that means if this is P95, then my predictions will be the P95 quantile, and I also get a sense of how far off the real value is from that quantile value, right? So that's another reason, because if you have a very conservative quantile but the actual error is huge, then maybe you could take a P80 instead, be a little less conservative, and spend your resources better, right? So it's a trade-off between how much risk you want to take and how many resources you have. So, for example, and we'll close on that, what does it mean if, for your model, the P20 quantile is 1.23?
Yeah, the true value is lower than 1.23, 20% of the time. Yeah, so that means your values will be lower 20% of the time. Okay. If the quantile loss for P20 is 0.123, then it means that, on average, the true value is about 12% away from the quantile. So you know how accurate that quantile is, and you can decide if you want P5, P50, P95. These are not very intuitive values and definitions, but they are really important. So, read them again and you'll figure it out. Okay, and this is what we see here. So, in those statistics, we see the quantile loss for P50. Okay, so for this time series, if I use P50 for prediction, the quantile loss is about 30%. Exactly. Right. The real value for this time series is 30% away from P50. And we have all the other ones, from P50 to P90. So looking at those values, you can bring options to your business stakeholders and say, hey, what's the cost of over-allocating? What's the cost of under-allocating? So let's find the two or three quantiles that make sense. And then, looking at the errors, find the one that is actually reasonably close to the value. Right. So again, great business decisions can happen here. OK, let's quickly visualize a few things. I think we need to wrap up. Yeah, we have a few more minutes, just a couple of minutes. Okay, so maybe just this one. Yeah, the other ones are probably less interesting. So this is just a scatterplot. Yeah, a scatterplot between the MSE and the symmetric MAPE. So each dot is actually a time series. Exactly, yeah, and you can compare. So we see lots of really good ones, low errors on both metrics, and a few outliers. So you need to check: maybe it's just bad data, maybe they're just weird, etc. Okay. And there are a few more graphs, but we'd be repeating ourselves, right? So we can deploy the model just like that, one line of code.
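The quantile loss discussed above can be sketched with the textbook pinball loss. This is an illustration on made-up numbers, not the exact GluonTS implementation (which reports a weighted, normalized variant).

```python
# Minimal pinball (quantile) loss: the building block behind quantile metrics.
def quantile_loss(y_true, y_pred, q):
    """Average pinball loss of forecasts y_pred at quantile q."""
    total = 0.0
    for y, f in zip(y_true, y_pred):
        if y >= f:
            total += q * (y - f)        # under-forecasting is penalized by q
        else:
            total += (1 - q) * (f - y)  # over-forecasting is penalized by 1 - q
    return total / len(y_true)

actual = [100, 120, 90, 110]
p95 = [130, 140, 120, 135]  # a deliberately high, conservative forecast
p05 = [70, 80, 60, 75]      # a deliberately low forecast

# At q=0.95, over-forecasting is cheap and under-forecasting is expensive,
# so the high forecast scores much better than the low one.
print(quantile_loss(actual, p95, 0.95))  # small loss
print(quantile_loss(actual, p05, 0.95))  # large loss
```

The asymmetric penalty is exactly what makes a high quantile like P95 produce conservative forecasts: the model pays 19 times more for coming in under the true value than over it.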
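The scatterplot pairs one MSE and one symmetric MAPE value per time series. Here is a minimal pure-Python sketch of those two per-series metrics on toy data; the series names and values are invented for illustration.

```python
# Per-series error metrics behind the MSE vs. symmetric MAPE scatterplot.
def mse(actual, forecast):
    """Mean squared error over one series."""
    return sum((a - f) ** 2 for a, f in zip(actual, forecast)) / len(actual)

def smape(actual, forecast):
    """Symmetric MAPE, bounded between 0 and 2."""
    return sum(
        2 * abs(f - a) / (abs(a) + abs(f))
        for a, f in zip(actual, forecast)
    ) / len(actual)

series = {
    "MT_001": ([10, 12, 11], [10, 11, 13]),  # a well-behaved series
    "MT_002": ([10, 12, 11], [25, 2, 30]),   # an outlier series
}
# Each series becomes one (MSE, sMAPE) dot on the scatterplot.
for name, (actual, forecast) in series.items():
    print(name, mse(actual, forecast), smape(actual, forecast))
```

Series that land low on both axes are well predicted; dots far from the pack are the outliers worth checking for bad or genuinely weird data.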
One line of code. There's not much to be said. Deploy and you get a real-time API. And we'll talk about that in more depth in future episodes. Okay. All right. I think we're pretty much done. So let me quickly show you some extra resources that you can use. Okay. So you can send your feedback there. Let me display the slide. Take a screenshot. That's the slide you want to keep. So take a screenshot here. So that's the Advanced Microgrid video. Very cool. SageMaker, the AWS blog, the GitHub repos for the SageMaker SDK and SageMaker examples, and the repo we used today. There's actually a companion blog post. That's very nice. It gives some extra background. That's the URL for SageMaker Fridays, but you're here today, so I'm guessing you've figured that out. re:Invent, and discount codes for my SageMaker book, valid until November 11th, so don't wait. And yeah, I'll leave this one up just to give you some time to take a screenshot and get the discount code. All right, okay, well, I think we're done, so thank you very much. Let us go back full screen for a second. Okay, here we are. So thank you very much. I hope you learned a lot again today. Ségolène? Yeah, I think I could talk about this for hours, but today we have seen how to train and deploy a state-of-the-art algorithm for forecasting, and how to use SageMaker Processing, which is a very cool capability of SageMaker. And we will see other interesting capabilities in the coming episodes. Yeah, so next week we're going to say goodbye to LSTMs, because we've gone extremely deep on those, deeper than we initially expected. But hey, I think these are important topics and we wanted to help you understand them. So next week we're going to talk about another popular topic, which is fraud detection, which matters to pretty much anyone doing online business. And we'll move to different algorithms: we'll use XGBoost and Random Cut Forest to train fraud detection models, use more SageMaker features, and have more fun. Okay?
And there are plenty more episodes to come. So, Ségolène, thanks again for being with us. Thanks for all the great insights on Time Series, LSTMs, and everything else. Thank you to the nice moderators who answered plenty of questions, I hope, and thank you, of course, to all the viewers. It's a pleasure to run this for you, and we hope you learned a lot. We'll see you next week, and until then, keep rocking with machine learning. Bye.
Tags
SageMaker, Time Series Forecasting, LSTNet, Machine Learning, Demand Forecasting
Julien Simon is the Chief Evangelist at Arcee AI, specializing in Small Language Models and enterprise AI solutions. Recognized as the #1 AI Evangelist globally by AI Magazine in 2021, he brings over 30 years of technology leadership experience to his role.
With 650+ speaking engagements worldwide and 350+ technical blog posts, Julien is a leading voice in practical AI implementation, cost-effective AI solutions, and the democratization of artificial intelligence. His expertise spans open-source AI, Small Language Models, enterprise AI strategy, and edge computing optimization.
Previously serving as Principal Evangelist at Amazon Web Services and Chief Evangelist at Hugging Face, Julien has helped thousands of organizations implement AI solutions that deliver real business value. He is the author of "Learn Amazon SageMaker," the first book ever published on AWS's flagship machine learning service.
Julien's mission is to make AI accessible, understandable, and controllable for enterprises through transparent, open-weights models that organizations can deploy, customize, and trust.