Scaling Machine Learning from 0 to millions of users October 2019

January 02, 2020
Talk @ FWDays Highload, Kyiv, October 2019. Deep Learning AMI, Deep Learning Containers, AWS Container Services, Amazon SageMaker ⭐️⭐️⭐️ Don't forget to subscribe to be notified of future videos ⭐️⭐️⭐️

Transcript

But this morning, don't worry, we're not going to talk about machine learning specifically, we're not going to talk about algorithms and neural networks and all that crazy stuff. My purpose today is to discuss how you can scale machine learning workloads. So if you're a DevOps engineer or a machine learning engineer in charge of deploying machine learning to production, what are your options? What are the different steps you can take? Because scaling machine learning is really no different from scaling anything else, right? I mean, people make a big deal out of machine learning, but it's just like software. So in my experience, sometimes you have linear scaling and it's fine, and you can keep doing what you're doing, just add a few more resources. Every now and then, there's a big jump in scalability, and you need to do things differently. That's what I'm going to discuss today: how to start and how to keep scaling using different techniques. So initially, I'm guessing a lot of people start at day one with machine learning, and it's just one user, it's you, trying to get things done, trying to move a first model to production. Usually, it will start like this: you or someone in your team has trained a model based on data, and it works on your local machine. You probably used open-source tools to do it, and you tested the model. It's working, it's okay, you're happy with it, and of course now you would like to deploy it to production. You want to put live traffic on this to see how it behaves, to run some A/B testing, etc. So you're excited, you want to go quickly, you want to see results real quick. So you just take the quickest path, which is to take the model, embed it into the business app that needs that model, and be on your way. So a monolithic app with a model inside of it, deployed to a virtual machine somewhere, and that's fine, okay? This is easy, we all know how to do this, it works, you can start putting live traffic on this, life is great, okay? So who's been there? Come on, don't be shy, we've all been there. Okay, so a few people have, and that's alright. At least you're in production, and that's the only thing that matters. So if we look at how we're doing things here, infrastructure effort is pretty much zero, right? How much effort is running a VM? ML setup? pip install whatever you need. CI/CD? Come on, you don't need it. Building models? Okay, that's what you actually do, or someone in your team does. Training? Just run your training script. Deploying and predicting? Just run your prediction script. And scaling, cost optimization, and security? Later. Yeah? Sounds familiar? Yes, I can hear you. Okay, so fine. You keep doing that, and of course, after a few instances and a few models, problems start to arise. Life is not that good anymore because you keep doing that manual work of starting instances, deploying your app, debugging, etc. It's time-consuming, and it's not what we like doing. And of course, the monolithic architecture is just painful, because it means every time you update or retrain the model, you have to redeploy the whole app. You can't share models, you can't scale them properly. Same old, same old, right? So a couple of options, of course, would be to stop doing manual work. I'll talk in a minute about some tools that we provide to help you focus on the model itself and not on managing virtual machines and all those different things.
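To make that "day one" monolith concrete, here's a minimal sketch, assuming a scikit-learn model saved with joblib and a small Flask business app; the file name, route, and feature list are hypothetical, not code from the talk.

```python
# Minimal sketch of the "day one" monolith: the model file ships inside the business app.
# Hypothetical names throughout (model.joblib, /order route, feature list).
import joblib
from flask import Flask, request, jsonify

app = Flask(__name__)
model = joblib.load("model.joblib")  # trained on a laptop, copied onto the VM with the app

@app.route("/order", methods=["POST"])
def create_order():
    order = request.get_json()
    # Business logic and the ML model live in the same process:
    features = [[order["amount"], order["item_count"], order["customer_age"]]]
    order["fraud_score"] = float(model.predict_proba(features)[0][1])
    # ... the rest of the business logic (persist the order, etc.) ...
    return jsonify(order)

if __name__ == "__main__":
    app.run(host="0.0.0.0", port=8080)
```

It works, and it ships fast, which is exactly why so many teams start this way; the pain only shows up once the model and the app need to evolve at different speeds.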
And of course, the monolithic architecture can at least be Dockerized to solve deployment problems, and hopefully broken down into different pieces. You can create a prediction service, so separate the business logic from the machine learning model, using model servers or maybe building a specific API. These are basic techniques we've all used for years, and they apply here too. So if you run on AWS and you want to make your life a little easier, we have those Deep Learning AMIs and Deep Learning Containers that we maintain for popular frameworks, and you can just run them directly. Unless you really love building AMIs and containers, these are a good option. They're free to use, okay? You just pay for the instance running them, but you don't pay for the AMI or the container, and they come in different flavors and are quite easy to use. So I can show you maybe real quick how to do this. Okay, and that's a good option if you need, for example, GPU instances for training and maybe prediction. Honestly, maybe it's just me, but don't go and buy expensive GPUs that you will not use fully and that could be out of date pretty quickly. Just run this on the Deep Learning AMI. So let me show you how to do this. So this is how it goes. In the interest of time, I've prepared some of that stuff. But in a nutshell, you would just run an EC2 instance, passing the AMI ID for the Deep Learning AMI. Here we're running a P3 instance, which is a GPU instance. I'm grabbing a Spot instance because I want to enjoy the nice Spot savings, and I can tag my instance, and of course I need to set a key pair, a security group, and an IAM role. Just like that. So this is all it takes: starting an EC2 instance with the Deep Learning AMI. And then, of course, I want to connect to this. So I need my machine learning glasses. Alright. Yep. You'll get there, don't worry. Trust me. Okay, so this is the exact stuff that I run. And all this stuff is on GitLab, so I'll show you the link at the end. Okay, and so of course I do see my EC2 instance here. Here it is. Okay, that one, the last one. Okay, it's running. And I can open an SSH session. Hopefully, my command is still here, so I don't have to type it. Okay, so I can SSH to that instance and set up port redirection, so that when I start Jupyter on this EC2 instance, I can just connect to it from my laptop and pass the token here. Okay, here we go. And from now on, you know what to do, right? It's a Jupyter notebook, and I could clone my repos and do anything that I want. Okay, and I can shut down this instance once I'm done. And you can pick the instance size that you like. I picked a GPU instance because that's typically something that's quite expensive if you want to buy it yourself. We have all kinds of instances, tiny ones, monster ones, so just pick the ones that you like best. Okay? And that's it. So if you're a DevOps person and you need to set up Jupyter environments for your team and they have fancy hardware requirements, you can get this done in five minutes. It's honestly one EC2 command to start the instance. And of course, you could use CloudFormation or Terraform if you wanted to. Then just SSH in, and that's it. Okay, and shut everything down once you're done. Nothing fancy. Okay, of course, I'm not going to demo the Docker containers because it's exactly what you'd expect. We have a list of containers for training and prediction on TensorFlow and MXNet with different versions, so you can docker pull and docker run and do all that good stuff.
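In rough terms, the "one EC2 command" step looks something like the boto3 sketch below (the talk demos the equivalent CLI command). The AMI ID, key pair, security group, and IAM role are placeholders; look up the current Deep Learning AMI ID for your region before trying it.

```python
# Hedged sketch: launch a Spot GPU instance running the Deep Learning AMI with boto3.
# All IDs and names below are placeholders.
import boto3

ec2 = boto3.client("ec2", region_name="eu-west-1")

response = ec2.run_instances(
    ImageId="ami-0123456789abcdef0",                # placeholder: Deep Learning AMI ID for your region
    InstanceType="p3.2xlarge",                      # single-GPU instance
    MinCount=1,
    MaxCount=1,
    KeyName="my-keypair",                           # placeholder
    SecurityGroupIds=["sg-0123456789abcdef0"],      # placeholder: must allow SSH
    IamInstanceProfile={"Name": "my-ec2-role"},     # placeholder
    InstanceMarketOptions={"MarketType": "spot"},   # grab a Spot instance for the savings
    TagSpecifications=[{
        "ResourceType": "instance",
        "Tags": [{"Key": "Name", "Value": "dlami-notebook"}],
    }],
)
print(response["Instances"][0]["InstanceId"])

# Once the instance is up, tunnel Jupyter back to your laptop, for example:
#   ssh -i my-keypair.pem -L 8888:localhost:8888 ubuntu@<instance-public-ip>
# then start Jupyter on the instance and open http://localhost:8888 locally.
```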
It works exactly the same way as you would normally work with Docker, except that we maintain those containers, so you don't have to. The versions in there are optimized for best performance on EC2 instances. So it's just something you don't have to do, right? And if you want to grab the base container and add more stuff to it, okay, fine, do that. But save yourself the trouble of building all that stuff from scratch. You have more important work to do, I think. So that's one easy way to break out of the laptop, go to the cloud, and enjoy elastic machine learning infrastructure. Okay, so at some point, of course, comes the scaling alert. Something happens in the team or in the company: more customers, more team members, more models, maybe more money from investors. Yes. Okay, you have more work to do now. The thing is, now you have paying customers, and you need to scale, and you need high availability and security. Okay, so all those things where you said, "Ah, later," now they're a thing. Okay, and they're a big thing. So I hope I don't have to convince you in 2019 that scaling up is a losing proposition. A bigger server will work for a while, it's going to cost you more money, and are you sure you're actually using that infrastructure correctly? Scaling up might be an emergency way of solving short-term problems, but in the long term, scaling out is what you need to do. And of course, as you scale out, you keep increasing the number of instances, and it would be silly to do manual work here. So only automation can save you, and again, this is kind of a DevOps conference, so I hope I don't have to convince you that infrastructure as code, CI/CD, and all that good stuff is the only way to go. For machine learning, it's the same as everything else. It's no different. Okay, so what are your options? The first option, of course, would be to use virtual machines, right? Let's keep building EC2 clusters and EC2 endpoints and blah, blah, blah. Sure, it's possible, but I want to know why. If someone in the room is doing this at large scale, I want to know why you're not doing it another way. And I'm not being sarcastic here; I just want to understand. Because to me, unless you are automation gods, and of course this is Ukraine, so you are automation gods, I know. But unless you truly are great at automation, you will have a lot of operational problems and unnecessary spend. So if you want to do it, sure. But please at least use CloudFormation or Terraform and automate everything. Take everything down when you're done working with that infrastructure. You don't want GPU clusters sitting around doing nothing. If you're setting up distributed training on your own clusters, it's not going to be fun. Try it once and then send me a tweet saying, "Yeah, you're right, it was not fun." Prediction is the same; please automate everything and use auto-scaling, load balancers, and all that good stuff from EC2. But if you're doing this today, you know it's a bit of work. Even automating and running all of that is a bit of work. And whatever you do, please use Spot instances. If you use EC2 today and the word Spot doesn't ring a bell, you're probably spending 60 to 80% too much on EC2. So when you go back to the office, look up EC2 Spot and thank me: "Hey, I saved 80% on my EC2 bill," right? So if you do this, infrastructure effort is significant because you're dealing with EC2 instances, subnets, VPCs, SSH keys, and all that stuff.
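For the docker pull and docker run part, here's a hedged sketch using the Docker SDK for Python. The image URI is a placeholder: the real Deep Learning Container URIs live in AWS-owned ECR registries and are listed in the documentation, and you need to authenticate to that registry first.

```python
# Hedged sketch: pull and run one of the Deep Learning Containers with the Docker SDK for Python.
# The image URI, mount path, and script name are placeholders. Before this works you need to
# log in to the registry (aws ecr get-login-password | docker login ...).
import docker

client = docker.from_env()

image = "<account>.dkr.ecr.<region>.amazonaws.com/tensorflow-training:<tag>"  # placeholder URI
client.images.pull(image)

# Run a training script mounted into the container from the host.
logs = client.containers.run(
    image,
    command="python /opt/ml/code/train.py",   # path chosen for the example
    volumes={"/home/ec2-user/code": {"bind": "/opt/ml/code", "mode": "ro"}},
    remove=True,
)
print(logs.decode())
```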
ML setup is not that painful if you use the Deep Learning AMI. CI/CD will work the same: if you know how to deploy to EC2, you can deploy models. Building and training are up to you, and deploying, scaling, optimizing, and securing are up to you as well, using all the EC2 features. Okay, so you have everything you need to do the work, but it's a little bit unnecessary if you ask me. But if you love IAM, VPC, and KMS, please have fun with that. Option two: Docker clusters, of course. If you're already using Docker today, it makes a lot of sense. If you have a Docker CI/CD setup, if you have Docker tooling set up, if your development flow is based on Docker, then it makes sense to deploy training containers and prediction containers to Docker as well. When it comes to AWS, we have two services called ECS and EKS. ECS is our own cluster manager, and EKS is based on Kubernetes. I don't want to argue about whether you should use ECS or EKS; it's like Pepsi, Coke, McDonald's, Burger King, ECS, EKS. I use both; I like both. It just depends. You can pretty much do the same things with both. They're flexible. You can mix instance types, so you can have CPU instances and GPU instances and define constraints on where to place containers, etc. And we provide AMIs for both. Use those; they will save you time. Again, use the one that you like best, the one that helps you work fast. A more important question is: should we have one cluster or many clusters? For development, I would say dev and test, it's important to be able to create on-demand clusters. Each developer should be able to get their own cluster, break it, restart it, etc. So automation is important. For production, I find that a lot of customers run a single large cluster because it's easier to keep an eye on, easier to scale, easier to monitor, and it just gives you more flexibility. If you have a 50- or 60-node cluster, you have plenty of scheduling opportunities. Still, ECS and EKS do need some ops work. They're instance-based and not a fully managed proposition. So you need to set things up, and if you're already doing Docker today, you probably know how to do that: setting up services on ECS, pods and deployments on EKS, service discovery, etc. So not hands-free, not hands-off. And sometimes people tell me, "Yeah, I don't really care because I'm doing data science or machine learning, and someone else is doing that work." Well, okay, again, a lot of those "someone elses" are probably in the room today. And you know it's extra work, and you're getting paid for that. What if you spent your precious time and your boss's precious money on something else? Just a thought. So in the interest of time, I could show you ECS or EKS. I flipped a coin last night and am going to show you EKS if that's okay. But again, you will get the scripts for ECS as well. Okay, so let's take a look at that. So I created an EKS cluster. Let me show you... I can close this one now. EKS. Okay, here we go. So again, I'm using the command line, but you can use APIs, Terraform, CloudFormation, and anything else. Okay, so I'm just creating an EKS cluster here with a first group of workers. Here I'm creating a group of workers called CPU workers with three nodes, C5 instances, and then I added a second group of GPU workers, p2.xlarge instances, etc. eksctl is the command-line tool we use to manage EKS clusters. Okay, so you run that stuff, you wait for a few minutes, and you have a cluster. And if you go to the AWS console, you can see it there.
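The talk drives this with eksctl. As a rough illustration of the same shape through the EKS API (not what's shown on stage), here's a hedged boto3 sketch, assuming the cluster IAM role, node IAM role, and subnets already exist; in practice eksctl is the simpler path because it wires up the VPC, security groups, and node bootstrapping for you.

```python
# Hedged boto3 sketch of the same idea as the eksctl commands: one cluster, one CPU node
# group, one GPU node group. Role ARNs and subnet IDs are placeholders.
import boto3

eks = boto3.client("eks", region_name="eu-west-1")

eks.create_cluster(
    name="ml-cluster",
    roleArn="arn:aws:iam::123456789012:role/eks-cluster-role",        # placeholder
    resourcesVpcConfig={"subnetIds": ["subnet-aaaa", "subnet-bbbb"]},  # placeholders
)
# ... wait for the cluster to become ACTIVE, then add the node groups ...

common = dict(
    clusterName="ml-cluster",
    nodeRole="arn:aws:iam::123456789012:role/eks-node-role",           # placeholder
    subnets=["subnet-aaaa", "subnet-bbbb"],
)
eks.create_nodegroup(nodegroupName="cpu-workers", instanceTypes=["c5.xlarge"],
                     scalingConfig={"minSize": 3, "maxSize": 3, "desiredSize": 3}, **common)
eks.create_nodegroup(nodegroupName="gpu-workers", instanceTypes=["p2.xlarge"],
                     amiType="AL2_x86_64_GPU",   # GPU-enabled node AMI
                     scalingConfig={"minSize": 1, "maxSize": 2, "desiredSize": 1}, **common)
```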
But I guess we don't really want to see the console; we want to use the tools that we know. Okay, so: show me the node groups, show me the nodes. What's happening there? Okay, so the cluster is up with all those instances, and the only thing it took was just a couple of commands or a simple CloudFormation script. And then, okay, I want to run some stuff here. So let's say first, I want to run nvidia-smi, which just lists the GPUs present on the node. Well, I would just apply this pod spec with the NVIDIA CUDA image and run the nvidia-smi tool. Nothing fancy. And then, of course, I could check the result; I've done this before to save some time. Yes, okay, so this container has run, and this is the output. So this one obviously ran on a GPU instance, and it did find that K80 GPU. In the same way, I could start training something. So I could start a training pod using one of the Deep Learning Containers; this is actually one of ours. And I'm cloning the Keras repository inside of it and running a simple example to train a convolutional network on the MNIST dataset. I say, "Hey, give me a GPU," so this will make sure that this container runs on a GPU instance. Okay, because remember, I've got both in the cluster. And of course, I could look at those logs again. Okay, I ran that stuff this morning. Okay, so I see the training log, and it's all fine. So this is EKS in a nutshell: create the cluster, and all of the control plane is managed. If we look at instances, these are my workers here. Okay, so I see my CPU instances and my GPU instances. These are visible, and the control plane is managed by the service itself. So you don't have to manage all that stuff in Kubernetes, which is a little bit tricky to manage, let's admit it. Okay, so you can do this stuff. It's not complicated. You can run either ECS or EKS. Setting things up is quite easy, and then you use the tools that you know and the containers that you know to get things going. And of course, you could run Jupyter notebooks in there as well if you wanted. So that's managing Docker clusters. If we look at the scorecard, infrastructure effort is not as bad, because we use command-line tools and Docker tools to provision everything, and the scheduling and control plane are taken care of automatically. So it's not as difficult as setting up Kubernetes yourself on EC2. ML setup is not very complicated if you use the Deep Learning Containers; deploying containers is okay. CI/CD is fine: if you know how to deploy containers, well, there you go, no change. Building models, training models, and deploying models are your responsibility, but you can use the Docker tools, which are pretty good and save some pain here. Scaling, cost optimization, and security are very much your responsibility, and you can use either the Docker tools or the EC2 tools to do that. So a little bit better, and when it comes to scaling, that's definitely a more interesting proposition than just managing EC2. But there's another one. The last option I want to talk about is for when machine learning is really, really central to your business, when you have lots of models, when you start training hundreds of models, maybe thousands of models every day. And it sounds like a crazy number, like, no, come on, no one really trains thousands of models every day.
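The pod spec itself isn't in the transcript. As a hedged equivalent of the "give me a GPU and run nvidia-smi" step, here is roughly the same pod expressed with the official Kubernetes Python client, assuming your kubeconfig already points at the EKS cluster and the NVIDIA device plugin is installed so nvidia.com/gpu is schedulable.

```python
# Hedged equivalent of the 'run nvidia-smi in a pod on a GPU node' demo, using the official
# Kubernetes Python client. The image tag is illustrative; any CUDA base image would do.
from kubernetes import client, config

config.load_kube_config()
core = client.CoreV1Api()

pod = client.V1Pod(
    api_version="v1",
    kind="Pod",
    metadata=client.V1ObjectMeta(name="nvidia-smi-test"),
    spec=client.V1PodSpec(
        restart_policy="Never",
        containers=[client.V1Container(
            name="cuda",
            image="nvidia/cuda:10.1-base",
            command=["nvidia-smi"],
            # Requesting a GPU forces scheduling onto one of the GPU workers.
            resources=client.V1ResourceRequirements(limits={"nvidia.com/gpu": "1"}),
        )],
    ),
)
core.create_namespaced_pod(namespace="default", body=pod)
# Later: kubectl logs nvidia-smi-test, or core.read_namespaced_pod_log("nvidia-smi-test", "default")
```

The training pod works the same way: swap the image for a Deep Learning Container, mount or clone your code, and keep the nvidia.com/gpu limit so it lands on a GPU node.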
And, well, you'll be surprised, because you want to try different algos, different combinations of parameters, maybe do hyperparameter optimization looking for the top-performing model, and very quickly, before you know it, you're going to be training hundreds and hundreds of models every day. So keeping track of all that stuff with Docker clusters needs a bit of work. Where is the training log? What were the parameters for this specific training job? Which is the top-performing model I trained today? Etc. As you start scaling your machine learning process, you realize you need more and more tooling to be efficient. So you can build it yourself, or try the fully managed option, which is a service called Amazon SageMaker. SageMaker is basically a modular service that lets you go from experimentation, so Jupyter notebooks, all the way to production, deploying models, with a single service and a single set of APIs. When I say modular, I mean you don't have to go from A to Z. You can pick whatever you need in there. Let's say your problem is training at scale, and then you have to deploy on-premises because that's what your customer wants. That's alright: you can train in SageMaker, grab the model in Amazon S3, and deploy it elsewhere, or maybe even test it on your laptop. If you want to do the reverse, if you have an existing model and your problem is more about deploying at scale to the cloud, that's fine too. You can put your model in S3 and ask SageMaker to deploy it. Enter anywhere, exit anywhere. You don't have to do all of it. And it's quite a popular service. It's being used by all kinds of companies, big ones, small ones, enterprise, startups, all kinds of verticals. The only one I'm not too sure about is Tinder, which nobody uses, right? And of course, I don't really see the machine learning use case in there. So I'm skeptical. Seriously, what are your options now? If you want to train on SageMaker, you can train in three ways. You can use built-in algorithms, which is a really good option if you don't have a lot of machine learning skills or if you don't want to write machine learning code. You know you need a clustering algo, linear regression at scale, image classification, that kind of stuff. So you can just grab one of those algos off the shelf, set some parameters, and train. We have 17 built-in algos, no machine learning code needed, a great option to get started. Now, if you have existing machine learning code for one of those frameworks, TensorFlow, PyTorch, Scikit-learn, etc., then you can take that code and, with absolutely minimal changes, which I don't have time to explain today but are really just a few lines of code, run it on those built-in frameworks on SageMaker. So take your TensorFlow code, your Keras code, whatever, throw it at SageMaker, train at scale, deploy at scale. And the last option is: what if you have something else, let's say you're using Python or C++ or another environment? Well, you can build your own training and prediction container, because all this stuff is based on containers anyway, push that to AWS, and train and deploy on SageMaker. Okay, so you can run pretty much any workload here. Everything happens through a specific SDK in Python, which we call the SageMaker SDK, and it's a high-level SDK. If you're familiar with the AWS APIs, the service-level APIs, this is much, much higher level. You work with algos, models, and training jobs, and it's very little code. I'll show you a quick example in a few minutes.
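As a hedged illustration of the built-in algorithm path (not code from the talk), here's roughly what training the Linear Learner built-in algorithm looks like with the SageMaker SDK. The IAM role is a placeholder, the data is synthetic, and argument names follow recent versions of the SDK (older versions use train_instance_count / train_instance_type).

```python
# Hedged sketch: train SageMaker's built-in Linear Learner without writing any ML code.
# Role ARN is a placeholder; the data is synthetic just to make the sketch self-contained.
import numpy as np
from sagemaker import LinearLearner

role = "arn:aws:iam::123456789012:role/SageMakerRole"   # placeholder

ll = LinearLearner(
    role=role,
    instance_count=1,
    instance_type="ml.m5.xlarge",
    predictor_type="binary_classifier",
)

# Synthetic dataset; in real life this is your own feature matrix and labels.
X = np.random.rand(1000, 10).astype("float32")
y = (X[:, 0] > 0.5).astype("float32")

# record_set() uploads the data to S3 in the format the algorithm expects.
train_records = ll.record_set(X, labels=y)
ll.fit(train_records, mini_batch_size=100)
```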
But this is just a very nice and simple SDK. You can learn it in just a few hours and get your training and deployment jobs going. There's also a version for Spark if you want to integrate SageMaker with Spark. And of course, we also have service-level APIs for SageMaker, but these are lower level and probably the ones you want to use for scripting, automation, etc. So let's look at a quick example. How would you train and deploy with TensorFlow? The 30-second story is: you put your data in S3. You can now also put it in EFS or FSx for Lustre; that's a recent feature. If you need super high-performance training, you can use EFS or FSx for Lustre, but I would say for most people, S3 is just fine and easier to set up. So put your dataset in S3, import the SageMaker SDK, and in this case, use the TensorFlow object to set up a TensorFlow training job. The first parameter is your TensorFlow script; in this case, it's a Keras script, and Keras is part of TensorFlow anyway. You say, "Hey, please train on this C5 instance," you pass some parameters to your script, and then you call fit, passing the location of the data in S3, and that will train. So it will fire up that C5 instance, pull the TensorFlow container, train the model, save the model to S3, and then shut down that C5 instance automatically. So it's completely on-demand, fully managed, and you never have to worry about starting and stopping stuff; it's done automatically by the SDK. If you want to deploy to an HTTPS endpoint, you say, "Hey, please deploy to T3 XL," and this will automatically create that instance, pull the container, load your model, and you can start serving predictions on that HTTPS endpoint. So you can use the predict API in the SDK, but of course, you can use any language to HTTP POST to the endpoint. That's it. So when I say you can learn this stuff in two hours, I'm really not lying. Now imagine you want to scale things up. This is a tiny model on a small instance. So now you want to move to mycrazyCNN.py, and you need 64 GPUs to train it, and you need 16 load-balanced instances to serve predictions. That's it. I can do this all day. The only change is: just give me more instances, just give me bigger instances. That's it. This will automatically create the training cluster with those P3 instances and create the multi-AZ, load-balanced endpoint backed by 16 instances in this case, etc. So that's a great way to go; even your data scientists could do this directly. They don't really need you anymore, which is great, because now you can focus on more important stuff than managing training clusters. So that's what this SDK is for. Even if you don't know anything about infrastructure, no VPCs, no subnets, no nothing, just call this and train and deploy. So I can show you actual code really quickly. I'm not going to run the notebook; I don't have time for it, but just to prove my point. So here I'm using one of those managed notebooks, which are part of SageMaker. I'm using this Fashion-MNIST dataset, which you've probably seen, and I'm training an image classifier. So: download the dataset, put it in S3. And that's pretty much the code you just saw, right? Pass that script to the TensorFlow object in the SageMaker SDK. I can even use Spot instances to save even more money. Call fit to train, and I can see my training job here. Thanks to Spot, I saved 64.5%, which is always nice. And then I can deploy in one line of code and then predict. And I might even try to run this. Yes.
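Here's a hedged sketch of that flow. The script name, S3 paths, framework version, and instance types are illustrative, and argument names follow recent versions of the SageMaker Python SDK (older versions use train_instance_count / train_instance_type).

```python
# Hedged sketch of the TensorFlow "script mode" flow described above. All names are placeholders.
import numpy as np
from sagemaker.tensorflow import TensorFlow

estimator = TensorFlow(
    entry_point="mnist_keras.py",                            # your existing Keras/TensorFlow script
    role="arn:aws:iam::123456789012:role/SageMakerRole",     # placeholder IAM role
    instance_count=1,
    instance_type="ml.c5.2xlarge",
    framework_version="2.4.1",
    py_version="py37",
    hyperparameters={"epochs": 10, "batch-size": 256},
    # use_spot_instances=True, max_wait=3600,                # optional: train on Spot to cut costs
)

# fit() fires up the instance, pulls the TensorFlow container, trains, saves the model
# to S3, and shuts the instance down.
estimator.fit({"training": "s3://my-bucket/mnist/train"})    # placeholder S3 location

# deploy() creates the HTTPS endpoint; predict() sends it a batch of samples.
predictor = estimator.deploy(initial_instance_count=1, instance_type="ml.m5.xlarge")
samples = np.zeros((1, 28, 28, 1), dtype="float32")          # placeholder input, shape depends on your model
print(predictor.predict(samples.tolist()))

# Scaling up is just bigger numbers:
#   instance_count=8, instance_type="ml.p3.16xlarge"                          -> 64 GPUs for training
#   estimator.deploy(initial_instance_count=16, instance_type="ml.c5.xlarge") -> 16 load-balanced instances
```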
Okay, so: take 10 images from the dataset, post them to the endpoint, and compare the labels to the predicted labels. So anyone can do this, you know, anyone can do this. Even data scientists can do this. I love them. Just making sure. So there you go, and you could be using a ton of GPUs and a large endpoint, and it wouldn't be any different. So how many lines of code did we write to do this? Imagine how much work you would need to do this on a Docker cluster, and imagine how much work you would need to do this on EC2. So it's worth trying. So in this case, infrastructure effort is zero. ML setup is pretty much zero because all those algos and frameworks are built in. CI/CD is probably where you will need a bit of work, because you would use the SDK to automate things. You can use AWS Step Functions as well, which supports SageMaker very nicely. But it's probably the area where you would need a bit of work. You can use 17 built-in algos. Training at any scale is two lines of code. Deploying at any scale is one line of code. And you can use Spot and auto-scaling, and security is integrated. So if you need KMS encryption for the training instances, just say okay, encrypt, and that's it. So it's nice. So my personal opinion is that EC2 is fine to get started. Small scale, experimenting, fine. But don't scale on EC2 unless you are automation gods and you actually enjoy spending the time doing it. ECS and EKS make a lot of sense if you're already using Docker, so why not? But if you're not a Docker shop today, if you're not deploying to Docker, I would think twice, because Docker is a general-purpose platform, not a machine learning platform. So if you need lots of high-level machine learning features like hyperparameter tuning and all that stuff, you will end up building them yourself, and I don't think that's a great idea. And SageMaker, again, is easy to learn, zero infrastructure work, and it has a lot of high-level machine learning features that I don't have time to cover today. But obviously, more are coming, so keep an eye on our events. And if machine learning is central to what you do, you will find that you save a ton of time doing this. So my conclusion is: don't worry too much. I mean, you guys are reasonable people. Whatever you do today is probably fine. Don't over-engineer; scale linearly until you have to make that jump, and then just make that jump. Pay attention to total cost of ownership. Don't over-provision, don't spend your money in a silly fashion. And remember, when you do machine learning, models and data matter, not infrastructure. We don't care about infrastructure. It's something you scale, smash, and rebuild, which is what the cloud is about. Take big bold steps: just smash it on Monday and run it again on Tuesday, and it's all good. And focus on understanding data and building models. And of course, you can mix and match if it makes sense. So like I said, you can train on SageMaker and deploy on-premises, or deploy on Docker, or whatever. Find the combination that works best, write your own story, and just make sure you optimize your workloads, your budget, and your time. So how do you get started? Lots of stuff here. The first thing I want to say is that we actually have another event, an AWS event in Kyiv very soon, October 14th. It's a physical event, some of my colleagues will be there, and it's a full week of sessions. If you have AWS questions, just go there and ask them. You can register at this URL; it's completely free.
We also have, on October 17th, an online conference called Innovate, which is dedicated to ML and AI. So 20 sessions, again completely free, and you can see the agenda online; it's a lot of good stuff. And of course, here are all the URLs to get you started, including links to those notebooks that I showed you. So thank you very much. We're going to take a look at questions; I think you're going to show the questions on screen. If you want to stay in touch, I'm happy to connect on LinkedIn, and my messages are open on Twitter as well. So if you have questions next week, next month, just ping me, and I'll try to answer everyone. Okay, thanks again. Thank you very much for having me today. Thank you. So are we doing questions? Oh, okay. Okay, wow. Okay, so I need my Q&A glasses now. All right. Okay, what ML framework is the best for creating a proof of concept fast? That's a very good question. So I'll give you my personal preference, and my personal preference is Keras. Keras is a high-level API on top of TensorFlow and MXNet, and it's very easy to work with. It's beginner-friendly, the documentation is great, you'll find lots of examples, and the Keras blog is very good. So Keras is very interesting. If you want another option, then MXNet has a toolkit called Gluon, which is a high-level API, very flexible, and equally good. Okay, so Keras and Gluon are excellent for prototyping. So there's a question: can SageMaker do request batching for efficient inference? So yes, two answers actually. First, you can do batch processing with SageMaker. You don't have to deploy to an endpoint. You can actually say, "Hey, here's data in S3 to predict. Just predict the whole thing in batch mode," and that's fine. We call it batch transform. So if batch processing is what you need, it's supported. And if you want to batch samples and predict multiple samples in one go, absolutely. Off the top of my head, I don't remember the maximum request size for prediction on SageMaker, but it's megabytes. So you can absolutely pass 100 samples at a time if you want to batch them. What is the distributed training implementation behind SageMaker? So it depends. The built-in algos are implemented with Apache MXNet, so we use the distributed framework that's part of MXNet. And of course, if you use TensorFlow or the other built-in frameworks, then we use whatever is available there. For TensorFlow, there's the built-in mode called parameter server, which is fine, doesn't scale to the moon, but it's fine, and we support it. There's also a question about auto-scaling: on AWS, you would set up CloudWatch alarms on cluster usage, container usage, etc., and trigger auto-scaling on the clusters. This requires some management, finding the right thresholds and timings. On SageMaker, you don't need to manage this. My favorite question: what are the advantages of AWS ML services compared to competitors? I don't use the competitors, so you know more than I do. Try them and send me tweets. If they do some things better, we'll fix it. If we're better, let me know. Negative feedback is more useful, but positive feedback is appreciated too. All right, I don't want to run for too long, and the other questions are probably a little too long to answer here. So again, if you want to know more, please come to the AWS event next week. We'll have staff there all week to dive deeper into your questions and projects. If you want the big picture and to learn everything about AI and ML services on AWS, the Innovate conference is where you need to go.
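As a footnote to the batch transform answer above, here's a hedged sketch of what that looks like with the SageMaker SDK; bucket names and instance type are placeholders, and `estimator` stands for a fitted estimator like the TensorFlow one shown earlier.

```python
# Hedged sketch of batch transform: score a whole S3 dataset offline instead of keeping
# an endpoint running. 'estimator' is a fitted SageMaker estimator; paths are placeholders.
transformer = estimator.transformer(
    instance_count=1,
    instance_type="ml.m5.xlarge",
    output_path="s3://my-bucket/predictions/",   # placeholder
)
transformer.transform(
    "s3://my-bucket/batch-input/",               # placeholder: the data to score
    content_type="text/csv",
    split_type="Line",
)
transformer.wait()                               # predictions land in output_path
```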
Tens of thousands of people have registered already, and the agenda looks interesting. Okay, we're completely out of time. So thank you again for inviting me. It's been a pleasure. I hope you have a great day. And again, get in touch later if you have questions. Okay, thank you so much.

Tags

Machine Learning, Scaling, DevOps, AWS Services, SageMaker, Docker Clusters