Enabling Deep Learning in IoT Applications with Apache MXNet AWS Online Tech Talks
March 24, 2018
Many state-of-the-art deep learning models have hefty compute, storage and power consumption requirements which make them impractical or difficult to use on resource-constrained devices. In this tech talk, you'll learn why Apache MXNet, an open source library for deep learning, is IoT-friendly in many ways. In addition, you'll learn how services like AWS Lambda and AWS Greengrass make it easy to deploy MXNet models on edge devices.
Learning Objectives:
- Deep learning basics for IoT
- How to get started with deep learning on AWS
- How to deploy Apache MXNet deep learning models on the edge
Subscribe to AWS Online Tech Talks on AWS:
https://www.youtube.com/@AWSOnlineTechTalks?sub_confirmation=1
Follow Amazon Web Services:
Official Website: https://aws.amazon.com/what-is-aws
Twitch: https://twitch.tv/aws
Twitter: https://twitter.com/awsdevelopers
Facebook: https://facebook.com/amazonwebservices
Instagram: https://instagram.com/amazonwebservices
☁️ AWS Online Tech Talks cover a wide range of topics and expertise levels through technical deep dives, demos, customer examples, and live Q&A with AWS experts. Builders can choose from bite-sized 15-minute sessions, insightful fireside chats, immersive virtual workshops, interactive office hours, or watch on-demand tech talks at your own pace. Join us to fuel your learning journey with AWS.
#AWS
Transcript
Hey, hi everybody. This is Julien here and we have an exciting topic for you today. We're going to talk about deep learning in IoT applications using a deep learning library called Apache MXNet.
So here's the agenda for today. First of all, we're going to discuss what it means to run deep learning applications at the edge, which might seem like a strange concept at first. Then I'll take you through a quick introduction to Apache MXNet, this open source library that AWS supports and uses for its own products. Then we're going to look at a number of options for prediction because edge devices could be powerful enough to run prediction, but what if they're not? Can we do it in the cloud? So we'll look at some options here. And last but not least, we'll look at an actual edge device running deep learning at the edge, and it's called AWS DeepLens. It's quite fun, as you will see. Of course, I will share some links and resources to help you get started. And as a summary, these are the services we're going to mention today. So we're going to talk about MXNet, the Deep Learning AMI, SageMaker, AWS IoT, AWS Greengrass and its latest addition, Greengrass Machine Learning, and DeepLens, just like I said. Okay, here we go.
So what about deep learning at the edge? Why do we even consider this problem? Well, as you probably know, there's an increasing amount of data being generated out there, and a lot of it doesn't even get to the cloud. A lot of it isn't even captured. Think about medical equipment. Think about industrial machines operating in areas where network connectivity and cloud connectivity is quite difficult. And of course, we have extreme environments. So off-planet is as extreme as it gets. But even on our planet Earth, you could be operating in deserts, you could be operating in oceans, you could be operating in areas where it's not even feasible to have network connectivity and to use cloud-based services for storage and analytics. And it's a bit of a shame because this data is important. We'd like to capture it, we'd like to store it, we'd like to maybe build predictive models. But in a number of situations, this is just not possible. And you could say, well, okay, that's a temporary problem and technology is going to catch up. But again, three more issues tend to be long-lasting issues. The first one is obviously the law of physics. We still haven't beaten the speed of light as much as we would like to. And in remote areas where the nearest AWS region is quite far away, it's just not realistic to perform real-time prediction. It would just be too slow. Latency would just be too high. Then the second problem is even if you're close enough to a cloud region, maybe network connectivity is so expensive that it doesn't make sense. The numbers don't add up. And if you use satellite connections, well, they're expensive and it's just not worth it to use them for prediction. And of course, you could have other problems like privacy that prevent you from sending data outside of a certain area, which would mean that you would have to compute and process it where it's actually captured. So as you can see, these problems are really difficult and probably they will be long-lasting issues.
So deep learning at the edge, of course, it could mean capturing data at the edge and sending it to the cloud if we can. And that's a good start because this data can get centralized, and we can use analytics, we can train machine learning models, etc. And that's fine, run everything in the cloud. But it would really be nice if we could close the loop and bring that data back and those models back to the edge, have the ability to run predictions and to use that data locally as soon as we capture it and just make our devices and applications smarter. And that's what we're going to talk about today. Of course, easier said than done, there are a number of problems. The first obvious one is that edge devices are usually heavily resource-constrained. Smaller devices at the edge tend to have a small CPU, low memory, not a lot of storage, if any, and power consumption is always an issue, especially if these devices are running on batteries. So you just can't count on your typical powerful EC2 instance. Working at the edge is just more difficult. As mentioned before, network connectivity is the second major problem. It might even not be available at all. And if it is, it's likely that it's costly, bandwidth is limited, latency could be quite high. Imagine a ship, a scientific ship in the middle of the ocean; you're not going to get super low latency to your cloud region. So in those cases, on-device prediction might actually be the only option. But referring to the point above, the device needs to be powerful enough to run those predictions. So we may be stuck between a rock and a hard place, actually. We have to find that balance. And the third issue is obviously deployment. Deploying in the cloud is very easy. We have all the tools to automate that. Deploying on a fleet of devices that could be literally all across the planet in remote areas is something that's not easily achieved, especially if they have pretty poor network connectivity. So these are the problems we need to address.
So here's our wish list. The first item is we're not going to train models at the edge. In most cases, these devices are just very tiny devices, and it's just not realistic to do that. So I would say in a large number of cases, we'll rely on cloud-based services for training and for deployment to deploy the new models to those devices. If we can, if we have good enough network connectivity, it would be great if we could use cloud-based prediction. That's an option we need to have whenever available. We also should be able to run device-based prediction if a cloud-based prediction is impossible or too costly or too slow. And we should be able to do so with a good level of performance. Of course, we should be able to support different technical environments. Edge devices come in many shapes and sizes, CPUs, and programming languages. So we have to account for that and be flexible.
So let's talk about Apache MXNet for a second and see how this library is going to help us do that. So MXNet is an open source project that started in 2015. Initially, it was a university-driven project, researchers, etc. And then it was joined by a number of companies. And last year, that project got accepted into the Apache incubator, which is great news because it means it's actually driven by the Apache process. And so no single entity, no single organization can actually influence the roadmap and the features of MXNet. The reason why AWS likes MXNet is because it comes with multiple APIs. You can support multiple languages, as we will see in a few minutes, such as Python, C++, Scala, etc. So it's good for our customers. We have millions of customers, and not everybody uses Python, and not everybody uses C++. So flexibility, once again, is good. MXNet is a very efficient library. It works equally well in very large environments, cloud-based platforms. It will scale linearly with hundreds of GPUs, but it works also very well on tiny devices. So it has conservative memory usage and it is quite fast as well. So those two properties are very interesting if you are running on tiny devices. So in a nutshell, these are the reasons why AWS supports MXNet, contributes to the development, and uses it internally for our own projects.
So when it comes to IoT specifically, MXNet has a number of interesting properties. The first one is flexible experimentation in the cloud. The second is scalable training. The third one is good prediction performance at the edge. And the fourth one is the ability to deploy models in the cloud or at the edge and to support those two prediction scenarios. So let's look in detail at those four items.
First, flexible experimentation in the cloud. As I mentioned, we have eight different language APIs. So if you're more of a data scientist, chances are you're probably using Python, R, or Perl. You can even import MXNet in MATLAB. And if you're more of a developer profile, then again, Scala and C++ are good options. And even faster options than the previous languages. So again, whatever language you use, you're likely to find something that fits here. The second thing is, last year the MXNet project introduced a new API called Gluon. And this Gluon API is even more flexible than the previous, the initial MXNet API. It's based on what we call imperative programming, which basically lets you define networks and train networks just by writing imperative code. So let's say Python code. And what this means is it gives you maximum flexibility when it comes to inspecting and debugging and even modifying models during training. You can actually stop the training process at any given time and break into a debugger, for example, to understand why maybe the training process isn't going the way you thought it would go, or you can just alter the model, inject new values, etc. And that wasn't really possible with the initial MXNet API, which we call the symbolic API. So Gluon is really a tool for developers. Developers will be extremely familiar with the way Gluon works. And the last important feature is, of course, saving time by relying on the Model Zoo. The Model Zoo is a nice collection of pre-trained computer vision models, and computer vision applications are likely to be a part of a lot of IoT projects. And so you can just go and grab, with one API call, a pre-trained model, and you can either use that model directly or you could fine-tune it, adapt it to your own dataset. So it's a huge time-saver for developers and data scientists. In addition, the Model Zoo includes models like DenseNet and SqueezeNet, which have been specifically designed for resource-constrained devices like mobile phones or Raspberry Pis, those kinds of devices. So when it comes to IoT, you could start from those, have good performance, and once again, save a lot of time because they're just ready to go. Just pick them up from the Model Zoo and run them.
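To make this concrete, here is a minimal sketch (not shown in the talk) of how you could grab a pre-trained SqueezeNet from the Gluon Model Zoo and classify a local image. The image file name is a placeholder, and the preprocessing assumes the standard ImageNet normalization.

```python
import mxnet as mx
from mxnet import nd
from mxnet.gluon.model_zoo import vision

# Download a pre-trained SqueezeNet, a compact model suited to constrained devices
net = vision.squeezenet1_1(pretrained=True)

# Load and preprocess an image (224x224, normalized like the ImageNet training data)
image = mx.image.imread('example.jpg')  # hypothetical image file
image = mx.image.resize_short(image, 224)
image, _ = mx.image.center_crop(image, (224, 224))
image = image.astype('float32') / 255.0
image = mx.image.color_normalize(image,
                                 mean=nd.array([0.485, 0.456, 0.406]),
                                 std=nd.array([0.229, 0.224, 0.225]))
image = image.transpose((2, 0, 1)).expand_dims(axis=0)  # NCHW batch of 1

# Run the forward pass and print the top predicted class index
probabilities = net(image).softmax()
print('Predicted class index:', int(probabilities.argmax(axis=1).asscalar()))
```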
The second important feature of MXNet is it allows scalable training in the cloud. So of course, it's going to run on Amazon EC2 instances. And as you probably know, in the last few months, we introduced two new generations of compute-oriented instances, the C5 family, which supports the latest Intel Skylake chip, which is, as always, just generally faster than the previous generation, but it also supports a new instruction set called AVX-512, which is a vectorized instruction set that is particularly useful for machine learning applications. And the second family is the P3 family, which supports the latest NVIDIA V100 chip, which is simply the fastest and the most powerful GPU today. And again, that GPU introduces some dedicated hardware features for deep learning called Tensor Cores. So that's a good foundation on which to train your models, and MXNet fully supports and is fully optimized for both C5 and P3. On top of this, of course, you would need to start an instance. And probably the best choice, the easiest choice would be to use the Deep Learning AMI. The Deep Learning AMI is a collection of tools pre-installed, pre-built, pre-configured for you, such as all the popular deep learning libraries, TensorFlow, Keras, MXNet, Caffe, PyTorch, the Microsoft Cognitive Toolkit, etc. It also comes with the NVIDIA drivers, the Anaconda distribution, just more tools. So just fire up that instance with the Deep Learning AMI, and within minutes you can get to work and just save time and be efficient. And as you probably know, at re:Invent 2017, we introduced a new service for machine learning called Amazon SageMaker that lets you build end-to-end workflows and to design and train and deploy your machine learning and deep learning models on managed infrastructures. And of course, we provide pre-built environments for MXNet and Gluon, and for other libraries like TensorFlow, etc. So these two ways of training, either on your EC2 instances with the Deep Learning AMI or on SageMaker managed infrastructure, will work just fine with MXNet. And this will scale very, very high. So even if you have very large datasets, there is probably nothing to be afraid of.
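As an illustration (not part of the talk), here is a minimal sketch of launching an MXNet training job with the SageMaker Python SDK. The training script, IAM role, S3 path, instance type, and hyperparameters are placeholders, and the exact parameter names may differ depending on the SDK version you use.

```python
from sagemaker.mxnet import MXNet

estimator = MXNet(
    entry_point='train.py',               # hypothetical Gluon training script
    role='arn:aws:iam::123456789012:role/SageMakerRole',  # placeholder IAM role
    train_instance_count=1,
    train_instance_type='ml.p3.2xlarge',  # P3 instance with an NVIDIA V100 GPU
    framework_version='1.1.0',            # MXNet version available at the time (assumption)
    hyperparameters={'epochs': 10, 'batch_size': 64}
)

# Training data previously uploaded to S3 (placeholder bucket and prefix)
estimator.fit('s3://my-bucket/training-data/')

# The resulting model artifact lands in S3 and can later be deployed to a
# SageMaker endpoint or attached to a Greengrass group as a resource.
```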
The third important item is prediction performance at the edge. So MXNet itself is written in C++. So even if you use the Python API or the Scala API or the other APIs, at the end of the day, you end up calling highly optimized C++ routines. And so performance is going to be quite good, actually. If you use the Gluon API, the Gluon API supports a feature called hybridization that lets MXNet transform those imperative networks into symbolic networks. And this will let MXNet perform additional optimization on memory usage and speed. So again, you get the flexibility of the Gluon API during experimentation and training. And then when you want to actually build a model for production, you can extract maximum performance and get really close to the performance level of native C++ code using hybridization. You can build MXNet with two libraries that boost performance even further. Of course, the version that we provide already supports these libraries. And these two libraries are the Intel MKL library and MKL-DNN. And both have the same purpose. They provide very fast, very optimized implementations of math primitives involved in machine learning and deep learning. And those implementations also rely on hardware-specific instructions for Intel chips like the AVX instruction set I mentioned or the ARM Neon instruction set for ARM-based platforms like the Raspberry Pi. So just to speed up training and particularly inference even more. Another technique that you can use on MXNet is called mixed precision training, and that's fully supported. So when you design a deep learning model, all the values, all the parameters of that model, the weights and the activation values are by default stored and managed as float32 values. As it turns out, researchers found that you can switch to float16 values and hardly lose any accuracy. So the obvious gain here is that the model size itself, the model that you will deploy to your edge devices and load on those devices, basically shrinks by 50%, and 16-bit arithmetic is also quite a bit faster than 32-bit arithmetic. So on top of the model size reduction, you also get a nice performance boost. For more information, please look at this NVIDIA blog post that tells you how to use mixed precision training for MXNet. Quite a powerful technique for IoT devices.
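Here is a minimal sketch of hybridization in practice, assuming a toy two-layer classifier (not from the talk): the same Gluon code runs imperatively first, then as a compiled symbolic graph after a single hybridize() call, and can finally be exported for deployment on an edge device.

```python
from mxnet import nd
from mxnet.gluon import nn

net = nn.HybridSequential()
net.add(nn.Dense(128, activation='relu'),
        nn.Dense(10))
net.initialize()

x = nd.random.uniform(shape=(1, 784))
print(net(x))        # runs imperatively: easy to inspect and debug

net.hybridize()      # compile to a symbolic graph for leaner, faster inference
print(net(x))        # same call, now running through the optimized graph

# The hybridized network can be exported as symbol + parameter files, which is
# what you would copy to (or deploy on) an edge device.
net.export('my_model')  # writes my_model-symbol.json and my_model-0000.params
```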
The last advantage of MXNet when it comes to IoT projects is the deployment options that you can get for cloud-based prediction or edge-based prediction. So again, we have four more options, and we're going to look at these four in more detail. The first one is if you want to start with cloud-based prediction, you could invoke a Lambda function from your device using AWS IoT and perform prediction there. Another scenario for cloud-based prediction would be to train and deploy a model in SageMaker and invoke its HTTP endpoint. Moving to device-based prediction, of course, you could bring your own code, your own MXNet code, your own model, and just build your custom app exactly the way you want it. And the last possibility would be still for device-based prediction to use a service called AWS Greengrass that lets you deploy code and models to your edge devices automatically. So let's look at these four scenarios in detail.
First, invoking a Lambda function with AWS IoT. So you will train a model in SageMaker or, of course, you could bring your own. You could put that model in S3, our storage service, or you could embed it in the Lambda function. But remember, Lambda functions have a maximum size. So that might limit the maximum size of the model that you would be able to use. So if it's a tiny model, it could fit inside the Lambda package that actually gets deployed. If it's too big, then you could put it in S3, and the Lambda function would load it from there. And in both cases, you would use the Lambda function to perform prediction. And to actually invoke this Lambda function, probably the easiest way would be to use our IoT service, AWS IoT. So send an MQTT message from your device to AWS IoT and, using AWS IoT rules, trigger the invocation of your Lambda function, perform a prediction, and get a result. So this scenario would best work when devices cannot support HTTP or are not powerful enough for local inference. Also, cost, if cost needs to be as low as possible, then AWS IoT and Lambda are quite a good combination because Lambda, as you know, only generates billing when functions are actually called. So if you don't have any traffic, you're not being charged for any compute costs. The requirements obviously would be a good network connection between your devices and the AWS region. And all devices should be provisioned in AWS IoT with a certificate and a key pair for secure authentication and communication. And we have a blog post presenting this. So please read this blog post for an actual tutorial on how to do this.
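As a rough sketch (not from the talk or the blog post), a Lambda function for this scenario could look like the following. The bucket name, model file names, input shape, and payload format are assumptions; in practice, the IoT rule defines what the event contains, and MXNet has to be packaged with the function.

```python
import boto3
import mxnet as mx

s3 = boto3.client('s3')
BUCKET = 'my-model-bucket'  # placeholder bucket

# Download the model artifacts once, outside the handler, so warm invocations reuse them
for key in ('model-symbol.json', 'model-0000.params'):
    s3.download_file(BUCKET, key, '/tmp/' + key)

sym, arg_params, aux_params = mx.model.load_checkpoint('/tmp/model', 0)
mod = mx.mod.Module(symbol=sym, label_names=None)
mod.bind(data_shapes=[('data', (1, 3, 224, 224))], for_training=False)
mod.set_params(arg_params, aux_params)

def lambda_handler(event, context):
    # The IoT rule forwards the MQTT payload; here we assume it carries a flat
    # list of pixel values, purely for illustration.
    data = mx.nd.array(event['pixels']).reshape((1, 3, 224, 224))
    mod.forward(mx.io.DataBatch([data]))
    probs = mod.get_outputs()[0].asnumpy()[0]
    return {'class': int(probs.argmax()), 'probability': float(probs.max())}
```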
The second solution would be to train a model in SageMaker again, or bring your own into SageMaker. SageMaker has the ability to import models. You could deploy to a prediction endpoint using the SageMaker SDK. And of course, this is a vanilla HTTP endpoint that could be invoked from your edge devices. So this works best if devices are not powerful enough for local inference, if you still need to rely on cloud-based prediction. Also, if models cannot be easily deployed to devices, this is probably a good option. This would be interesting as well if you need cloud-based data for prediction. Let's say your devices would send some sensor data and you want to mix it up with other data hosted in the cloud, then obviously it would make sense to run that prediction in the cloud. And maybe you want to, for various reasons, centralize all prediction activity using the cloud and using a SageMaker managed model would be a good way to do this. The requirements are obviously, once again, you need a reliable network connection and you need devices powerful enough to support HTTP, actually HTTPS, which is a little more demanding.
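For illustration, here is a minimal sketch of calling a SageMaker endpoint from a device with boto3. The endpoint name, region, and payload format are placeholders and depend entirely on how the model was deployed.

```python
import json
import boto3

runtime = boto3.client('sagemaker-runtime', region_name='us-east-1')  # placeholder region

# Hypothetical sensor reading sent by the device
payload = json.dumps({'temperature': 21.5, 'vibration': 0.03})

response = runtime.invoke_endpoint(
    EndpointName='my-mxnet-endpoint',   # placeholder endpoint name
    ContentType='application/json',
    Body=payload
)

prediction = json.loads(response['Body'].read())
print(prediction)
```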
The third option would be this time to move to device-based prediction by bringing your own code and model. So you could train a model in SageMaker. Once again, of course, you could bring your own. You could bring your own application code. So it could be your own MXNet script written in any of the languages that I mentioned before. And you would need to deploy both the model and the application code on your devices. So it could be at manufacturing time, just load everything on your device, or maybe you built your own provisioning and update mechanism, and that's perfectly okay. This solution would be fine if you don't want to or if you cannot rely on any cloud services. Maybe you have zero connectivity. Maybe these devices will be deployed in an area where network connectivity is just not available. Maybe you're deploying them underground, who knows? And it just wouldn't be cost-efficient to have any connectivity in the basement or just maybe in the subway, and who knows? So that's a scenario. Just provision the devices from manufacturing time and use them just like that. The requirements obviously would be that devices should be powerful enough for local inference. You shouldn't have to update the models because once again, this might be a little complicated. Maybe you're never going to update the models. You could think of some devices you could use forever with the same model. But to me, the biggest requirement obviously is that you need to do a lot of stuff yourself. And well, for some companies that's fine, for some others, this might just be too much. Which brings me to the last option, where you would deploy your code and model with AWS Greengrass, our newer IoT service. Once again, you could train a model in SageMaker or bring your own. You would write a Lambda function performing prediction using that model, and you would write it in the cloud. It's, again, a vanilla Lambda function. You would add both the model and the Lambda function as resources in your Greengrass group. A Greengrass group is a collection of devices, obviously at least one device, but there could be one core device and then a local network of additional devices using the core device for computation and communication with the cloud. So you could have a scenario where you have a core device that's maybe slightly more powerful than the other local devices. And all of these would be part of the same group. And once you've done that, well, and once you've registered your group and your devices to AWS IoT, Greengrass is going to handle the deployment and the updates both for the code and for the model on demand, pretty much just call a deployment API for Greengrass. Greengrass will make sure that all devices get updated with the proper code and the proper model. So that takes a lot of pain away. This is best when you want to have the exact same programming model in the cloud and at the edge, because the Lambda function that you wrote in the cloud is the exact same Lambda function that will be running, that will be deployed, and that will run on your device. So there is no need to change the code. Greengrass will deploy the exact same code. If you need frequent updates to code and models, then again, Greengrass takes everything in charge. It doesn't matter if network connectivity is infrequent or unreliable. Greengrass will just retry and make sure that deployment completes even if networks are not connected yet. The next time they connect, Greengrass will perform the deployment. 
And maybe like I said, you have scenarios where you have a more powerful device in the group that should be able to perform prediction for maybe a collection of smaller devices connected to it. So why not? Requirements. Well, you have to be careful there because the minimal requirements for Greengrass are 1 GHz clock speed and 128 MB of memory. Although, in my opinion, this might be a little bit low for MXNet, especially if you're working with deep learning models. So for reference, a Raspberry Pi does have a 1 GHz clock speed, but it has 1 GB of memory. So your mileage may vary, but you really have to experiment with a variety of models. Some models could be just too large or too complex to run on the smaller end of the Greengrass specs. So again, just run your checks. And of course, you will need to provision all those devices in AWS IoT with certificates, keys, etc. But this scenario is actually very interesting and quite flexible. And this is the scenario that AWS DeepLens uses. And we'll get to that in a second.
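To give you an idea, here is a minimal sketch of a Greengrass Lambda function performing local inference with an attached model resource. The local model path, topic name, handler name, and input format are all assumptions; the actual resource path is whatever you configure when you attach the model to your Greengrass group.

```python
import json
import greengrasssdk
import mxnet as mx

# Local IoT data client provided by the Greengrass Core runtime
client = greengrasssdk.client('iot-data')

# Hypothetical local path where the machine learning resource is mounted;
# used as a checkpoint prefix (model-symbol.json / model-0000.params).
MODEL_PREFIX = '/greengrass-machine-learning/mxnet/model'

sym, arg_params, aux_params = mx.model.load_checkpoint(MODEL_PREFIX, 0)
mod = mx.mod.Module(symbol=sym, label_names=None)
mod.bind(data_shapes=[('data', (1, 3, 224, 224))], for_training=False)
mod.set_params(arg_params, aux_params)

def predict(image):
    mod.forward(mx.io.DataBatch([image]))
    return mod.get_outputs()[0].asnumpy()[0]

def function_handler(event, context):
    # Here we simply react to an incoming message carrying a local image path
    # (purely illustrative); long-lived functions could also loop on a camera feed.
    image = mx.image.imread(event['path'])
    image = mx.image.imresize(image, 224, 224).astype('float32')
    image = image.transpose((2, 0, 1)).expand_dims(axis=0)
    probs = predict(image)
    client.publish(topic='predictions',
                   payload=json.dumps({'class': int(probs.argmax())}))
```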
So at re:Invent 2017, we launched, I guess you could call that an extension of Greengrass. It's called Greengrass Machine Learning. And Greengrass Machine Learning lets you explicitly add machine learning resources to your Greengrass groups. So here's an example snapshot from the Greengrass console. And here I can see my group. That's actually one. It's a Raspberry Pi I have running at home. And so it's just one single device in that group. We see some successful deployments here. And if we go to the Lambda section, we see that this is the Lambda function that was published in Lambda in the cloud. And then that got deployed through the group. So explicitly, I attached this version to my group. And as you can see on the left side here, it says using v1. So if I just wanted to publish a newer version of that Lambda function, because I just updated the code, I could obviously edit this and say, okay, now please push version two of my Lambda function to that group. And I would trigger a new deployment, and my devices will be updated as soon as the network connectivity is available. And if I keep going down in the menus here, I see this resources section. And specifically, this is the addition that Greengrass ML brings. Now we have the ability to specifically add a machine learning, a deep learning model to a Greengrass group. Okay. And you can bring something that's hosted in S3, or you can actually select a model hosted in SageMaker directly. So that's very useful. Again, it reinforces the end-to-end machine learning workflow scenario that we're trying to build with SageMaker. Train in the cloud, go to Greengrass, select a model you just trained in Greengrass, and push it to your edge devices. Okay, so very, very nice addition. That service is still in preview, but please feel free to join the preview and send us some feedback when you get access.
So let's take a look at AWS DeepLens. So we have a concrete example of an edge device. So DeepLens was launched at re:Invent 2017. Once again, it was a busy year for machine learning. And this is the first deep learning camera for developers. So it's really a developer tool. It's an educational tool. We wanted to build something that would help developers become more familiar with deep learning and computer vision applications. And it's a, yeah, let's face it, it's really a nice gadget to have on your desk, and you can build all kinds of fun applications with it. So it supports MXNet models that you can deploy from SageMaker once again. It's integrated with other AWS services such as Greengrass and AWS IoT, etc. So you save a lot of work by doing this. So actually getting a DeepLens to work is something you do literally in 10 minutes, very easy to configure. And within minutes, you can start deploying your sample projects. We have projects for object detection, face detection, cats versus dogs, and the mandatory 'hot dog, not hot dog' project, and a few more. And of course, you can create your own project by writing your own Lambda function, because as you understood, a Lambda function is running locally on this device using the deep learning model that Greengrass put there to perform prediction on the video streams. So you're probably curious to see an example. We're going to see one in a second. So here's an example, one of those sample projects in the DeepLens console. So we see clearly that a project is really a combination of a Lambda function that was written in the cloud and deployed through Greengrass, just like I previously explained, and a deep learning model. In this case, a face detection model that has been trained in SageMaker and that we grab from SageMaker when we actually configure this project. So, well, this is what it looks like. That's me on the right-hand side. And well, those are two of my kids. And as you can see, we were trying to do silly things with the object detection model in the DeepLens. So the font is really tiny here, and you might not see the different objects being detected. So actually, persons are detected, as you can see; there's a bounding box which pretty much goes around me and my son. There's the chair detected as well on the left. The sofa is detected in the back here. And funny enough, my laptop is also detected. It says TV monitor, but I guess that's close enough. This model can only detect 20 object classes, so TV monitor for a laptop screen is close enough. So that's an example of fun things you could do. And the kids loved it, if you need to know.
Okay, so how do you get started with all of this now? The first step might be some training. So we announced and released some free digital courses that let you just, you know, within minutes, start learning some new stuff on AWS in general. There are quite a few topics out there, but obviously, there are some courses on artificial intelligence and there are some courses on IoT. So if you put these two together, you're going to be on track to build everything that I discussed today. So please just go to aws.training and learn more stuff. And once again, it's completely free. So there's no reason not to do it. If your organization would like some help with machine learning, we launched a new initiative called the ML Solutions Lab. And what the ML Solutions Lab does is put companies and organizations in touch with Amazon experts, people who have been doing this for a long time. It could be Echo engineers or Alexa engineers or Amazon Go engineers. So they're going to help you frame your problem early on. They're going to help you figure out if your problem indeed is a machine learning problem and what kind of model would be nice and what kind of data you might need. So the purpose here is really to make sure you get a good start on your machine learning project. So again, please go to that ML Solutions Lab URL to learn more and get in touch if that could be of interest to your organization. And of course, we have a ton of web pages and white papers for you to read. The top-level machine learning page, the SageMaker page, Greengrass and Greengrass ML, DeepLens, and the machine learning AMIs, the deep learning AMI, etc. So of course, you'll find technical information, you'll find customer use cases. If you're curious about what our customers are building with all of this, and boy, are they building. So lots of stuff to read. The MXNet web page at the Apache website, and the Gluon web page. And I want to point this one out to you, because this actually has an excellent and I mean excellent deep learning book, which is again, completely free, written by one of our researchers, Zach Lipton. I have to mention him because he did fantastic work. And what this does is it really teaches you what machine learning and deep learning are, looking at the algos in general, and then showing you how to implement them using the Gluon API. So very, very nice resource both for machine learning in general and for the Gluon API. And well, I'm also actively blogging on Medium. So you might enjoy some of my articles on deep learning and MXNet and SageMaker. If you read and enjoy that, please leave me a message. And again, I would appreciate your feedback.
I'm done. I want to thank you very much for listening to this webinar. I hope it was informative and useful. You have quite a bit of reading now to understand what those services do and how you could use them in your company. It was a pleasure talking to you today, and I guess we'll answer your questions now. Thank you very much.
Tags
Deep Learning at the Edge, Apache MXNet, AWS IoT, Edge Computing, AWS DeepLens
Julien Simon is the Chief Evangelist at Arcee AI, specializing in Small Language Models and enterprise AI solutions. Recognized as the #1 AI Evangelist globally by AI Magazine in 2021, he brings over 30 years of technology leadership experience to his role.
With 650+ speaking engagements worldwide and 350+ technical blog posts, Julien is a leading voice in practical AI implementation, cost-effective AI solutions, and the democratization of artificial intelligence. His expertise spans open-source AI, Small Language Models, enterprise AI strategy, and edge computing optimization.
Previously serving as Principal Evangelist at Amazon Web Services and Chief Evangelist at Hugging Face, Julien has helped thousands of organizations implement AI solutions that deliver real business value. He is the author of "Learn Amazon SageMaker," the first book ever published on AWS's flagship machine learning service.
Julien's mission is to make AI accessible, understandable, and controllable for enterprises through transparent, open-weights models that organizations can deploy, customize, and trust.