Deep Learning at the Edge with AWS DeepLens - Julien Simon, AWS
May 09, 2018
AWS DeepLens is a wireless video camera and API that lets you get hands-on experience with Deep Learning and develop your own computer vision applications. In this talk, we'll look under the hood of DeepLens and discuss its different building blocks: Intel hardware, the Intel Inference Engine, Apache MXNet, AWS Greengrass and Amazon SageMaker. Demo included, of course!
As the Artificial Intelligence & Machine Learning Evangelist for EMEA, Julien focuses on helping developers and enterprises bring their ideas to life. He's also actively blogging at https://medium.com/@julsimon. Prior to joining AWS, Julien served for 10 years as CTO/VP Engineering in top-tier web startups where he led large Software and Ops teams in charge of thousands of servers worldwide. In the process, he fought his way through a wide range of technical, business and procurement issues, which helped him gain a deep understanding of physical infrastructure, its limitations and how cloud computing can help. Last but not least, Julien holds all seven AWS certifications.
Transcript
Our next speaker this morning is Julien Simon. Julien is an AI and Machine Learning Evangelist at AWS for the whole of EMEA (Europe, Middle East and Africa). Julien also has one of these DeepLens devices, which is quite rare in Europe. I'm really excited to see how it works, what's under the cover, and what you can do with it. Let's welcome Julien on stage.
Thank you. Good morning, everyone. My name is Julien. I figured we would have enough geeky t-shirts, so I brought a heavy metal t-shirt. And we have way too many French people in the room today. What's going on? Why don't we do this in Paris next time? No visa issues in Paris, okay.
My talk will focus on running machine learning and deep learning at the edge, which means outside of data centers: running predictions in any environment, literally anywhere. I've got my camera here. Hopefully, it's going to work. No technical issues today, we'll see. If I have technical issues, we'll figure them out.
Why would we need to run deep learning at the edge? What are the first three or four things we need to worry about to run this successfully? The first thing we need is the ability to experiment in the cloud. Even if we want to deploy at the edge on cameras, sensors, drones, etc., we still need to work with data sets in the cloud and train at scale. If we work with smaller data sets, we can work locally or on some servers in the closet. But if we're talking about terabytes or petabytes of images, video, sound, voice, etc., we need scalable infrastructure, and the cloud will give us that.
One of the challenges of running deep learning at the edge is that these devices tend to be very tiny. They have small CPUs, little memory, and limited storage. We can't assume that whatever model we run in the cloud on powerful GPU servers will run just fine on a Raspberry Pi or an Intel Atom CPU, like the one in this camera. We need to worry about performance. We don't want to wait 20 seconds to predict a single image. The last thing we need to worry about is deploying and updating code and models. When you deploy to servers or cloud environments, you use CI/CD, Docker, and other mechanisms. These aren't available on embedded devices like this one, so deployment and updates are a real concern.
Let's look at these four things. Experimenting in the cloud: AWS has been supporting the Apache MXNet project for a while now. It's our preferred deep learning library. We use it to build our own services and contribute to the project. One reason we support it is because it has the broadest selection of languages. If anyone in the room wants to build deep learning models, raise your hand. You're probably using Python, but if you want to use R, C++ for maximum performance, Scala, etc., these are available. We see this as an advantage over other libraries, and we can't expect all our customers to use a single language.
Recently, the MXNet project introduced a new API called Gluon. Initially, the MXNet programming model was symbolic. You defined the network using symbols, compiled them into a graph, optimized it, and then trained and predicted. This gives the best performance, similar to TensorFlow and others. However, the training process can become a black box, making it difficult to inspect, stop, debug, or change. The Gluon API uses imperative programming, so you can define and run your model, use the Python debugger, stop the training process, and have full control over the training loop. It's generally more flexible and easier to work with.
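To make this concrete, here is a minimal Gluon sketch (my own illustration, not code from the talk): a small network defined imperatively, where the forward pass runs eagerly and can be inspected or stepped through with a regular Python debugger.

```python
# Minimal Gluon sketch: imperative define-and-run execution.
import mxnet as mx
from mxnet import nd
from mxnet.gluon import nn

net = nn.Sequential()
net.add(nn.Dense(128, activation='relu'),
        nn.Dense(10))
net.initialize(mx.init.Xavier())

x = nd.random.uniform(shape=(1, 784))   # one flattened 28x28 image
y = net(x)                              # eager execution: you can inspect y immediately
print(y.shape)                          # (1, 10)
```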
Another thing we like in MXNet is the extensive model zoo. There's a large collection of computer vision models, which are likely to be part of edge applications. They've been pre-trained, and you can use them as is or fine-tune them. This collection, which I believe is the biggest I've seen, includes models like MobileNet, SqueezeNet, and DenseNet that work well on resource-constrained devices.
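As an illustration, loading a pre-trained network from the Gluon model zoo takes a couple of lines; the random input below is just a placeholder for a real pre-processed image.

```python
# Illustrative sketch: load a pre-trained MobileNet from the Gluon model zoo
# and classify a single (dummy) image.
import mxnet as mx
from mxnet.gluon.model_zoo import vision

net = vision.mobilenet1_0(pretrained=True)   # weights are downloaded on first use

# Placeholder standing in for a pre-processed 224x224 RGB image in NCHW layout.
x = mx.nd.random.uniform(shape=(1, 3, 224, 224))
probs = net(x).softmax()
print(probs.argmax(axis=1))                  # index of the top ImageNet class
```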
For training, you could use CPU-based instances like the C5 instances we launched, or GPU instances like the P3 with the latest NVIDIA V100 chips. On these instances, you can build everything yourself using the Deep Learning AMI, which provides most of the open-source and machine learning tools you need. But if you need to train at scale, you might want to use SageMaker, a service we launched at re:Invent. SageMaker lets you focus on the machine learning task and frees you from infrastructure tasks. It doesn't matter if you need to train on 100 instances or deploy on 100 instances; it's just one API call away. All infrastructure is fully managed, and you can focus on improving the model and understanding the data.
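For reference, here is a rough sketch of launching a training job with the SageMaker Python SDK. The script name, IAM role and S3 path are placeholders, and exact parameter names may vary between SDK versions.

```python
# Hedged sketch of a SageMaker training job, assuming a training script
# 'train.py' and an S3 bucket of your own.
from sagemaker.mxnet import MXNet

estimator = MXNet(entry_point='train.py',          # your MXNet training script
                  role='MySageMakerRole',          # IAM role (placeholder)
                  train_instance_count=1,          # scale out by raising this
                  train_instance_type='ml.p3.2xlarge',
                  hyperparameters={'epochs': 10, 'batch-size': 64})

# One API call: SageMaker provisions the instances, trains, then tears them down.
estimator.fit('s3://my-bucket/training-data/')
```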
Once you have a model, what about running it at the edge? Speed and accuracy are crucial. MXNet is natively written in C++, making it as fast as possible. Even if you use the Python or Scala API, you end up calling highly optimized C++ code, so you don't waste performance. If you use Gluon, you can hybridize the models, compiling them into symbolic form, which optimizes memory and performance, getting you very close to native performance. Many customers experiment with Gluon and then hybridize their models to squeeze out every bit of performance.
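Hybridization itself is a one-liner. A sketch, assuming the model is built from hybrid-capable blocks such as HybridSequential:

```python
# Hybridization sketch: the same small model, compiled into a symbolic graph.
from mxnet import nd
from mxnet.gluon import nn

net = nn.HybridSequential()
net.add(nn.Dense(128, activation='relu'),
        nn.Dense(10))
net.initialize()

net.hybridize()                                  # compile to a symbolic graph
y = net(nd.random.uniform(shape=(1, 784)))       # first call triggers graph building
```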
You can also use performance-oriented libraries like MKL, the Intel library that uses optimized math primitives and specific instruction sets like Intel AVX to speed up linear algebra and convolution. There's also an open-source library called NNPACK, which works on ARM-based architectures like Raspberry Pis. Another cool technique in MXNet is Mixed Precision Training, where you use 16-bit weights and activations instead of 32 bits. This makes the model twice as small, which matters for downloading to tiny devices over unreliable networks. 16-bit arithmetic is also faster than 32-bit, giving you a performance boost with hardly any loss of accuracy.
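A hedged sketch of the mixed-precision idea, assuming a GPU context with float16 support: casting the network and its inputs to float16 halves the model size and speeds up arithmetic.

```python
# Mixed precision sketch (assumption: a GPU with float16 support is available).
import mxnet as mx
from mxnet import nd
from mxnet.gluon.model_zoo import vision

ctx = mx.gpu(0)
net = vision.resnet18_v1(pretrained=True, ctx=ctx)
net.cast('float16')                                        # 16-bit weights and activations

x = nd.random.uniform(shape=(1, 3, 224, 224), ctx=ctx).astype('float16')
print(net(x).astype('float32').softmax().argmax(axis=1))   # cast back for stable softmax
```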
Now, what about deployment? Ideally, you would train a model in SageMaker, write a Lambda function to grab data from your device, run it through your deep learning model, and perform predictions. You would then add the Lambda function and the model as resources in Greengrass, an IoT service that lets you deploy code to edge devices. A Greengrass group can be a single device or a collection of devices. You can attach code and models to the group, and Greengrass will handle deployment automatically. This solves the deployment problem, which is often a nightmare in embedded systems.
Greengrass requires devices to be powerful enough. Anything smaller than a Raspberry Pi might be challenging, and devices should have at least a 1 GHz clock speed and 1 GB of RAM. You need to provision the devices in AWS IoT, create a certificate, and a key pair for authentication and encryption. Security is important.
A few months ago, we introduced Greengrass ML, which is still in preview. This allows you to deploy models and Lambda functions separately. Previously, you had to combine the code and model, which was inconvenient because every code change required a full redeployment. Now, you can manage them separately, and the Greengrass framework on the device will run predictions using the model and send results back to the AWS cloud using IoT messages.
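Putting the pieces together, a long-running Greengrass Lambda function might look roughly like the sketch below. The model directory, topic name and helper function are hypothetical; the actual mount point depends on how you configure the machine learning resource in your Greengrass group.

```python
# Hedged sketch of a Greengrass Lambda that loads a model shipped as a
# machine learning resource and publishes predictions over MQTT.
import greengrasssdk
import mxnet as mx
from mxnet.gluon import nn

client = greengrasssdk.client('iot-data')

MODEL_DIR = '/greengrass-machine-learning/mxnet/squeezenet/'   # example mount point
net = nn.SymbolBlock.imports(MODEL_DIR + 'model-symbol.json',
                             ['data'],
                             MODEL_DIR + 'model-0000.params')

def predict_and_publish(frame_nd):
    """Run one prediction and report the result back to AWS IoT (hypothetical helper)."""
    probs = net(frame_nd).softmax()
    top = int(probs.argmax(axis=1).asscalar())
    client.publish(topic='edge/inference', payload=str(top))

def function_handler(event, context):
    # Greengrass requires a handler even for long-lived functions; the real
    # function would loop forever, grabbing frames from the device camera
    # and calling predict_and_publish on each one.
    return
```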
Let's take a quick look at the Greengrass console. Here, we see two groups: one with a Raspberry Pi at home and another with my camera. In the Raspberry Pi group, I attached a Lambda function that grabs frames from the Pi camera and runs predictions. I also defined resources, giving the Lambda function access to the camera and memory. I can attach a model, like SqueezeNet, which can be picked from S3 or trained on SageMaker. We provide pre-built binaries for Raspberry Pis, Jetsons, and other devices, but you can use any library, like scikit-learn or TensorFlow, with the appropriate APIs.
You can deploy this as often as you want, and Greengrass will handle updates, even if devices are powered down or have unreliable network connectivity. For a real gadget, we have the DeepLens camera, which has a small graphics accelerator on an Intel board. It runs Ubuntu and comes pre-installed with Greengrass. When you register the camera, you can pick from sample projects, like object detection, face detection, or cats and dogs. You can deploy these projects in minutes, but you can also train your own models and build custom projects.
Let's see the camera in action. It's running an object detection model that can detect people, chairs, bottles, and other objects. The model is deployed via Greengrass, and the camera processes the video feed locally; you can take a closer look later. When you pick a sample project, the model and Lambda function are deployed automatically. The camera uses the Intel Inference Engine, which optimizes the model for Intel chips, fusing layers and applying hardware-specific optimizations for extra performance.
Here's the Lambda function running on the camera. It grabs frames from the camera, resizes them, runs inference, and uses OpenCV to draw bounding boxes. The Intel Inference Engine optimizes the model on the first call of do_inference, fusing layers and applying hardware optimizations. The Lambda function can also publish IoT messages back to the AWS cloud for analytics and reporting.
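Based on the publicly available DeepLens sample functions, that inference loop looks roughly like this; the awscam call names, model path, input size and result format are approximations and may differ from the exact code shown on stage.

```python
# Approximate sketch of the DeepLens inference loop described above.
import cv2
import awscam
import greengrasssdk

client = greengrasssdk.client('iot-data')
MODEL_PATH = '/opt/awscam/artifacts/mobilenet_ssd.xml'   # optimized model (placeholder)

model = awscam.Model(MODEL_PATH, {'GPU': 1})             # Intel Inference Engine backend

while True:
    ret, frame = awscam.getLastFrame()                   # grab a frame from the camera
    if not ret:
        continue
    resized = cv2.resize(frame, (300, 300))              # match the network input size
    inference = model.doInference(resized)
    results = model.parseResult('ssd', inference)['ssd']
    h, w = frame.shape[:2]
    for obj in results:
        if obj['prob'] > 0.5:
            # Assumption: box coordinates come back relative to the 300x300 input,
            # so scale them to the original frame before drawing.
            xmin, ymin = int(obj['xmin'] * w / 300), int(obj['ymin'] * h / 300)
            xmax, ymax = int(obj['xmax'] * w / 300), int(obj['ymax'] * h / 300)
            cv2.rectangle(frame, (xmin, ymin), (xmax, ymax), (0, 255, 0), 2)
            client.publish(topic='deeplens/detections', payload=str(obj['label']))
```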
This is the DeepLens console, where you can create new projects by picking models from S3 or SageMaker and adding your Lambda function. There's no gap between training and prediction: just train, grab the model, and deploy it. We have blog posts and resources to help you dive deeper into this.
Some resources to explore:
- MXNet and Gluon web pages
- SageMaker free tier and SDK samples on GitHub
- Greengrass free tier
- Pre-order the DeepLens device from Amazon.com
- Intel Inference Engine for optimizing inference
- My blog on Medium for more on machine learning and deep learning
Thank you so much. If you have questions, I'm happy to take them. Before taking questions, I'd like to invite the startups who are about to pitch to come up and set up, as we're running a little late. But that doesn't mean we can't take questions. Does anyone have any questions for Julien?
Hi, great presentation. I have a question about pre-processing of images, like channel manipulation or other techniques to improve model performance. Does the pre-processing happen on the device or elsewhere?
The only thing that happens on the device is the conversion of the model from vanilla MXNet form to the Intel-optimized form, and then prediction. The model is a normal model trained in the cloud. All traditional steps to make the model more efficient, like data augmentation, apply during training. Pre-processing steps like resizing, normalization, or white balance can be done in the Lambda function, but keep in mind the device's limitations. Don't go wild on pre-processing, or performance will drop.
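For illustration, a typical lightweight pre-processing helper inside the Lambda function might look like this; the mean/std values are the usual ImageNet ones, used here only as an example.

```python
# Illustrative pre-processing: resize, BGR->RGB, normalize, reorder to NCHW.
import cv2
import numpy as np
import mxnet as mx

def preprocess(frame, size=224):
    img = cv2.resize(frame, (size, size))
    img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB).astype(np.float32) / 255.0
    img = (img - np.array([0.485, 0.456, 0.406])) / np.array([0.229, 0.224, 0.225])
    img = img.transpose((2, 0, 1))          # HWC -> CHW
    return mx.nd.array(img).expand_dims(0)  # add the batch dimension (NCHW)
```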
Hi. I have a question related to MXNet. You mentioned the importance of optimizing for specific hardware, like Intel. I mostly use NVIDIA hardware for training deep learning algorithms. What is the tensor layout in MXNet: NHWC or NCHW?
The typical order in MXNet is batch size, number of channels, height, and width (NCHW). This is what most people use. If you want to try something else, you can, but it might slow down performance. For edge devices, we predict image by image, so the batch size is always one. Predicting more than one at a time might be choppy, so predicting one at a time is usually best for high FPS.
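In other words, a single RGB image is fed to the network as a 4-D array with a batch dimension of one:

```python
# NCHW layout: a single 224x224 RGB image as MXNet expects it.
import mxnet as mx

batch = mx.nd.random.uniform(shape=(1, 3, 224, 224))  # (batch, channels, height, width)
print(batch.shape)                                     # (1, 3, 224, 224)
```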
Thank you.