Building Your Smart Applications with Machine Learning on AWS AWS Webinar
April 10, 2018
Machine Learning (ML) has long been an arcane topic, accessible only to experts. In this webinar, you will learn how to easily add Amazon API-driven ML services to your education software. Image and video analysis, text-to-speech, speech-to-text, translation, natural language processing: all these are just an API call away. Through code-level demos, we'll show you how to quickly start integrating these services into your education offerings, with zero ML expertise required.
Speaker: Julien Simon, Principal Evangelist AI/ML EMEA, Amazon Web Services
Learn more: https://aws.amazon.com/education
Transcript
Hi everybody. I'm very happy to talk to you again. My name is Julien and I'm a Principal Evangelist with AWS, focusing on artificial intelligence and machine learning. In today's webinar, we're going to talk about our services for machine learning on AWS. You might have attended the previous webinar where we discussed those services and some of the use cases our customers were implementing. In this one, we're really going to dive deeper, run some demos, and look at some code.
So let's get started. As you probably know, this is part of a series of webinars. My colleagues will take you through Polly, our text-to-speech service, on April 24th. On May 10th, you'll hear about Amazon SageMaker, our new service for end-to-end machine learning. Of course, I'll say a few words about both of those today. But if you want to dive really deep into Polly and SageMaker, please make sure to register for those webinars at the same URL you used to register for this one.
If you are ready to get started with AI and machine learning, my colleagues would love to hear from you at edtechteam@amazon.com. Please use the subject line on this slide to make it easier for us to find you. Once we get your email, we'll put you in touch as soon as we can with the right resources and the right team to get you started. So please get in touch at edtechteam@amazon.com if you have projects or ideas, and if you'd like to talk to us to get started.
When it comes to AWS, our mission is to ensure that every developer and every data scientist, no matter the size of their organization, can use machine learning. We've built a number of services that can be equally useful to people who are just starting their machine learning journey and to those who are already advanced and need to scale. This is why you'll find three layers of services. At the highest level are what we call the application services. These services are just one API call away. They are based on machine learning and deep learning, and can handle complex tasks such as image analysis, text-to-speech, and so on. We'll go through demos of all of them. You just have to call APIs; you don't need to know the first thing about machine learning to use them.
If you need to work with your own data sets, build your own models, and generally tweak models more and build custom solutions, then the platform services are probably what you're looking for. SageMaker is part of that. We'll talk about SageMaker and see how you can easily train complex models at scale using SageMaker. At the lowest level, we have frameworks and infrastructure for organizations who want to run everything themselves, manage their own EC2 instances in AWS, and control 100% of the process. We'll say a few words about those as well.
Let's start with the application services, and I've got some fun demos to show you. The first service I want to talk about is Amazon Rekognition. Rekognition is an image analysis service that can do object detection, facial analysis, face comparison, celebrity recognition, image moderation, and text detection in images. All these features are just one API call away, and I'll show you some examples in a few minutes. There's a free tier for these services: it allows people who have just created their AWS account to use them for free for the first 12 months, up to a certain level. If you exceed those limits, you start paying for the service. But all these services are pay-as-you-go, so it's unlikely you'll have any surprises.
All these services have a free tier, so you can experiment for free up to a certain level. You'll find all the information on aws.amazon.com/free. All the AWS services included in the free tier are listed there, and all these application services are part of that.
Back to Rekognition: we also introduced a video analysis service called Amazon Rekognition Video. You can do object detection, face detection, and more advanced things like activity detection (figuring out what people are doing in the video) and person tracking, so you can follow people across the video. Rekognition Video works in both archive mode and real-time mode: you can process files stored in Amazon S3 or analyze video streams in real time.
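To illustrate the archive mode, here is a minimal boto3 sketch that starts an asynchronous label detection job on a video stored in S3 and polls for the result; the bucket and file names are placeholders, not from the demo.

```python
import time
import boto3

rekognition = boto3.client('rekognition')

# Start an asynchronous label detection job on a video stored in S3.
# 'my-bucket' and 'lecture.mp4' are placeholders for your own objects.
job = rekognition.start_label_detection(
    Video={'S3Object': {'Bucket': 'my-bucket', 'Name': 'lecture.mp4'}}
)

# Poll until the job completes, then read the timestamped labels.
while True:
    result = rekognition.get_label_detection(JobId=job['JobId'])
    if result['JobStatus'] != 'IN_PROGRESS':
        break
    time.sleep(5)

for label in result.get('Labels', []):
    print(label['Timestamp'], label['Label']['Name'], label['Label']['Confidence'])
```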
Amazon Polly is a text-to-speech service, and it's pretty easy to use. You pass it a text string, choose one of the 52 voices in 25 languages, and call an API to generate a sound file in real time. It's fast enough for real-time interaction, and it's what the Echo devices use: the Alexa voices are generated with Polly.
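To give you a sense of how simple this is, here's a minimal boto3 sketch; the voice and output file name are arbitrary choices, not part of the demo.

```python
import boto3

polly = boto3.client('polly')

# Synthesize a short text string into an MP3 file. 'Joanna' is one of
# Polly's US English voices; any available VoiceId works here.
response = polly.synthesize_speech(
    Text='Hello from Amazon Polly!',
    VoiceId='Joanna',
    OutputFormat='mp3'
)

with open('hello.mp3', 'wb') as f:
    f.write(response['AudioStream'].read())
```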
We also introduced Amazon Translate, which can do real-time translation in one API call. We support 12 language pairs: English to and from French, Spanish, German, Portuguese, Arabic, and Simplified Chinese. We just announced six more languages coming soon, including Traditional Chinese, Japanese, Russian, and Czech, plus a couple more. More languages are on the way, and we hope to support as many as possible.
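A one-call translation with boto3 looks roughly like this (the sample sentence is mine):

```python
import boto3

translate = boto3.client('translate')

# Translate a sentence from English to French in a single call.
result = translate.translate_text(
    Text='Machine learning is just an API call away.',
    SourceLanguageCode='en',
    TargetLanguageCode='fr'
)
print(result['TranslatedText'])
```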
Amazon Transcribe is another service, now generally available. It currently supports English and Spanish. It transcribes speech into text, including punctuation and timestamps, which are important for captioning. It supports both high-quality and low-quality audio, such as telephony audio, which is useful for transcribing phone calls for further analytics, sentiment analysis, and understanding call satisfaction. It can also recognize multiple speakers, labeling them as Speaker One, Speaker Two, and so on.
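Transcription jobs are asynchronous: you point the service at a file in S3 and poll for the result. A minimal sketch, with placeholder job and bucket names:

```python
import boto3

transcribe = boto3.client('transcribe')

# Start an asynchronous transcription job on an audio file in S3,
# asking Transcribe to label up to two distinct speakers.
transcribe.start_transcription_job(
    TranscriptionJobName='my-demo-job',
    LanguageCode='en-US',
    MediaFormat='mp3',
    Media={'MediaFileUri': 's3://my-bucket/recording.mp3'},
    Settings={'ShowSpeakerLabels': True, 'MaxSpeakerLabels': 2}
)

# Poll later for the status; the transcript itself is a JSON file
# available at the TranscriptFileUri once the job has succeeded.
job = transcribe.get_transcription_job(TranscriptionJobName='my-demo-job')
print(job['TranscriptionJob']['TranscriptionJobStatus'])
```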
Another text service we launched is Amazon Comprehend. You can use it in two ways. First, you can run a single document through it with simple API calls to do entity extraction, key phrase extraction, language detection (over 100 languages), and sentiment analysis. This is useful for understanding social posts, reviews, or user comments.
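To make "simple API calls" concrete, here is a minimal boto3 sketch exercising each of those features on a single string (the sample text is made up):

```python
import boto3

comprehend = boto3.client('comprehend')
text = 'I really enjoyed this webinar, the demos were great!'

# Each Comprehend feature is its own one-shot API call.
entities = comprehend.detect_entities(Text=text, LanguageCode='en')
phrases = comprehend.detect_key_phrases(Text=text, LanguageCode='en')
language = comprehend.detect_dominant_language(Text=text)
sentiment = comprehend.detect_sentiment(Text=text, LanguageCode='en')

print(sentiment['Sentiment'], sentiment['SentimentScore'])
```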
Another way to use Comprehend is for topic modeling. This involves building topics from your own collection of documents. You specify how many topics you want, and Comprehend runs over the documents to build a list of topics. Each topic is a list of related words with relative weights. Once you have the topics, Comprehend scores every document in the collection against each topic. This is an efficient way to index and find relevant documents.
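Topic modeling works differently: it runs as an asynchronous batch job over a document collection in S3. Here is a hedged sketch, with placeholder bucket paths and a placeholder IAM role ARN that grants Comprehend access to your data:

```python
import boto3

comprehend = boto3.client('comprehend')

# Topic modeling is an asynchronous batch job over documents in S3.
# The bucket paths and the IAM role ARN below are placeholders.
comprehend.start_topics_detection_job(
    InputDataConfig={
        'S3Uri': 's3://my-bucket/documents/',
        'InputFormat': 'ONE_DOC_PER_FILE'
    },
    OutputDataConfig={'S3Uri': 's3://my-bucket/topics/'},
    DataAccessRoleArn='arn:aws:iam::123456789012:role/ComprehendAccess',
    NumberOfTopics=10
)
```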
We also have a chatbot service called Amazon Lex, which lets you define conversational chatbots in text or voice form. Defining a chatbot means defining the conversation, figuring out the interaction with the user, and providing questions to extract relevant information, called slots. The bot's purpose is to converse with the user to obtain all the slots. For example, if you need to book a hotel, you need to know the city, date, room type, number of people, and number of nights. Once all the slots are available, the bot fulfills the operation by invoking a piece of code hosted in the AWS cloud, such as a Lambda function.
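For completeness, here is a minimal sketch of how an application might converse with a published Lex bot through the runtime API; the bot name, alias, and utterance are made-up examples, not the hotel bot from the slide.

```python
import boto3

# Talk to a published bot through the Lex runtime API. The bot name,
# alias, and utterance below are hypothetical examples.
lex = boto3.client('lex-runtime')

response = lex.post_text(
    botName='BookHotel',
    botAlias='prod',
    userId='demo-user-42',
    inputText='I need a room in Paris for two nights'
)

# The bot replies with its next prompt and the slots filled so far.
print(response['dialogState'], response.get('message'))
print(response['slots'])
```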
This was a quick overview of those services, but I really want to show you some demos now. Let's get started. I'll switch to my console and start with Rekognition. I'm going to upload an image to show what it can do.
Here it is. This is being uploaded to AWS, and in real time, I'm doing facial analysis. I can count one, two, three, four, five, six... ten faces. I get information like gender, age range, and emotion detection. Even fuzzy faces are detected. This shows how well it works in real time.
Let's try object detection with the same image. It sees people, clothing, swimwear, picnic, park, leisure activities, and sports. The confidence scores are high for the main objects. This is a good example of how well it understands the picture.
Let's try this with code. I'll use some Python code I wrote. Here's a picture of a soccer match. Let's figure out what's in it and how many people are there. I'm running some Python code, passing the picture to Rekognition. Six faces have been detected, and we get keywords about the picture. The labels and confidence scores are returned in a JSON document, which I can also use to draw bounding boxes around the faces.
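The exact demo code isn't reproduced here, but a minimal sketch of the same idea with boto3, assuming the picture sits in S3 (bucket and file names are placeholders), would look like this:

```python
import boto3

rekognition = boto3.client('rekognition')
image = {'S3Object': {'Bucket': 'my-bucket', 'Name': 'soccer.jpg'}}

# Count faces in the picture.
faces = rekognition.detect_faces(Image=image)
print('%d faces detected' % len(faces['FaceDetails']))

# Get labels describing the scene; the BoundingBox in each face
# result can be used to draw rectangles on the original image.
labels = rekognition.detect_labels(Image=image, MinConfidence=70)
for label in labels['Labels']:
    print(label['Name'], label['Confidence'])
```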
The voice you heard was Polly. I built a text string from Rekognition's output and passed it to Polly, to show how easy it is to combine these services. The code is on my GitHub repo, so feel free to ask, and I'll be happy to share it. It's very little code, just basic Python.
Let's try Polly now. One of Polly's features is support for SSML, a markup language for customizing pronunciation. For example, numbers, dates, and phone numbers can be pronounced in different ways. Let's listen to this: "Your reservation for two rooms on the fourth floor of the hotel on March 21st, 2012, with early arrival at 12:35 p.m., has been confirmed. Please call 888-555-1212 with any questions."
Let's try a different phone number format. It didn't change, but SSML can do much more, like changing pitch, tone, and speed to make the speech sound more human. We'll cover more on SSML in the Polly webinar.
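For reference, here is a hedged sketch of what the SSML for an example like that could look like when passed to Polly; say-as is the documented SSML tag for controlling how dates and phone numbers are read, but the exact markup used in the demo isn't shown.

```python
import boto3

polly = boto3.client('polly')

# SSML lets you tell Polly how to interpret dates, times, and numbers.
ssml = """<speak>
Your reservation for two rooms on the fourth floor of the hotel on
<say-as interpret-as="date" format="mdy">3/21/2012</say-as>,
with early arrival at 12:35 p.m., has been confirmed.
Please call <say-as interpret-as="telephone">888-555-1212</say-as>
with any questions.
</speak>"""

response = polly.synthesize_speech(
    Text=ssml, TextType='ssml', VoiceId='Joanna', OutputFormat='mp3'
)
```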
Now, let's try Transcribe. I recorded a short message: "Hello, my name is Julien and I live in Paris, France. Right now, I'm recording a sound file because I would like to test a new service called Amazon Transcribe. This file should be longer to give me a good sample and hopefully Transcribe will be able to understand my French accent. If not, well, that should make for a pretty funny transcription. This should be long enough by now."
I uploaded the file and ran it. It's not strictly real time but takes about a minute. Here's the transcription: "Hello, my name is Julien and I live in Paris, France. Right now, I'm recording a sound file because I would like to test a new service called transcribed. This file should be longer to give me a good sample and hopefully transcribed will be able to understand my French accent. If not, well, that should make for a pretty funny transcription. This should be long enough by now."
Transcribe didn't recognize its own name, but the rest is quite good. This was my first try, so there's room for improvement.
We just announced a reference customer for Transcribe: Echo360. They're transcribing classes in 650 schools in 30 countries, impacting millions of students. Using Transcribe, they will transcribe classes as they happen, share the content, and translate it into different languages. You can find this case study on our website.
Let's move on to Translate. Here are the six upcoming languages I mentioned: Japanese, Russian, Italian, Traditional Chinese, Turkish, and Czech. Now let's run a couple of Translate examples.
Let's try Arabic. I don't speak Arabic, but if some of you do, you can check the translation. It's real time, and the generated English makes sense. It's a huge time-saver to translate text automatically and then have a human tweak it.
I have a Chinese example too. This is from a Chinese newspaper. The translation looks very good in English.
These services are super easy to use. I'm using the AWS command line, but you can use our SDKs in C++, Java, PHP, Node.js, Golang, Ruby, and more. You can build these into your own applications.
Let's try Comprehend. I picked something from the NFL website. I have no idea who Sam Darnold is, but he's probably a famous quarterback. Let's analyze this text.
We detect entities like people, locations, organizations, times, and dates. We also have key phrases, language detection, and sentiment analysis. This is very easy to do with a few API calls.
Let's put all these things together. I'd like to read signs on billboards, such as this Super Bowl ad. I want to detect text, detect the language, speak it in English, and translate it to different languages, then speak those translations. A few years ago, this was science fiction. Now, it's possible with about 25 lines of Python code.
I'll use Rekognition to detect the text, Comprehend to detect the language, and Polly to speak it. Then I'll translate it to Spanish, Portuguese, French, and German, and speak those translations too. This is very little code, and it works in real time.
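The demo script itself isn't shown in the transcript, but a minimal sketch of that pipeline, assuming the sign image lives in S3 (bucket, file name, and voice choices are placeholders), could look like this:

```python
import boto3

rekognition = boto3.client('rekognition')
comprehend = boto3.client('comprehend')
translate = boto3.client('translate')
polly = boto3.client('polly')

# 1. Read the text on the billboard.
image = {'S3Object': {'Bucket': 'my-bucket', 'Name': 'billboard.jpg'}}
detections = rekognition.detect_text(Image=image)
text = ' '.join(d['DetectedText'] for d in detections['TextDetections']
                if d['Type'] == 'LINE')

# 2. Detect the language of the text.
lang = comprehend.detect_dominant_language(Text=text)
source = lang['Languages'][0]['LanguageCode']

# 3. Translate the text and speak each translation with a native voice.
voices = {'es': 'Penelope', 'pt': 'Vitoria', 'fr': 'Celine', 'de': 'Marlene'}
for target, voice in voices.items():
    translated = translate.translate_text(
        Text=text, SourceLanguageCode=source, TargetLanguageCode=target
    )['TranslatedText']
    audio = polly.synthesize_speech(
        Text=translated, VoiceId=voice, OutputFormat='mp3'
    )
    with open('sign-%s.mp3' % target, 'wb') as f:
        f.write(audio['AudioStream'].read())
```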
These services are easy to understand, do one thing well, and are easy to combine. You can come up with many use cases for automating translation and text-to-speech.
We talked about the application services. Now, let's focus on the platform services. The application services are great for general-purpose use cases, but for some of you, they won't work. For example, if you need to translate languages not yet available in Amazon Translate, like English to Hebrew, or if you need to do domain-specific image recognition, such as cancer detection on medical images, you'll need to train your own models.
SageMaker is designed to help every organization, whether they're machine learning experts or not, to simplify and scale their machine learning processes. There's a free tier for this service. With SageMaker, you can run end-to-end machine learning workflows, from experimentation in Jupyter notebooks to data preparation, training, and deployment.
You can use built-in algorithms for linear regression, clustering, classification, time series, natural language processing, and more. You can also bring your own code if you use MXNet, TensorFlow, Scikit-Learn, or custom C++ libraries. SageMaker frees you from managing training and prediction infrastructure, letting you focus on understanding the data and building the best model.
SageMaker includes notebook instances, pre-built EC2 instances with everything you need to run machine learning and deep learning. They are a convenient way to get started, especially for larger teams. You can select from built-in algorithms and pre-built environments for popular libraries. Training can be done on a single instance or multiple instances with one API call.
We also have hyperparameter optimization in preview, which uses machine learning to figure out the right parameters for your training job. Once you have a working model, you can deploy it with one API call. SageMaker will create an HTTP endpoint that serves predictions without you writing a single line of code. We even have auto-scaling for prediction infrastructure.
Let's look at a quick demo. We'll do sentiment analysis with MXNet and Gluon. We'll use a movie review dataset with positive and negative reviews. I'll download the dataset, upload it to S3, and bring my own script. This is vanilla MXNet code that trains a natural language processing model to classify reviews as positive or negative.
I'll use the MXNet object from the SageMaker SDK, specify the script, and train on four instances. I'll define some hyperparameters and call fit to start the training job. SageMaker will create the instances, deploy the MXNet environment, inject the script, and run the training job.
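To make that concrete, here is a hedged sketch following the SageMaker Python SDK conventions of that era; the script name, S3 paths, instance types, and hyperparameters are placeholders, and the SDK's argument names have since evolved.

```python
from sagemaker import get_execution_role
from sagemaker.mxnet import MXNet

# Wrap a vanilla MXNet training script in a SageMaker estimator.
# 'sentiment.py' and the S3 paths are placeholders for your own assets.
estimator = MXNet(
    entry_point='sentiment.py',
    role=get_execution_role(),
    train_instance_count=4,
    train_instance_type='ml.p2.xlarge',
    hyperparameters={'batch_size': 128, 'epochs': 10}
)

# One call provisions the instances, injects the script, and trains.
estimator.fit({'training': 's3://my-bucket/sentiment/train'})

# One more call deploys the model behind a managed HTTP endpoint.
predictor = estimator.deploy(initial_instance_count=1,
                             instance_type='ml.m4.xlarge')
print(predictor.predict(['what a great movie']))
```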
This is a quick overview of SageMaker. We have a full webinar coming on SageMaker, but I couldn't resist showing you a working example. I can predict using the SDK, but you could also invoke that HTTP endpoint directly from your application. You can see this is working: positive reviews are scored as one, and negative reviews are scored as zero.
SageMaker simplifies infrastructure, which often gets in the way of moving fast and iterating quickly when doing machine learning at scale. Developers and data scientists can run on fully managed infrastructure and get the job done faster with minimal infrastructure drama. This was a quick taste of SageMaker.

At the lowest level, the frameworks and infrastructure, if you want to build everything yourself, you can start by picking instances. For machine learning and deep learning, we recommend two different instance types. If you have very large datasets with heavy data like images, videos, or speech, the P3 family is probably what you need. It's based on the latest NVIDIA GPU, the V100, which is much more powerful than the previous generation of GPUs in the P2 instances. The P3 family comes in different sizes, from one to eight GPUs in the same instance, providing a lot of computational power for heavy training jobs.
For CPU-based instances, we recommend the C5 family. C stands for compute, and 5 is the fifth generation, based on the Skylake architecture. The C5 family includes the AVX 512 instruction set, which allows for parallel complex math operations, speeding up training and prediction jobs. C5 instances are a good option for smaller to medium datasets and are cost-effective for prediction, as most of the time you will be predicting one sample at a time. However, please run your own tests to find the best option for your data and models.
We highly recommend using the deep learning AMI. This Amazon Machine Image is free to use, and you only pay for the underlying instance. It's pre-installed with all the tools you need to get started, including deep learning libraries, NVIDIA drivers for GPU instances, and more. Many customers have moved from their own AMIs to this one and are very satisfied. The deep learning AMI is frequently updated to ensure you get the latest libraries and is optimized for performance.
Lastly, we have DeepLens, a great educational device for developers who want to practice and learn about real-life deep learning. It's a collaboration with Intel, runs on an Intel board, and can deploy code and deep learning models from the cloud to a camera, running everything locally. The camera doesn't need to be connected to the cloud to perform video processing. Users are building fun models for object detection, people detection, and more. You can pre-order it from Amazon if you'd like to get your hands on a real deep learning device for hackathons or educational purposes.
This is the big picture of AI and machine learning on AWS, from the high-level services down to the lowest level. If you need to work with your own datasets or build your own models, SageMaker is a great option. For Hadoop and Spark users, EMR is our managed service, and you can do some machine learning there as well; however, SageMaker is a more scalable and more interesting option. At the lowest level, the deep learning AMI will make your life much simpler; it runs on CPU or GPU-based instances and lets you run any library or your own code.
Some resources you might find useful: the top-level page for machine learning on AWS, the AI blog with technical content, code, and customer stories, and the top-level SageMaker page. Register for the SageMaker webinar for more detailed use cases. My blog on Medium has posts on deep learning, SageMaker, and more; feel free to check it out and give me feedback. Don't forget about the upcoming webinars, which will dive deeper and show more cool demos. And if you're ready to get started, my US colleagues are waiting for your emails at edtechteam@amazon.com. Use the subject line from the slide to ensure they find you quickly and get you started in the right direction.
Thank you very much for listening. It was a pleasure to do this webinar. Thanks to my US colleagues for organizing everything. We'll pause for a few seconds and then start answering your questions. Please stay if you'd like to chat. Thank you.