nz.js WLG 08 03 17 Stefan Judis Julien Simon

Transcript

We'll jump straight into the next talk. Now we've got Julien, from AWS, who flew all the way over from France. It was a 27-hour plane ride, and he's come straight here.
So, first of all, thank you so much for inviting me. I wasn't planning on speaking at a meetup two hours after arriving in New Zealand for the first time. Yes, I flew directly from Paris yesterday, or the day before. I might not make a lot of sense, but that's okay, because this talk is about artificial intelligence. At least some part of it will be intelligent, I guess. So thanks again. I'm really happy to be here, at the meetup at the end of the world, literally. I'll reboot for tomorrow. I'm not going to talk about JavaScript at all, right? Is that okay? Yeah? Oh, good. So that's a relief. Well, I will tomorrow, though. We discussed what would be a good topic, and a safe topic, for me today at 10% of my capacity. So this is going to be about Amazon AI.
Before I dive into that, who am I? I'm the French guy who spends his life on planes and trains. I'm based in the Paris office, sometimes, but most of the time I'm traveling. Eighteen months ago, I was the evangelist for France. Now I'm the French evangelist. The difference is that back then I only traveled in France, where everything is the same. Now it's a whole lot more fun to do what I do. So I travel a lot and go to meetups, conferences, and, of course, AWS events. Before I forget, I will be in Wellington until Friday night. You have my email address, so if there's anything I can help you with, feel free to shoot me an email, and we can meet at the conference if you're there, or go and have coffee somewhere. God knows I'll need it. So I'm around, and I have some free time available for anything. It's all free, so use it.
If you follow AWS a little bit, you know we had our annual conference, re:Invent, in Las Vegas a couple of months ago. We announced a bunch of new AI services there, and this is what I'm going to talk about.
So let's just dive in. The first service is called Amazon Polly. You speak English, so you'll relate to that; when I do this in French, the name means nothing to people. Why is it called Polly? Well, it's a play on words, like the parrot: you give text to this service, and it speaks. We're going to do all kinds of silly tests afterward. It can speak 24 languages with 47 different voices, including accents. There's an Australian accent, but no New Zealand accent, which I'm not familiar with yet anyway. I guess they couldn't replicate it. No Scottish accent either, so don't take it personally. It's a very straightforward service, and quite fast. I'm using the AWS region in Europe for my demo, and we can't beat the speed of light yet, so there's a little latency; in the Sydney region, it would be fast enough. What's nice about Polly is that it sounds natural. Right, you all knew that one, right? It's the beginning of Alice in Wonderland, which is how I feel right now. The English and American voices are very polished, thanks to our experience with the Amazon Echo devices, which I guess you've heard about. They're pretty good. We'll try some other voices, but the English ones are really better than the rest. In my opinion, German is also really good, because the Echo device is available in Germany too. You need a lot of feedback to improve the voices, so as we roll out to new countries, they're going to get better. Another important thing, if you want a useful service, is how accurately it pronounces what you give it. If you read the sentence on this slide, as a human, you would never say "Today in Las Vegas, N-V, it's 54." That would sound weird. You'd say "Today in Las Vegas, Nevada, it's 54 degrees Fahrenheit." OK. So one of the good things about Polly is its built-in understanding of these notions: NV means Nevada, and so on. This isn't something you have to handle yourself.
Polly can do this because it's based on deep learning and has been trained to recognize that this is not "N-V" but Nevada, because Las Vegas is in Nevada. Here's another one: "We live for the music live from the Madison Square Garden." The first "live" is a verb and the second is an adjective; they're spelled the same but pronounced differently. Every language has words like that, French as well. I guess German doesn't have that; all the words are different in German, right? Well, I'm sure you can find an example. I'm sure Paul is going to fail at it. And of course, Polly needs to avoid that robotic voice you sometimes hear in airports or train stations, where announcements sound like crap. "Peter Piper picked a peck of pickled peppers." Intonation and variations in volume and tone are important to make a sentence understandable to everyone. Most of Europe and APAC are covered, though Chinese is notably missing. I'm sure we'll add it. We have Australian English. For most languages, we have one male and one female voice. If you want to give Polly additional information, you can use SSML, a markup language that lets you spell out words or pronounce them differently. If I wanted to use French words in a sentence and pronounce them the English way, I could do that. "My name is Kuklinski, spelled K-U-K-L-I-N-S-K-I": you can do that kind of thing. You can also customize pronunciation up to a certain point, for example by adding your own lexicons. That's it for Polly. It's really simple. If you want to try it, go to the AWS Console. Let's find that Australian English. Do you want to try Russell or Nicole? I'll try Russell. "Hi there. My name is Russell. I will read any text you type here." Yeah? No? I couldn't tell. We have French Canadian, which is good enough, I guess. All right, Hans. Hans is good; sounds German to me. Let's try all those voices. As you can see, it's very fast, so you can really use this service for interactive applications. It builds an MP3 file with the sentence, and you can play it. It's a really simple service.
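As a rough sketch of the SSML spelling trick and the MP3 output described above, the Python below builds an SSML document that spells one word letter by letter with Polly's `<say-as interpret-as="characters">` tag, then requests MP3 audio through boto3's `synthesize_speech`. The helper names, the example sentence, and the default voice `Russell` are illustrative assumptions; the Polly client is passed in (for example `boto3.client("polly")`) so the helpers can be exercised without AWS credentials.

```python
from xml.sax.saxutils import escape


def spell_out_ssml(sentence: str, word_to_spell: str) -> str:
    """Hypothetical helper: wrap `sentence` in <speak> and have Polly spell
    the first occurrence of `word_to_spell` letter by letter."""
    text = escape(sentence)  # escape &, <, > so the SSML stays well-formed
    word = escape(word_to_spell)
    spelled = f'<say-as interpret-as="characters">{word}</say-as>'
    return f"<speak>{text.replace(word, spelled, 1)}</speak>"


def synthesize_to_mp3(polly_client, ssml: str, voice_id: str = "Russell") -> bytes:
    """Ask Polly to render `ssml` as MP3 and return the raw audio bytes.

    `polly_client` is expected to behave like boto3.client("polly").
    """
    response = polly_client.synthesize_speech(
        Text=ssml,
        TextType="ssml",  # tell Polly the input is SSML, not plain text
        OutputFormat="mp3",
        VoiceId=voice_id,
    )
    return response["AudioStream"].read()  # AudioStream is a streaming body


if __name__ == "__main__":
    # prints <speak>My name is <say-as interpret-as="characters">Kuklinski</say-as>.</speak>
    print(spell_out_ssml("My name is Kuklinski.", "Kuklinski"))
```

To actually hear the result, write the returned bytes to a `.mp3` file; that step needs boto3 installed and AWS credentials configured.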
Of course, it's also driven by the CLI, so you can play with it there, and you can use all the SDKs. That's Polly. Simple. Let's move on to the second service. My daughter's name is Kaja. Yes, thank you. My daughter. Okay, so the second service I want to talk about is called Lex. Lex is basically a chatbot service. Chatbots are pretty difficult to get right. There are a ton of problems to solve, for both voice-activated and text-activated chatbots. You need speech recognition and natural language understanding, and you need to build and train the bot, not to mention supporting multiple languages. Then there's the infrastructure side: how do you scale it, secure it, and make it highly available? You need to solve all that if you want to build something that works. We built Lex to solve all these issues for everyone. It's based on the same technology as Amazon Alexa, which powers the Echo devices. The Echo devices are available in the US, where they're quite popular and selling really fast, so we've gained a lot of feedback and experience on how to run voice services, and Lex uses exactly the same technology. It's easy to use, and the voices are really good. It scales automatically, it's a managed service, and it's pretty cheap. The use cases for chatbots are any human-to-machine interactions that could be improved by speaking naturally. It could be asking for information; another interesting use case is people who cannot use their hands, for example in a lab environment, or surgeons. They can interact with their voice. There are millions of different use cases. The structure of a bot is quite simple. First, the user initiates the conversation by expressing what we call an intent. The intent triggers the bot and starts the conversation. Then the bot needs to fill slots, which are the pieces of information required to fulfill the task.
For example, if you want to book a dentist appointment, the bot needs the day, the hour, and the type of exam, and it prompts for each of them. Fulfillment is getting things done: once the bot has all the information it needs, it can fulfill the order. Let's look at an example. I built a small chatbot and integrated it with Facebook. When you build a chatbot with Lex, you get an API that you can integrate into your own app or with Facebook, and more channels, like Slack, are coming. I'm going to initiate the conversation. It's a flower shop, so I guess I'll order flowers. Roses, why not? What color do I want my roses? How about black? OK, not available. When I gave this demo in Amsterdam, I tried it with tulips, and they told me black tulips kind of exist. Anyway, let's try red roses. What day should the roses be delivered? I'm going to say, "How about next Friday?" I don't care about formats. "Next Friday" makes sense to Lex because it's based on deep learning. Next Friday is... What time? I could say 10 in the morning. My phone number... Okay, maybe it will work. The time is correct, the date is correct. Yes, that's good. The order is confirmed: that's the fulfillment phase. And I got my confirmation SMS here. Perfect. So that's the basic structure. "I'd like to order flowers" triggers the conversation. The chatbot then needs to fill the slots: flower type, color, delivery date and time, and a phone number to send the SMS. Once the bot has all that, it can fulfill the order. Make sense? You could have a chatbot for each specific operation, or a more general-purpose chatbot with multiple intents. It's up to you. So you saw the tip of the iceberg. Let me show you a little bit of what's underneath. It's a managed service, so there's no infrastructure to manage; you just define your bot. Here I have a couple of utterances, because not everyone will say "I would like to order flowers, please." You need to list the different ways people might trigger the conversation. Then you define the slots.
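The intent, slots, and fulfillment cycle above is where a Lambda code hook plugs in, as the next part of the talk describes. Here is a minimal sketch, assuming the original (V1) Lex event and response JSON format: `invocationSource` tells the function whether Lex wants validation (`DialogCodeHook`) or fulfillment (`FulfillmentCodeHook`), and the function answers with an `ElicitSlot`, `Delegate`, or `Close` dialog action. The slot names (`FlowerType`, `Color`, `DeliveryDate`, `DeliveryTime`) and the static inventory lists are hypothetical, and the SMS step from the demo is only noted in a comment.

```python
from typing import Any, Dict

# Hypothetical static inventory; a real bot would query a back end instead.
AVAILABLE_FLOWERS = {"roses", "tulips", "lilies"}
AVAILABLE_COLORS = {"red", "white", "yellow", "pink"}


def _message(text: str) -> Dict[str, str]:
    return {"contentType": "PlainText", "content": text}


def _elicit_slot(event: Dict[str, Any], slot: str, text: str) -> Dict[str, Any]:
    """Ask the user again for `slot`, keeping the other slot values."""
    intent = event["currentIntent"]
    return {
        "sessionAttributes": event.get("sessionAttributes") or {},
        "dialogAction": {
            "type": "ElicitSlot",
            "intentName": intent["name"],
            "slots": intent["slots"],
            "slotToElicit": slot,
            "message": _message(text),
        },
    }


def lambda_handler(event: Dict[str, Any], context: Any) -> Dict[str, Any]:
    slots = event["currentIntent"]["slots"]
    session = event.get("sessionAttributes") or {}

    if event["invocationSource"] == "DialogCodeHook":
        # Validation: clear values we can't sell and re-prompt for that slot.
        flower = slots.get("FlowerType")
        if flower and flower.lower() not in AVAILABLE_FLOWERS:
            slots["FlowerType"] = None
            return _elicit_slot(event, "FlowerType",
                                f"Sorry, we don't have {flower}. Which flowers would you like?")
        color = slots.get("Color")
        if color and color.lower() not in AVAILABLE_COLORS:
            slots["Color"] = None
            return _elicit_slot(event, "Color",
                                f"Sorry, {color} is not available. Which color would you like?")
        # Everything seen so far is valid: let Lex prompt for the remaining slots.
        return {"sessionAttributes": session,
                "dialogAction": {"type": "Delegate", "slots": slots}}

    # FulfillmentCodeHook: every slot is filled, so complete the order.
    # The demo also sent an SMS here, e.g. with
    # boto3.client("sns").publish(PhoneNumber=..., Message=...); omitted in this sketch.
    summary = (f"Your {slots['Color']} {slots['FlowerType']} will be delivered "
               f"on {slots['DeliveryDate']} at {slots['DeliveryTime']}.")
    return {
        "sessionAttributes": session,
        "dialogAction": {
            "type": "Close",
            "fulfillmentState": "Fulfilled",
            "message": _message(summary),
        },
    }
```

By the time the hook runs, a slot typed as a built-in date has already been resolved from text like "next Friday" into a date value, so the function only deals with business rules.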
This slot uses a predefined type, AMAZON.DATE. Because the slot is a date, the bot can use its deep learning context to understand "next Friday." There are a lot of predefined types you can use, and you can also define your own types, like flower. Here you have the sentence or question the bot asks to fill each slot. For the fulfillment phase, I have code running in a Lambda function. If you're familiar with AWS, you know Lambda: it's where you deploy small functions that receive an event, do something with it, and complete the operation, without worrying about provisioning servers. Just small functions triggered by events, in this case bot events. Here's my function. It validates the type of flowers. If this were a real-life bot, you would query a back end to check inventory; I'm keeping it simple, so I'm checking against a static list. But it's a piece of code, so you can invoke any API to get data from any back end and complete the order. I'm sending the SMS using another service, which is just one line of code. So you combine the deep learning magic, which saves you from parsing "next Friday" yourself, with your own code for validation and fulfillment. Lambda supports Python, Java, Node.js, and C#. That's Lex.
My favorite service is called Rekognition. Rekognition is an image detection and face detection service, and you can play with it for hours. It does four things: scene detection, face detection, face comparison, and facial recognition. For facial recognition, a company could index a thousand faces, and Rekognition would find a given person in that collection. For scene detection, you give the service an image, and it tells you what the image is about. We'll run some examples. For face detection, it finds the faces in a picture and gives you information like gender, age range, and emotions.
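Scene detection, as described above, is exposed through Rekognition's `DetectLabels` API. A minimal sketch, assuming boto3 and an image already uploaded to S3 (bucket and key are placeholders), that returns label names with their confidence scores, most confident first; the defaults of 10 labels and 70% minimum confidence are arbitrary assumptions.

```python
from typing import List, Tuple


def scene_labels(rekognition, bucket: str, key: str,
                 max_labels: int = 10,
                 min_confidence: float = 70.0) -> List[Tuple[str, float]]:
    """Return (label, confidence) pairs for an S3 image, most confident first.

    `rekognition` is expected to behave like boto3.client("rekognition").
    """
    response = rekognition.detect_labels(
        Image={"S3Object": {"Bucket": bucket, "Name": key}},
        MaxLabels=max_labels,          # cap on how many labels come back
        MinConfidence=min_confidence,  # drop labels below this confidence (percent)
    )
    labels = [(label["Name"], label["Confidence"]) for label in response["Labels"]]
    return sorted(labels, key=lambda pair: pair[1], reverse=True)
```

Usage would look like `scene_labels(boto3.client("rekognition"), "my-demo-bucket", "oktoberfest.jpg")`, where the bucket name is hypothetical.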
For face comparison, you give it two pictures, and it tells you whether the faces match. For facial recognition, you index faces into a collection and then search for a specific face in that collection. It stores a mathematical model of each face, not the picture itself. Let's play with it. You can try it in the console or use the API; I wrote a couple of scripts. My scripts read the images from S3, so I'm giving a bucket name here. The image is probably already there, so I don't need to copy it. Oktoberfest.jpg: that's pretty obvious. In this picture, Rekognition sees people, alcohol, beverages, drinks, crowd, female, girls. It's a party involving alcohol and girls. There are many faces in there, and my script draws boxes around them. Rekognition found 15 faces, the maximum it will return for a single picture. It's a pretty accurate description: alcohol, a crowd, and girls. It could be soda or apple juice, but the shape of the glasses might be a factor. Let's look at face one. It's 99.9% sure it's a face, a girl, happy, and it gives an age range. The age range was a newer feature: in the early beta, we gave a single age, but it ended up being a range. So is she between 26 and 43? Clearly, she's not 43, so be careful with that. Still, it works pretty well. Let's try another one. The name gives it away: city, downtown, metropolis, urban, and there's one face in it. Let's look at that. She doesn't look happy, and she's between 19 and 36, which is very diplomatic. What is this picture? Usually, when I show it, people ask, "Where's the face?" Rekognition picks up the face quicker than you and me. Why doesn't it pick up the second face? Because it's not a human face: the eyes are huge, the proportions are wrong, and the hair color is a little weird. It's an abstract picture of Tokyo. Let's try one last picture. It says people, art, painting. Check it out. It's really famous in France: Liberty Leading the People. God knows we need that.
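The script used in the demo isn't shown, so the following is only a hypothetical reconstruction of its face-related steps: calling `DetectFaces` with `Attributes=["ALL"]` to get gender, age range, and emotions, and converting each face's `BoundingBox` (which Rekognition expresses as ratios of the image width and height) into pixel coordinates for drawing. Function names are illustrative, and the image dimensions have to come from your own image library.

```python
from typing import Any, Dict, List, Tuple


def detect_faces_s3(rekognition, bucket: str, key: str) -> List[Dict[str, Any]]:
    """Return Rekognition's FaceDetails for an S3 image, with all attributes."""
    response = rekognition.detect_faces(
        Image={"S3Object": {"Bucket": bucket, "Name": key}},
        Attributes=["ALL"],  # the default set omits gender, age range, emotions
    )
    return response["FaceDetails"]


def box_in_pixels(face: Dict[str, Any], width: int,
                  height: int) -> Tuple[int, int, int, int]:
    """Convert a ratio-based BoundingBox into (left, top, right, bottom) pixels."""
    box = face["BoundingBox"]
    left = round(box["Left"] * width)
    top = round(box["Top"] * height)
    right = round((box["Left"] + box["Width"]) * width)
    bottom = round((box["Top"] + box["Height"]) * height)
    return left, top, right, bottom


def summarize_face(face: Dict[str, Any]) -> str:
    """One line per face: detection confidence, gender, age range, top emotion."""
    gender = face.get("Gender", {}).get("Value", "unknown")
    age = face.get("AgeRange")
    age_text = f"age {age['Low']}-{age['High']}" if age else "age unknown"
    emotions = face.get("Emotions", [])
    mood = max(emotions, key=lambda e: e["Confidence"])["Type"] if emotions else "unknown"
    return f"{face['Confidence']:.1f}% face, {gender}, {age_text}, mostly {mood}"
```

Keeping the pixel conversion separate from the API call means the same helpers work on a saved JSON response, which is handy when tuning the drawing code offline.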
It detects faces even though it's a painting and some faces are really dark, because the proportions are okay. The painter did a good job. It's not so much a matter of training on paintings or anime: if it looks like a face, then it's a face. Let's do a quick comparison example. I'll take a reference picture, an easy one. There's a match at 84%. The first picture was taken in the US last year after a very long flight. The age range is weird: it tells me I'm way older. Never get your official picture taken right off the plane; it's a bad idea. Still, that's not so bad. It is me. Let's try a more difficult one. The lighting is really bad, the angle is really bad, everything is bad about this picture. But it's still me, at 82%. For some use cases, 82% is good enough. That's Rekognition in a nutshell. If you pass it pictures of your dog or cat, it will say dog or cat, but if you show it your dog in a group of 10 dogs, it won't recognize which one is yours. These technologies have been around for a while; facial recognition, for example, has been used in streets and subways for years. Some systems in France use it, but they're not based on AWS. What's new is that it's now practical and economical. When I was in undergrad or grad school, we had classes on artificial intelligence. It's been around forever, but it hardly broke out of the lab until a few years ago, because people didn't have the datasets or the computing power. Now, with GPUs and cloud services, it's all off-the-shelf. We announced at re:Invent that we support a popular framework called MXNet; we're officially supporting and contributing to it, and we'll probably use MXNet for more services. If you want to use something else, there's a popular library called Keras, and you can build models or load pre-trained ones from Facebook, Microsoft, Google, etc. It's all out there. For some applications, you don't even need a GPU; a powerful CPU is good enough.
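The comparison demo boils down to the `CompareFaces` API plus a threshold decision. A minimal sketch, assuming both pictures are in S3 and assuming an 80% threshold: the talk only says that 82% is good enough for some use cases, and a stricter application would pick a higher value. Function names and keys are illustrative.

```python
from typing import Any, Dict


def best_similarity(response: Dict[str, Any]) -> float:
    """Highest similarity among the matches in a CompareFaces response (0.0 if none)."""
    return max((m["Similarity"] for m in response.get("FaceMatches", [])), default=0.0)


def same_person(rekognition, bucket: str, reference_key: str, candidate_key: str,
                threshold: float = 80.0) -> bool:
    """Compare the face in a reference picture with the faces in a candidate picture.

    `rekognition` is expected to behave like boto3.client("rekognition").
    """
    response = rekognition.compare_faces(
        SourceImage={"S3Object": {"Bucket": bucket, "Name": reference_key}},
        TargetImage={"S3Object": {"Bucket": bucket, "Name": candidate_key}},
        SimilarityThreshold=threshold,  # only matches at or above this are in FaceMatches
    )
    # Re-check locally so the decision stays explicit and does not depend
    # solely on the server-side filter.
    return best_similarity(response) >= threshold
```

Making the threshold a parameter reflects the point in the talk: the same 82% score can be acceptable for one application and not for another.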
You can even run image recognition on a Raspberry Pi with a pre-trained model, without the cloud. The hard part is designing the neural network. State-of-the-art networks have 100 or 200 layers and are incredibly complex. Training can take weeks and is insanely costly, and you need the dataset. With these services, you don't need to do any of that; you just use the service. Lex is updated regularly with data from the Echo devices: we get live feedback on sentences and how they work in people's homes. That's why English and German improve quickly. For Rekognition, you can train your own model if you have 10 million tagged images; it's not self-service yet, but we have customers who do that. How do our models compare to Google's? That's difficult to answer. We haven't published the network structures or the datasets. Amazon is pretty secretive, but a lot of the data comes from Amazon.com. You can try it: when you type "T-shirts, science fiction" on Amazon, it finds products using more information than the product description. Image recognition is used to recommend products on Amazon. As for our plans, they include more support for MXNet. We hired top MXNet contributors from Carnegie Mellon University, and they're actively maintaining MXNet. We expect models in the near future, like vehicle recognition, human recognition, and product recommendation. We call this the model zoo: off-the-shelf, pre-trained models you can use.
For data privacy, everything stays in your own AWS account. If you build a biometrics application for your company, you store the data in your account, whether it's the picture or the mathematical representation, which is a vector model of the face, not the face itself. Everything you know about AWS data ownership applies here. We take extreme measures to keep data safe. It's harder for us to control what our customers do: if someone builds an intrusive application that stores pictures without consent, you would complain to that person or company. You could also complain to us, and if there's massive abuse, we would act. We monitor that to ensure our services aren't used to annoy people or do illegal things. We make it easy for you to build stuff, not to misuse it. The Facebook integration is done through Facebook apps, which is how typical Facebook bots work. You can also integrate the bot on your own website and customize the look and feel. It's pretty easy to set up; I managed to do it even though I'm not a Facebook developer. Thank you so much.
