Hi everybody, this is Julien from Arcee. Welcome to episode 19 of my podcast. Don't forget to subscribe to be notified of future videos. I've been a little slower than usual in publishing these episodes because I'm currently writing a book on Amazon SageMaker, and that's taking a lot of my time. We'll talk about the book in another episode. For now, let's talk about AWS news. I want to show you some cool features on Amazon Polly, Personalize, Rekognition, and SageMaker. So let's get started.
As usual, let's start with the high-level services. The first one I'd like to discuss today is Amazon Polly, our text-to-speech service. As you probably remember, Polly has two different engines for voice generation. The first one is the standard engine, and the newer one is the neural engine, used for neural text-to-speech (NTTS). The difference is that NTTS generates the waveform with a deep learning network, so it sounds more natural and lifelike. We can also apply styles like the newscaster style, which I demonstrated a while ago. The news this week is that there's a new voice for the neural engine, a US English voice called Kevin. Let's listen to Kevin.
You just go to Polly and type some text, making sure you select the neural engine and Kevin's voice, and we can generate and listen to it in real time. "Hello, my name is Kevin. I watch silly YouTube videos and eat junk food all day long instead of reading books or playing football with my friends. I hate school and I never listen to what my parents say. I have a really bright future, I think." "I think you do, Kevin. Keep doing that." There you go, super simple to use. Here, I used plain text, but you can also use SSML, a markup language that lets you customize the way the sound is generated, apply the newscaster style, and do all kinds of funny things. Just one more member in the Polly family.
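If you'd rather script this than use the console, the same call works through the SDK. Here's a minimal sketch with boto3; the text and output file name are just placeholders.

```python
import boto3

polly = boto3.client("polly")

# Synthesize speech with the neural (NTTS) engine and the Kevin voice.
response = polly.synthesize_speech(
    Text="Hello, my name is Kevin.",
    VoiceId="Kevin",
    Engine="neural",        # "standard" would select the standard engine instead
    OutputFormat="mp3",
)

# Save the audio stream to a local file for playback.
with open("kevin.mp3", "wb") as f:
    f.write(response["AudioStream"].read())
```

If you want SSML instead of plain text, for example to apply the newscaster style, you would pass your markup in Text and set TextType to "ssml".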
Let's move on to Rekognition, and this is a very cool launch on Rekognition Video. So far, Rekognition has focused on understanding the content of a video. With this launch, Rekognition moves on to helping you understand its structure. Let me show you a couple of examples from a nice blog post, which I will reference in the description. One thing Rekognition Video can now do is detect black frames. If you have transitions or opening credits, or if you just want to check there isn't a black frame in the middle of your video, this is pretty cool. It can also detect end credits, different shots, and color bars, which are calibration pictures present in professional videos. It works as usual: you submit your video from an S3 bucket with input parameters, Rekognition processes it, and it outputs a JSON document with all the information you requested. This is really nice. I haven't had time to test it yet, but I think for video professionals, it's a very nice addition.
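Since I haven't run it myself yet, treat this as a sketch based on the documentation: segment detection follows the usual asynchronous Rekognition Video pattern, with a start call and a get call. The bucket and file names below are placeholders.

```python
import boto3

rekognition = boto3.client("rekognition")

# Start an asynchronous segment detection job on a video stored in S3.
start = rekognition.start_segment_detection(
    Video={"S3Object": {"Bucket": "my-bucket", "Name": "videos/my-movie.mp4"}},
    # Technical cues include black frames, end credits, and color bars.
    SegmentTypes=["TECHNICAL_CUE", "SHOT"],
)

# Fetch the results once the job has completed (in production you'd rely on the
# SNS notification channel rather than polling).
result = rekognition.get_segment_detection(JobId=start["JobId"])
if result["JobStatus"] == "SUCCEEDED":
    for segment in result["Segments"]:
        print(segment["Type"],
              segment.get("StartTimecodeSMPTE"),
              segment.get("EndTimecodeSMPTE"))
```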
Let's continue with high-level services. It's been a while since we talked about Amazon Lex, and this feature is a good opportunity to do that. Amazon Lex is our chatbot service, and the Lex team has added an integration with Amazon Kendra. Kendra is a managed search service, and it makes sense to integrate a search service with a bot service. If you need a refresher on Kendra, I wrote a blog post showing you how to create an index and query it, using a collection of Wikipedia documents that I uploaded to S3 and indexed. Once you have that Kendra index, you can just go and ask questions. Let's give it a try using the same example from my blog post.
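As a quick reminder, once the index exists, querying it with the SDK is a single call. Here's a minimal sketch with boto3; the index ID is a placeholder.

```python
import boto3

kendra = boto3.client("kendra")

# Ask a natural-language question against an existing Kendra index.
response = kendra.query(
    IndexId="12345678-1234-1234-1234-123456789012",
    QueryText="Who is Thad Jones?",
)

# Print the result type and an excerpt for each hit.
for item in response["ResultItems"]:
    excerpt = item.get("DocumentExcerpt", {}).get("Text", "")
    print(item["Type"], excerpt[:200])
```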
I built a very simple demo bot in seconds. I created a new bot with a placeholder intent and added a second intent based on the Amazon Kendra search intent, one of the built-in intents. You just create the intent, select the built-in value, and point it at the Kendra index you want to query. Then you fill in some response messages that extract information from the Kendra response; I just copied and pasted them from the documentation. I built the chatbot, which took just a few seconds, and now we can test it. We can ask a question like, "Who's Thad Jones?" The query goes directly to my Kendra index, and the answer is that Thad Jones is a jazz trumpeter. Let's try another one: "With whom did Thad Jones play?" We got a good answer: Thad Jones continued his career playing with Count Basie and Mel Lewis. The context is pretty good. When I ask, "With whom did Thad Jones play," I'm not looking for his football buddies or basketball buddies; I'm looking for other musicians, and Kendra gets that context. This is a very nice integration, super simple, and it worked on the first try.
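Of course, once the bot is published, you can also call it programmatically instead of using the test window. Here's a minimal sketch with the Lex (V1) runtime, assuming a hypothetical bot name, alias, and user ID:

```python
import boto3

lex = boto3.client("lex-runtime")

# Send a question to the bot; the Kendra search intent handles the lookup.
response = lex.post_text(
    botName="KendraDemoBot",
    botAlias="prod",
    userId="demo-user",
    inputText="Who is Thad Jones?",
)

print(response["message"])  # answer built from the Kendra response
```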
The next one is Personalize, a service I really like. Personalize is a managed personalization and recommendation service. The team added recommendation filters, which is great because one of the main problems with recommendations is that you don't want to recommend stuff a user has already bought or viewed. It's annoying, especially with ads. With Personalize, you still create your personalization model as before, uploading your CSV file with user-item interactions. For example, if user 1 bought items 1, 3, and 6, and saw item 47, it wouldn't make sense to recommend items 1, 3, and 6 again. Using the event type, you can create a filter with a simple syntax to exclude items where the event type is "purchased" or "movie viewed." Creating a filter is super easy in the console, and you can apply it to your recommendation campaign. This was one of the top asks from Personalize customers, and I'm very happy to see it released. It's very easy to use, so go and try it.
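To give you an idea of what this looks like with the SDK, here's a minimal sketch with placeholder ARNs; I'm writing the filter expression from memory, so double-check the exact syntax in the Personalize documentation.

```python
import boto3

personalize = boto3.client("personalize")
personalize_runtime = boto3.client("personalize-runtime")

# Create a filter that excludes items the user has already purchased.
# The expression below is an approximation; see the Personalize docs for exact syntax.
create_response = personalize.create_filter(
    name="exclude-purchased-items",
    datasetGroupArn="arn:aws:personalize:us-east-1:123456789012:dataset-group/my-group",
    filterExpression='EXCLUDE ItemID WHERE Interactions.EVENT_TYPE IN ("purchased")',
)

# Once the filter is ACTIVE, apply it when requesting recommendations from a campaign.
recommendations = personalize_runtime.get_recommendations(
    campaignArn="arn:aws:personalize:us-east-1:123456789012:campaign/my-campaign",
    userId="1",
    filterArn=create_response["filterArn"],
)
print([item["itemId"] for item in recommendations["itemList"]])
```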
Finally, we're moving to SageMaker. I've talked about SageMaker Ground Truth before; it's the data labeling service for SageMaker. You can distribute datasets to different workforces for annotating data samples, whether they're text, images, or something else. Ground Truth already supported text tasks such as entity extraction and sentiment analysis, and image tasks such as object detection, semantic segmentation, and image classification. Now, you can also annotate 3D point cloud datasets, which are used to train autonomous driving models. These data points come from cameras and LiDAR sensors, mapping the 3D world around the car. The data format is complex, but we provide examples and sample notebooks showing how to process it for annotation. You can then distribute 3D frames to your labeling workforce. I wrote a blog post about this, and I have a couple of videos on my YouTube channel. In the video, you can see me zooming in on a 3D space, applying a 3D box to a car, and using a keyboard shortcut to fit the box to the ground automatically. This is one of the assistive labeling features that help you annotate complex datasets. You can watch the other videos where I show object tracking, following a car across different frames. This is a fascinating tool, and even if you don't work on autonomous driving, it's worth running the demo to see what Ground Truth makes possible.
That's it for this week. I hope you learned a few things. Don't forget to subscribe to my channel. I hope I'll see you on the road at some point when the world goes back to normal. Until then, keep rocking.
Tags
Amazon Polly, Amazon Rekognition, Amazon Lex, Amazon Personalize, Amazon SageMaker Ground Truth