Amazon SageMaker Ground Truth: Creating a labeling job, part 2

December 17, 2019
In this video, I show you how to create a labeling job, using semantic segmentation as an example.

https://aws.amazon.com/sagemaker/groundtruth/
https://aws.amazon.com/blogs/aws/amazon-sagemaker-ground-truth-build-highly-accurate-datasets-and-reduce-labeling-costs-by-up-to-70/

Follow me on:
* Medium: https://medium.com/@julsimon
* Twitter: https://twitter.com/juliensimon

Transcript

In the previous video, I showed you how to create a private labeling workforce for SageMaker Ground Truth. Now, let's create a labeling job. First, we're going to give it a name: JobTable1.

The next step is to define the location of the data in Amazon S3. There are two ways to do this. You can have your data stored like this, or you can bring a manifest file. A manifest file is a JSON file listing the data samples, such as images, that need to be labeled. Alternatively, I can create that manifest file automatically. Let's just enter the path to those images. Here, I'm working with images. We'll see the different types of labeling jobs in a moment. The manifest is being created. It's a simple JSON file listing the samples to be labeled, and it is visible in S3. If you have an existing manifest from a previous job, you can reuse that.

Where should the labeled information be saved? Let's say I want to put it here. I need an IAM role to allow SageMaker access to S3, and I can reuse this one.

Now we get to the important part: selecting a task type. As you can see, we can pick from image, text, or custom tasks. For images, the options are:

- Classification: Assigning labels to the full picture.
- Bounding box: Drawing rectangles around objects for the model to train on.
- Segmentation: Assigning groups of pixels to specific objects in the image.
- Label verification: A newer feature that allows you to run quality assurance jobs on labeling jobs that have already completed. Workers can review annotations and decide if they're acceptable.

For text, you can do:

- Text classification: For sentiment analysis, for example.
- Entity recognition: Highlighting specific words or sentences within text, such as product names, company names, dates, etc., for training a natural language processing model.

We can also do custom tasks, defining custom workflows for annotations and plugging in Lambda functions.
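As an aside on the manifest file described above: in practice it is a JSON Lines file, with one JSON object per line, each pointing at a sample through a `source-ref` key. The following is a minimal sketch of building one by hand, assuming that format. The bucket name, object keys, and the `build_manifest` helper are hypothetical and only for illustration; the console generates the equivalent file for you.

```python
import json


def build_manifest(bucket, keys):
    """Return manifest text: one JSON object per line, each with a 'source-ref'.

    Assumption: 'bucket' and 'keys' are placeholders for your own S3 bucket
    and the object keys of the images to be labeled.
    """
    lines = [json.dumps({"source-ref": f"s3://{bucket}/{key}"}) for key in keys]
    return "\n".join(lines) + "\n"


# Hypothetical bucket and image keys
manifest = build_manifest("my-bucket", ["images/img1.jpg", "images/img2.jpg"])
print(manifest)
```

The resulting text would then be uploaded to S3 and referenced as the input manifest of the labeling job.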
This is easier than it used to be, and we have examples in the documentation. For this demo, we'll stick with images and I want to show you segmentation. Let's click on next.

Now, I have to select my workforce. This is where I'll select the team I created in the previous video. There's a great option for automatic data labeling, but I won't use it here because it takes time and I don't have enough data. The way it works is that humans start annotating data, and a machine learning model is trained in parallel on those annotations. When the model can label samples with the same confidence as humans, it takes over and labels at scale, making the process faster and cheaper. However, I only have seven or eight pictures, so I won't use this feature.

Let's look at what the workers will see. This is the instruction screen that workers will see in the console. You should provide clear instructions, proper examples, and both well-segmented and poorly segmented images to show exactly how you expect the images to be labeled. For this task, we'll be segmenting guitar players and singers. I'll add labels for that: guitarist and vocalist. It's important to use precise terms and make the instructions as clear as possible, especially if you're working with external workers.

I can create the job now. It appears in the console, and I can browse images from my S3 bucket. This takes a few minutes to be fully ready for annotation. We'll pause here, and in the next video, I'll show you how to get the labeling done. See you in a minute.
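The two labels added in the console above, guitarist and vocalist, can also be written out as a label category configuration file, which is what a job created through the API would point to instead of the console form. This is a minimal sketch, assuming the `document-version` and `labels` layout described in the Ground Truth documentation. The `build_label_config` helper is hypothetical and not part of any AWS library.

```python
import json


def build_label_config(labels):
    """Build a label category configuration as a Python dict.

    Assumption: the layout ('document-version' plus a list of
    {'label': name} entries) follows the Ground Truth documentation.
    """
    return {
        "document-version": "2018-11-28",
        "labels": [{"label": name} for name in labels],
    }


# The two labels used in this demo
config = build_label_config(["guitarist", "vocalist"])
print(json.dumps(config, indent=2))
```

The printed JSON would be saved to S3 and referenced when creating the job programmatically.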

Tags

SageMaker Ground Truth, Labeling Job Creation, Image Segmentation, Workforce Management, Manifest File Creation