Amazon SageMaker Ground Truth: Using labeled images (part 4)

December 17, 2019
In this video, I show you how to view labeling statistics, and discuss how to use labeling information for training.

https://aws.amazon.com/sagemaker/groundtruth/
https://aws.amazon.com/blogs/machine-learning/chaining-amazon-sagemaker-ground-truth-jobs-to-label-progressively/
https://aws.amazon.com/blogs/machine-learning/tracking-the-throughput-of-your-private-labeling-team-through-amazon-sagemaker-ground-truth/
https://github.com/awslabs/amazon-sagemaker-examples/tree/master/ground_truth_labeling_jobs

Follow me on:
* Medium: https://medium.com/@julsimon
* Twitter: https://twitter.com/juliensimon

Transcript

All right, in the previous video, we annotated some images for semantic segmentation using the new automatic segmentation tool. Hopefully, you saw this was a much faster way of doing things than drawing polygons. If you want to know what your workers did, worker activity is actually logged to CloudWatch. This is the CloudWatch log group you should be looking for. If I open it, I see a log stream for that specific job, with information on which worker accepted which task and how much time they actually spent on it. We have a blog post showing you how to use CloudWatch metrics to make this human-readable, and I will add it to the video description.

The job is over, so if I go to the S3 console and back to the top of the bucket, we see our initial images here and this output directory. If we keep clicking down all the way to the manifest, we find this output manifest file. This is what it looks like. It's not human-readable, but it's not meant to be read by humans. Here, we see mask information about the annotations we performed, and we see confidence scores. This is the good stuff: it's what the training algorithm is going to use to train a model. If we go back up, there's a whole bunch of additional information that's really not meant for us, but that the training process will use. All that job information is saved in S3.

The next step would be to check out the sample notebooks hosted on GitHub. Again, I will put the URL in the description. These show you how to use the manifest files to train different models, for example, how to build image classifiers, etc. This is a really good next step: once you've played with Ground Truth a bit and have a proper image dataset of your own, you'll want to label it and start training models. Some of these notebooks actually come with datasets for you to work with as well. I would highly recommend that you train models and see that it's pretty easy to just feed the augmented manifest into one of these notebooks and train a model really quickly. This shows the full A-to-Z process, from unlabeled data in S3 all the way to a trained machine learning model. I would recommend that you look at those, but of course, you have all the information in S3, so if you know what to do with it and want to run your own training code on it, that works too.

The last thing I want to mention is: what if you need to run more annotations? Let's say, for example, you now wanted to annotate drummers. You would not have to start again. You could just go to the labeling jobs, select the job that you want to start from, and actually chain that job, meaning you can start another labeling job based on this one. This lets you review or further annotate an existing job. So that's another cool feature.

All right, well, I think that's pretty much what I wanted to tell you about Ground Truth, and I'm looking forward to the next features. Hopefully, this was useful, and that's it for today. Bye-bye.
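To make the steps above a bit more concrete, here are a few illustrative code sketches. First, the CloudWatch step: a minimal boto3 snippet that pulls worker activity events for a labeling job. The log group name and the job name used here are assumptions, so check the Ground Truth documentation and your own account for the exact values.

```python
import boto3

logs = boto3.client("logs")

# Assumptions: the Ground Truth worker-activity log group and a hypothetical job name.
LOG_GROUP = "/aws/sagemaker/LabelingJobs"
JOB_NAME = "my-segmentation-job"

# Filter the log group for events that mention our labeling job.
response = logs.filter_log_events(
    logGroupName=LOG_GROUP,
    filterPattern=f'"{JOB_NAME}"',
)

for event in response["events"]:
    # Each event records things like which worker accepted which task and how long it took.
    print(event["timestamp"], event["message"])
```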
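The output manifest itself is just JSON Lines sitting in S3, so it is easy to inspect programmatically. A minimal sketch, assuming a hypothetical bucket, key, and label attribute name (yours are defined by your job configuration):

```python
import json
import boto3

s3 = boto3.client("s3")

# Hypothetical location of the output manifest; yours lives under the job's output path in S3.
BUCKET = "my-groundtruth-bucket"
KEY = "output/my-segmentation-job/manifests/output/output.manifest"

body = s3.get_object(Bucket=BUCKET, Key=KEY)["Body"].read().decode("utf-8")

# One JSON object per labeled image.
for line in body.splitlines():
    entry = json.loads(line)
    image_uri = entry["source-ref"]                               # the original image in S3
    mask_uri = entry.get("my-segmentation-job-ref")               # S3 URI of the mask PNG (assumed key name)
    metadata = entry.get("my-segmentation-job-ref-metadata", {})  # class map, confidence scores, etc.
    print(image_uri, mask_uri, metadata)
```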
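The sample notebooks show how to feed these augmented manifests to SageMaker built-in algorithms. Here is a rough sketch of the idea (not the notebooks' exact code) using the image classification algorithm; the bucket, manifest path, role, attribute names, and hyperparameter values are all assumptions, and in practice you would usually add a validation channel as well.

```python
import sagemaker
from sagemaker import image_uris
from sagemaker.estimator import Estimator
from sagemaker.inputs import TrainingInput

session = sagemaker.Session()
role = "arn:aws:iam::123456789012:role/MySageMakerRole"  # hypothetical execution role

# Built-in image classification container for the current region.
container = image_uris.retrieve("image-classification", session.boto_region_name)

estimator = Estimator(
    image_uri=container,
    role=role,
    instance_count=1,
    instance_type="ml.p3.2xlarge",
    sagemaker_session=session,
)
# Placeholder values; set these from your own dataset.
estimator.set_hyperparameters(num_classes=2, num_training_samples=1000, epochs=10)

# Point the training channel at the augmented manifest. attribute_names must match
# the keys in the manifest: the image reference plus your label attribute.
train_input = TrainingInput(
    "s3://my-groundtruth-bucket/output/my-job/manifests/output/output.manifest",
    s3_data_type="AugmentedManifestFile",
    attribute_names=["source-ref", "my-job"],
    record_wrapping="RecordIO",
    input_mode="Pipe",
)

estimator.fit({"train": train_input})
```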
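Finally, chaining. I show it in the console, but the same idea works through the API by pointing a new job's input at the previous job's output manifest. A very rough boto3 sketch under that assumption; every name, URI, and ARN below is a placeholder, and the pre-annotation and consolidation Lambda ARNs are the AWS-provided, region-specific ones documented for your task type.

```python
import boto3

sm = boto3.client("sagemaker")

# Placeholder: the output manifest of the job we want to chain from.
previous_output_manifest = (
    "s3://my-groundtruth-bucket/output/my-segmentation-job/manifests/output/output.manifest"
)

sm.create_labeling_job(
    LabelingJobName="my-segmentation-job-chained",
    # Use a new label attribute name so the new annotations don't collide with the old ones.
    LabelAttributeName="drummers-ref",
    InputConfig={
        "DataSource": {
            # The chained job starts from the previous job's output manifest.
            "S3DataSource": {"ManifestS3Uri": previous_output_manifest}
        }
    },
    OutputConfig={"S3OutputPath": "s3://my-groundtruth-bucket/output/"},
    RoleArn="arn:aws:iam::123456789012:role/MyGroundTruthRole",
    LabelCategoryConfigS3Uri="s3://my-groundtruth-bucket/config/labels.json",
    HumanTaskConfig={
        "WorkteamArn": "arn:aws:sagemaker:us-east-1:123456789012:workteam/private-crowd/my-team",
        "UiConfig": {"UiTemplateS3Uri": "s3://my-groundtruth-bucket/config/template.liquid"},
        # AWS-provided Lambdas for the task type; the account and region are placeholders here.
        "PreHumanTaskLambdaArn": "arn:aws:lambda:us-east-1:111111111111:function:PRE-SemanticSegmentation",
        "TaskTitle": "Annotate drummers",
        "TaskDescription": "Draw a segmentation mask around each drummer",
        "NumberOfHumanWorkersPerDataObject": 1,
        "TaskTimeLimitInSeconds": 3600,
        "AnnotationConsolidationConfig": {
            "AnnotationConsolidationLambdaArn": "arn:aws:lambda:us-east-1:111111111111:function:ACS-SemanticSegmentation"
        },
    },
)
```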

Tags

Semantic Segmentation, AWS Ground Truth, CloudWatch Logs, S3 Storage, Machine Learning, Model Training