A hands-on look at the Amazon Rekognition API

Amazon Rekognition is a Deep Learning based image analysis service. Don’t worry though, you won’t have to wade through Machine Learning / Deep Learning mumbo jumbo to work with Rekognition. Quite the contrary, as Rekognition provides a very easy-to-use API.
 
 It allows developers to:

  • detect thousands of objects and scenes;
  • analyze faces;
  • compare two faces to measure similarity;
  • build face collections and match faces against these collections.

As usual, this service can be used with the AWS CLI (as in ‘aws rekognition’), or with one of our language SDKs. I’ll show you some CLI examples first and then we’ll use the popular Python SDK, aka boto3.

First things first: how do we send images for processing? Two options: send the image as a byte blob or put it in S3. I suspect most of us will use the second option, so that’s what I’ll use. Time to play!
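In boto3 terms, both options boil down to the shape of the Image parameter. Here's a minimal sketch: the helper names s3_image and bytes_image are mine, purely for illustration, and only the dictionary layout comes from the API.

```python
def s3_image(bucket, name):
    """Build the Image parameter for a picture already stored in S3."""
    return {"S3Object": {"Bucket": bucket, "Name": name}}


def bytes_image(path):
    """Build the Image parameter from a local file, sent as a byte blob."""
    with open(path, "rb") as f:
        return {"Bytes": f.read()}


# Either dictionary can then be passed as Image=... to a Rekognition call.
print(s3_image("jsimon-public", "julien1.jpg"))
```

The S3 variant keeps requests small, which is one more reason to prefer it for anything but quick local tests.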

Step 3 screenshot from a Hands on Look at the Amazon Rekognition Api

$ aws rekognition detect-faces --image '{"S3Object":{"Bucket":"jsimon-public","Name":"julien1.jpg"}}'
 
{ "FaceDetails": [ { "BoundingBox": { "Width": 0.3883333206176758, "Top": 0.12222222238779068, "Left": 0.33666667342185974, "Height": 0.2588889002799988 }, "Landmarks": [ { "Y": 0.23426248133182526, "X": 0.46131378412246704, "Type": "eyeLeft" }, { "Y": 0.22791674733161926, "X": 0.5936729311943054, "Type": "eyeRight" }, { "Y": 0.27828338742256165, "X": 0.5404868721961975, "Type": "nose" }, { "Y": 0.3229646682739258, "X": 0.48395034670829773, "Type": "mouthLeft" }, { "Y": 0.31654009222984314, "X": 0.5957114696502686, "Type": "mouthRight" } ], "Pose": { "Yaw": 4.216298580169678, "Roll": -4.777482509613037, "Pitch": -2.406636953353882 }, "Quality": { "Sharpness": 70.0, "Brightness": 65.17163848876953 }, "Confidence": 99.99468231201172 } ], "OrientationCorrection": "ROTATE_0" }

JSON, the cornerstone of any nutritious service. So, what do we have here? A face has been found with 99.99+% confidence. It’s delimited by the BoundingBox coordinates (top left corner, face width, face height): these are fractional values with respect to the total height and width of the image. Eyes, nose and mouth have been located too (that’s reassuring).
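Since the coordinates are fractions, you need the actual image size to turn them into pixels. A small sketch of that conversion follows; the function name to_pixel_box is hypothetical, and the resulting (left, top, right, bottom) tuple is simply the layout I'd expect a drawing library to want.

```python
def to_pixel_box(bounding_box, image_width, image_height):
    """Convert a fractional Rekognition BoundingBox to pixel corners."""
    left = bounding_box["Left"] * image_width
    top = bounding_box["Top"] * image_height
    right = left + bounding_box["Width"] * image_width
    bottom = top + bounding_box["Height"] * image_height
    # Round to whole pixels: (left, top, right, bottom)
    return tuple(round(v) for v in (left, top, right, bottom))


# Box taken from the JSON response above; the 600x900 image size is an
# assumption for illustration, not the real size of julien1.jpg.
box = {
    "Width": 0.3883333206176758,
    "Top": 0.12222222238779068,
    "Left": 0.33666667342185974,
    "Height": 0.2588889002799988,
}
print(to_pixel_box(box, 600, 900))
```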

Now, let’s see what Rekognition can tell us about this second picture.

$ aws rekognition detect-labels --image '{"S3Object":{"Bucket":"jsimon-public","Name":"julien2.jpg"}}'

Step 6 screenshot from a Hands on Look at the Amazon Rekognition Api

{ "Labels": [ { "Confidence": 99.29261779785156, "Name": "Human" }, { "Confidence": 99.2958984375, "Name": "People" }, { "Confidence": 99.2958984375, "Name": "Person" }, { "Confidence": 99.2667007446289, "Name": "Book" }, { "Confidence": 99.2667007446289, "Name": "Text" }, { "Confidence": 71.22590637207031, "Name": "Bookcase" }, { "Confidence": 71.22590637207031, "Name": "Furniture" }, { "Confidence": 71.22590637207031, "Name": "Shelf" }, { "Confidence": 52.00172805786133, "Name": "Portrait" }, { "Confidence": 52.00172805786133, "Name": "Selfie" } ] }

With a very good level of confidence, this is the picture of a human with books on a bookshelf, possibly a portrait. A pretty good summary. Let's compare the two previous pictures. Is this truly the same person? Spoiler: yes, although I look 15 years older on the first one. Note to self: no more promo shots after 36 sleepless hours :D

$ aws rekognition compare-faces --source-image '{"S3Object":{"Bucket":"jsimon-public","Name":"julien1.jpg"}}' --target-image '{"S3Object":{"Bucket":"jsimon-public","Name":"julien2.jpg"}}'

{ "FaceMatches": [ { "Face": { "BoundingBox": { "Width": 0.5596370100975037, "Top": 0.1318063884973526, "Left": 0.3889369070529938, "Height": 0.5596370100975037 }, "Confidence": 99.98912811279297 }, "Similarity": 98.0 } ], "SourceImageFace": { "BoundingBox": { "Width": 0.3883333206176758, "Top": 0.12222222238779068, "Left": 0.33666667342185974, "Height": 0.2588889002799988 }, "Confidence": 99.99468231201172 } }

Similarity is 98%. Jet lag or not, I'm always the same me. See how simple this service is? I don't see how they could have made it easier. How long would it take to design, build and *train* something like this on your own? I really have no idea and I don't intend to find out!
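The same comparison from Python could look roughly like this. It's a sketch, not a reference implementation: best_match is a name I made up, the 80% threshold is an arbitrary choice, and the client is passed in so the function can be tried with a stub before pointing it at AWS.

```python
def best_match(client, bucket, source_name, target_name, threshold=80.0):
    """Return (similarity, bounding_box) of the best face match, or None."""
    response = client.compare_faces(
        SourceImage={"S3Object": {"Bucket": bucket, "Name": source_name}},
        TargetImage={"S3Object": {"Bucket": bucket, "Name": target_name}},
        SimilarityThreshold=threshold,  # assumed value, tune to taste
    )
    matches = response.get("FaceMatches", [])
    if not matches:
        return None
    # Keep the target face that looks most like the source face.
    best = max(matches, key=lambda m: m["Similarity"])
    return best["Similarity"], best["Face"]["BoundingBox"]


# Usage (needs boto3 and AWS credentials):
#   import boto3
#   client = boto3.client("rekognition")
#   print(best_match(client, "jsimon-public", "julien1.jpg", "julien2.jpg"))
```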

Enough CLI, let’s switch to Python and run more visual examples. For this purpose, I’ve written a couple of scripts (available here), using boto3 and the Pillow image processing library.

In a nutshell:

  • rekognitionDetect.py bucket_name image [copy | nocopy ] : try to detect faces inside an image. If faces are found, each of them will be highlighted by a box and an updated image will be saved. The script will also report image labels and face information (gender, beard, glasses, etc.). By default, the maximum number of labels is 10 and the minimum confidence is 75%.
  • rekognitionCompare.py bucket_name sourceImage targetImage [copy | nocopy ]: try to match a reference face to another image. If the face is found, it will be highlighted by a box and an updated image will be saved.
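To give an idea of what sits behind the detection script, here's a stripped-down sketch of the two API calls involved. This is not the actual script: the function name detect is hypothetical, and I'm assuming the defaults above map to the MaxLabels and MinConfidence parameters of detect_labels.

```python
def detect(client, bucket, name, max_labels=10, min_confidence=75.0):
    """Return (labels, face_details) for an image stored in S3."""
    image = {"S3Object": {"Bucket": bucket, "Name": name}}
    labels = client.detect_labels(
        Image=image,
        MaxLabels=max_labels,
        MinConfidence=min_confidence,
    )["Labels"]
    # Attributes=["ALL"] asks for gender, beard, eyeglasses, emotions, etc.
    faces = client.detect_faces(Image=image, Attributes=["ALL"])["FaceDetails"]
    return labels, faces


# Usage (needs boto3 and AWS credentials):
#   import boto3
#   labels, faces = detect(boto3.client("rekognition"), "jsimon-public", "booth1.jpg")
#   for label in labels:
#       print("Label %s, confidence: %s" % (label["Name"], label["Confidence"]))
```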

All images must be present with the same name both locally and in S3. The last parameter for both scripts allows you to skip the copy to S3 if the file is already there. Hopefully, the code reads like well-written prose (hi Uncle Bob). If not, blame jet lag (yes, it’s the root of all evil). Anyway, there’s nothing complicated here, I’m sure you’ll figure it out in no time. Let’s play some more!

Step 4 screenshot from a Hands on Look at the Amazon Rekognition Api

$ rekognitionDetect.py jsimon-public booth1.jpg nocopy

Label Human, confidence: 99.3180236816
 Label People, confidence: 99.3190917969
 Label Person, confidence: 99.3190917969
 Label Clothing, confidence: 92.1037216187
 Label Overcoat, confidence: 92.1037216187
 Label Suit, confidence: 92.1037216187
 Label Computer, confidence: 76.0058441162
 Label Electronics, confidence: 76.0058441162
 Label LCD Screen, confidence: 76.0058441162
 Label Laptop, confidence: 76.0058441162
 *** Face 0 detected, confidence: 99.999671936 Gender: Male HAPPY 96.4477920532 CALM 8.28260231018 CONFUSED 1.53788328171
 *** Face 1 detected, confidence: 99.9654922485 Gender: Male Beard Mustache HAPPY 98.5274353027 ANGRY 5.03668212891 CONFUSED 2.61067152023
 *** Face 2 detected, confidence: 99.9955444336 Gender: Male Eyeglasses HAPPY 97.6237945557 ANGRY 1.31589770317 CALM 0.939458608627
 *** Face 3 detected, confidence: 99.9996109009 Gender: Male Eyeglasses HAPPY 98.9962310791 SAD 11.4119710922 CONFUSED 1.69576406479

Say hi to Romain, Cédric and Damian, my friendly AWS colleagues. Rekognition sees 4 males, 1 with a beard, 2 with eyeglasses, all of them very happy... and I'm the calmest of the bunch, how about that. Amazingly, Rekognition manages to catch my hardly visible laptop (left edge of the picture, on the table).
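That kind of headcount is easy to automate. Below is a small sketch that tallies genders, facial attributes and dominant emotions from a list of face details; tally_faces is a hypothetical helper, and I'm assuming each face carries the Gender, Beard, Mustache, Eyeglasses and Emotions fields returned when all attributes are requested.

```python
from collections import Counter


def tally_faces(face_details):
    """Count genders, boolean attributes and top emotions across faces."""
    counts = Counter()
    for face in face_details:
        counts[face["Gender"]["Value"]] += 1
        for attribute in ("Beard", "Mustache", "Eyeglasses"):
            # Each attribute is a dict with a boolean "Value".
            if face.get(attribute, {}).get("Value"):
                counts[attribute] += 1
        emotions = face.get("Emotions", [])
        if emotions:
            # Only the most confident emotion is counted per face.
            top = max(emotions, key=lambda e: e["Confidence"])
            counts[top["Type"]] += 1
    return counts


# Usage: pass the "FaceDetails" list from a detect_faces response, e.g.
#   print(tally_faces(response["FaceDetails"]))
```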

Step 7 screenshot from a Hands on Look at the Amazon Rekognition Api

Here’s a tougher one (Hallo to my German friends).

$ rekognitionDetect.py jsimon-public oktoberfest.jpg nocopy

 
Label People, confidence: 99.0898742676
 Label Person, confidence: 99.0898971558
 Label Human, confidence: 99.0639343262
 Label Alcohol, confidence: 88.8537063599
 Label Beverage, confidence: 88.8537063599
 Label Drink, confidence: 88.8537063599
 Label Crowd, confidence: 84.0972671509
 Label Female, confidence: 84.0796279907
 Label Girl, confidence: 84.0796279907
 *** Face 0 detected, confidence: 99.9854202271 Gender: Male HAPPY 60.5386123657 ANGRY 12.2481765747 DISGUSTED 2.10083723068
 *** Face 1 detected, confidence: 99.9825744629 Gender: Female HAPPY 98.0062866211 SURPRISED 10.8561573029 SAD 0.810676813126
 *** Face 2 detected, confidence: 99.9904937744 Gender: Female HAPPY 84.5134887695 SURPRISED 8.68589305878 ANGRY 1.35719180107
 *** Face 3 detected, confidence: 99.9073257446 Gender: Male Beard Mustache HAPPY 80.5190963745 SURPRISED 23.9800624847 ANGRY 1.17569565773
 *** Face 4 detected, confidence: 99.9972229004 Gender: Male Mustache HAPPY 75.2949371338 CONFUSED 10.9511556625 DISGUSTED 1.91761255264
 *** Face 5 detected, confidence: 99.9999771118 Gender: Male HAPPY 35.9886474609 SURPRISED 3.75992059708 ANGRY 2.48707532883
 *** Face 6 detected, confidence: 99.9915084839 Gender: Female HAPPY 99.4766082764 CALM 0.791561603546 ANGRY 0.620931386948
 *** Face 7 detected, confidence: 99.9998931885 Gender: Female HAPPY 99.8826293945 SAD 7.21873044968 DISGUSTED 5.48685789108
 *** Face 8 detected, confidence: 83.6580963135 Gender: Male Eyeglasses SAD 94.9213943481 SURPRISED 76.9153442383 HAPPY 8.52976131439
 *** Face 9 detected, confidence: 99.9944610596 Gender: Male HAPPY 27.327457428 DISGUSTED 26.6790218353 ANGRY 12.1302127838
 *** Face 10 detected, confidence: 99.9998855591 Gender: Male SURPRISED 99.2624435425 HAPPY 22.0922241211 SAD 6.69546127319
 *** Face 11 detected, confidence: 99.9861831665 Gender: Male SURPRISED 60.7816810608 SAD 7.07310438156 HAPPY 3.66672611237
 *** Face 12 detected, confidence: 99.9990692139 Gender: Male HAPPY 48.0631027222 SURPRISED 2.61369943619 CONFUSED 2.40399837494
 *** Face 13 detected, confidence: 87.6368408203 Gender: Male HAPPY 16.2307357788 SAD 14.2565965652 ANGRY 12.3210906982
 *** Face 14 detected, confidence: 99.9553375244 Gender: Male HAPPY 54.3005943298 DISGUSTED 5.99133396149 SURPRISED 3.63597273827

Wow, 15 people, including partial faces. All genders are correct. Emotions are mostly ok, but we definitely need to add 'DRUNK' to the list ;) The labels are spot on: a crowd of men and women drinking alcohol. Let's try another one. Low res, low quality.

Screenshot from a Hands on Look at the Amazon Rekognition Api tutorial

$ rekognitionDetect.py jsimon-public maradona.jpg nocopy
 
Label People, confidence: 99.2043991089
 Label Person, confidence: 99.2043991089
 Label Human, confidence: 99.1917037964
 Label Football, confidence: 97.2220993042
 Label Soccer, confidence: 97.2220993042
 Label Sport, confidence: 97.2220993042
 Label American Football, confidence: 83.3328475952
 Label Athlete, confidence: 78.3234786987
 *** Face 0 detected, confidence: 99.963470459 Gender: Male Mustache SURPRISED 21.8802871704 CALM 17.4065952301 SAD 11.6566238403
 *** Face 1 detected, confidence: 99.9813308716 Gender: Male Eyeglasses HAPPY 38.6969680786 ANGRY 6.79734945297 SURPRISED 2.61010527611
 *** Face 2 detected, confidence: 99.9385604858 Gender: Male SURPRISED 36.6970825195 SAD 7.66330337524 ANGRY 6.10639476776
 *** Face 3 detected, confidence: 99.9514923096 Gender: Male SAD 32.6836242676 DISGUSTED 4.55095767975 HAPPY 4.19711828232
 *** Face 4 detected, confidence: 99.8046951294 Gender: Male Beard Mustache SAD 46.0139579773 HAPPY 4.15547084808 DISGUSTED 0.981283187866
 *** Face 5 detected, confidence: 99.2888412476 Gender: Male SAD 90.2270889282 CALM 5.9303817749 HAPPY 3.26179981232

Labels are fine, except for 'American Football'. 83%??? Gimme a break, the training set needs more Soccer images! In addition, I don't think number 4 is wearing eyeglasses, but again this is a low res picture. Apart from this, Rekognition correctly picked up all faces and funny enough, the expressions make sense too: "sad" and "surprised" are definitely how these guys must have felt against the legendary Diego! A last one for the road: how about this complex abstract-ish nighttime picture of Shinjuku?

Step 5 screenshot from a Hands on Look at the Amazon Rekognition Api

$ rekognitionDetect.py jsimon-public shinjuku.jpg nocopy

Label City, confidence: 88.4259796143
 Label Downtown, confidence: 88.4259796143
 Label Metropolis, confidence: 84.8462677002
 Label Urban, confidence: 84.8462677002
 Label Night, confidence: 69.7816467285
 Label Outdoors, confidence: 69.7816467285
 Label Shop, confidence: 68.228477478
 Label Flyer, confidence: 60.3522796631
 Label Poster, confidence: 60.3522796631
 Label Neighborhood, confidence: 55.3994293213
 *** Face 0 detected, confidence: 97.9367828369 Gender: Female SAD 46.1420478821 ANGRY 7.63346576691 HAPPY 6.28939962387

Note that I lowered the confidence threshold from 75% to 50% to get more labels. Still, Rekognition does a good job. It also gets the girl's face and yes, she does look quite sad. The Anime face isn't detected but I guess this is the desired behavior. Alright, enough detection. Let's now try to match faces, using some of the previous pictures as well as some new ones.
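To see what the threshold does, here's a quick sketch that applies both values to the Shinjuku labels listed above (confidences rounded to two decimals). The filtering is done locally for illustration; labels_above is a hypothetical helper, not part of the API.

```python
# Labels copied from the output above, confidences rounded.
SHINJUKU_LABELS = [
    {"Name": "City", "Confidence": 88.43},
    {"Name": "Downtown", "Confidence": 88.43},
    {"Name": "Metropolis", "Confidence": 84.85},
    {"Name": "Urban", "Confidence": 84.85},
    {"Name": "Night", "Confidence": 69.78},
    {"Name": "Outdoors", "Confidence": 69.78},
    {"Name": "Shop", "Confidence": 68.23},
    {"Name": "Flyer", "Confidence": 60.35},
    {"Name": "Poster", "Confidence": 60.35},
    {"Name": "Neighborhood", "Confidence": 55.40},
]


def labels_above(labels, min_confidence):
    """Keep the names of labels at or above the confidence threshold."""
    return [l["Name"] for l in labels if l["Confidence"] >= min_confidence]


print(labels_above(SHINJUKU_LABELS, 75))       # ['City', 'Downtown', 'Metropolis', 'Urban']
print(len(labels_above(SHINJUKU_LABELS, 50)))  # 10
```

At 75%, "Night" and "Shop" would have been dropped, which is a pity for a nighttime picture of a shopping district.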

Step 2 screenshot from a Hands on Look at the Amazon Rekognition Api

$ rekognitionCompare.py jsimon-public julien1.jpg julien2.jpg nocopy
 Face match, confidence=99.9891281128, similarity=98.0
 
$ rekognitionCompare.py jsimon-public julien1.jpg booth1.jpg nocopy
 Face match, confidence=99.999671936, similarity=96.0
 
$ rekognitionCompare.py jsimon-public julien1.jpg booth2.jpg nocopy
 Face match, confidence=99.9991455078, similarity=84.0
 
$ rekognitionCompare.py jsimon-public julien1.jpg keynote.jpg nocopy
 Face match, confidence=99.9932250977, similarity=82.0

Quite good! The last one is particularly nice, given the distance, the angle and the poor lighting (see actual picture above). These are just a few examples and I'm sure you can't wait to try your own. Hopefully this post has given you a visual, hands-on overview of the Rekognition service and how user-friendly it is. I didn't cover face collections, but the API is pretty much what you'd expect (create, delete, etc.).
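For the curious, here's a rough sketch of what working with a face collection might look like in boto3. I haven't run this as part of the post: the function names, the 80% threshold and the use of the file name as ExternalImageId are all assumptions made for illustration.

```python
def build_collection(client, collection_id, bucket, names):
    """Create a collection and index the faces found in each S3 image."""
    client.create_collection(CollectionId=collection_id)
    for name in names:
        client.index_faces(
            CollectionId=collection_id,
            Image={"S3Object": {"Bucket": bucket, "Name": name}},
            ExternalImageId=name,  # assumption: tag each face with its file name
        )


def search_collection(client, collection_id, bucket, name, threshold=80.0):
    """Return (external_id, similarity) pairs for faces matching the image."""
    response = client.search_faces_by_image(
        CollectionId=collection_id,
        Image={"S3Object": {"Bucket": bucket, "Name": name}},
        FaceMatchThreshold=threshold,
        MaxFaces=5,
    )
    return [
        (match["Face"].get("ExternalImageId"), match["Similarity"])
        for match in response.get("FaceMatches", [])
    ]


# Usage (needs boto3 and AWS credentials; "my-faces" is a made-up collection id):
#   import boto3
#   client = boto3.client("rekognition")
#   build_collection(client, "my-faces", "jsimon-public", ["julien1.jpg"])
#   print(search_collection(client, "my-faces", "jsimon-public", "booth1.jpg"))
```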

Feel free to explore and experiment. Until we meet again, keep rockin’.


Originally published at blog.julien.org on November 30, 2016.