Hi everybody, this is Julien from Arcee. In this video, I would like to show you how to use Amazon Rekognition, our image and video analysis service, to build a virtual proctoring application. Proctoring is the act of supervising an online course or an online exam. Given the state of the world today, many of us are working from home, so whether we're students taking online exams or professionals taking online certifications, there's a strong need to verify that the person sitting in front of the webcam is the person who's supposed to be there, and that this person is not using a phone, books, or any other resources to cheat during the class or the exam. That's the problem we're solving today.
Using AWS CloudFormation, we can automatically deploy a virtual proctoring application. This is a web-based application that uses Rekognition for face detection, face comparison, object detection, and so on. It calls a number of Rekognition APIs from Lambda functions, which sit behind APIs in API Gateway.
This is a really fun demo. Let's get started right away. This is our starting point: an AWS samples repository that has everything we need for the demo. We can see the architecture here. It's pretty much what you would expect. We have a browser-based application hosted in S3. This application calls a number of APIs. There is one for authentication because, as an admin user, you can register new users to the proctoring application. We will see how this works. There's an API that captures frames from your webcam and sends them to Rekognition for analysis, and there's the API that adds a new user. All of these are handled by Lambda, and the actual Rekognition work is done by this Lambda function. It will try to detect persons and objects of interest, perform face detection and unsafe content detection, and search for a face in the collection.
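To make the analysis step more concrete, here is a minimal sketch of what such a frame-analysis function could look like in Python. This is not the code from the repository: the function name, the result keys, and the confidence threshold are my own assumptions. The Rekognition calls themselves (`detect_labels`, `detect_faces`, `detect_moderation_labels`, `search_faces_by_image`) are the standard boto3 client methods, and `client` is expected to be something like `boto3.client("rekognition")`.

```python
from typing import Any, Dict, List


def analyze_frame(client: Any, image_bytes: bytes, collection_id: str,
                  labels_of_interest: List[str],
                  min_confidence: float = 80.0) -> Dict[str, Any]:
    """Run one webcam frame through the checks described above.

    Hypothetical helper: the name, the result keys and the default
    threshold are illustrative assumptions, not the sample's actual code.
    """
    image = {"Bytes": image_bytes}
    wanted = {name.lower() for name in labels_of_interest}

    # Object and person detection both come from the label detection call.
    labels = client.detect_labels(Image=image, MinConfidence=min_confidence)["Labels"]
    objects = [label["Name"] for label in labels if label["Name"].lower() in wanted]
    # The "Person" label lists one instance (bounding box) per detected person.
    persons = sum(len(label.get("Instances", []))
                  for label in labels if label["Name"] == "Person")

    faces = client.detect_faces(Image=image)["FaceDetails"]
    unsafe = client.detect_moderation_labels(
        Image=image, MinConfidence=min_confidence)["ModerationLabels"]

    matched_user = None
    if faces:  # searching by image fails when the frame contains no face
        matches = client.search_faces_by_image(
            CollectionId=collection_id, Image=image,
            FaceMatchThreshold=min_confidence, MaxFaces=1)["FaceMatches"]
        if matches:
            matched_user = matches[0]["Face"].get("ExternalImageId")

    return {
        "objects_of_interest": objects,
        "person_count": persons,
        "face_count": len(faces),
        "unsafe_labels": [label["Name"] for label in unsafe],
        "matched_user": matched_user,
    }
```

Passing the client in as an argument keeps the sketch easy to try out with a stub before pointing it at a real account.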
Before we do the full setup, let's fire up the CloudFormation template. I'm using the Ireland region here. We have a template, so I can just click on next. I need to enter the email address for the admin. That's the person who will register new students. Then I can keep everything else as is. We have a list of Rekognition labels for objects of interest. Let's start with "mobile phone" and "cell phone". I'll show you later how you can update the stack to add new labels, such as books. Just click on next, then next again, tick those boxes, and create the stack. It's going to take just a few minutes. I'll pause the video and we'll meet again when the stack is ready.
After a few minutes, the stack is complete, and we can see all the related events here. We created the Lambda functions, the API Gateway APIs, and a Cognito user pool for authentication. I also got an automatic email at the admin address I specified. We can grab this temporary password and connect. I need to change the password. Now we see the virtual proctoring application. I see myself twice on the screen, but the first thing to do is add a new user. I'll add myself and use a screenshot I took previously. This will get added to the Rekognition collection of faces. When we start the proctoring application, whoever is in front of the webcam will be matched against this collection.
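As a rough illustration of what adding a user to the face collection involves, here is a small Python sketch. The function name and the way the user ID is stored are assumptions on my part, not the sample's code. `index_faces` is the standard boto3 Rekognition call for adding a face to a collection, and `client` would be something like `boto3.client("rekognition")`.

```python
from typing import Any


def register_face(client: Any, collection_id: str, user_id: str,
                  image_bytes: bytes) -> str:
    """Index one reference picture and return the face ID Rekognition assigns.

    Hypothetical helper. The user ID is stored as ExternalImageId so that a
    later face search can map a match back to a user (an assumption about
    how one might wire this up).
    """
    response = client.index_faces(
        CollectionId=collection_id,
        Image={"Bytes": image_bytes},
        ExternalImageId=user_id,  # letters, digits and _ . - : only
        MaxFaces=1,               # keep only the most prominent face
    )
    records = response["FaceRecords"]
    if not records:
        raise ValueError("no usable face found in the reference picture")
    return records[0]["Face"]["FaceId"]
```

A clear, front-facing screenshot with a single person works best here, since only one face is indexed per picture in this sketch.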
We can start the app now. Within a few seconds, we should see some information on the right. Objects of interest: zero. I'm not using a cell phone. Person detection: two. I think it's my painting in the back. Let me remove the painting. Now it's only one. Good job, Rekognition. You can even recognize people in paintings. Person recognition is me. Face detection: one. No unsafe content. If I was taking a test, this would be fine. It's only me in front of the screen, I'm not using my cell phone, and there's no one else in the room.
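The "this would be fine" judgment can be written down as a simple rule. The sketch below is only my reading of the readings on screen: the function name, its parameters, and the exact conditions are hypothetical, and a real supervisor would probably want more tolerance (the painting, for instance, would be flagged here).

```python
from typing import List, Optional


def review_frame(objects_of_interest: List[str], person_count: int,
                 face_count: int, unsafe_labels: List[str],
                 matched_user: Optional[str], expected_user: str) -> List[str]:
    """Return the list of problems for one frame; an empty list means OK.

    Hypothetical rule set mirroring the five readings shown in the app.
    """
    problems = []
    if objects_of_interest:
        problems.append("object of interest: " + ", ".join(objects_of_interest))
    if person_count != 1:
        problems.append(f"expected 1 person, saw {person_count}")
    if face_count != 1:
        problems.append(f"expected 1 face, saw {face_count}")
    if matched_user != expected_user:
        problems.append("face does not match the registered user")
    if unsafe_labels:
        problems.append("unsafe content: " + ", ".join(unsafe_labels))
    return problems
```

For example, one person, one face, no phone, and a match on the registered user gives an empty list, while the same frame with a detected cell phone gives a single "object of interest" entry.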
Let's say I want to cheat by calling someone who knows all the answers. It caught me. Cell phone, mobile phone. Whoever's supervising the test now knows I'm cheating. Now let's say I want to cheat with a book. It's not picking it up because it wasn't told to detect books. That's something we need to fix. Now, let's run one more test. What if someone smarter than me tries to take the test? I'll call my son, who's arguably much smarter, to sit in front of the cam. It's one person, but it's not me. Person recognition doesn't find a match here. If I show up now, it detects two persons, which isn't good because I'm supposed to be on my own. Whoever's supervising would ping me on the chat or yell at me that there's someone else in the room.
Now, let's add extra objects to be detected, such as books. We go back to the stack and click on Update. Use the current template and add labels. These need to be Rekognition labels. If you want to figure out the right labels, take pictures of all the objects you don't want to see during the test and run them through Rekognition to see what they're called. If we go with "book" and "textbook," it should be okay. This will ask the Lambda function to look for additional labels. Just click on next, next, and update the stack. It's going to update the stack, probably redeploy the Lambda function, and I'll see you in a minute.
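If you want to script the "take pictures and see what they're called" step, a small Python sketch could look like this. The function name and default thresholds are my own choices. `detect_labels` is the standard boto3 Rekognition call, and `client` would be something like `boto3.client("rekognition")`.

```python
from typing import Any, List, Tuple


def list_label_names(client: Any, image_bytes: bytes,
                     min_confidence: float = 70.0,
                     max_labels: int = 20) -> List[Tuple[str, float]]:
    """Return (label name, confidence) pairs for one picture, best first.

    Hypothetical helper for discovering which label names Rekognition
    reports for an object you want to ban during the test.
    """
    response = client.detect_labels(
        Image={"Bytes": image_bytes},
        MaxLabels=max_labels,
        MinConfidence=min_confidence,
    )
    ranked = sorted(response["Labels"],
                    key=lambda label: label["Confidence"], reverse=True)
    return [(label["Name"], round(label["Confidence"], 1)) for label in ranked]
```

You would read each picture as bytes, for example with `open(path, "rb").read()`, pass it in, and copy the names you care about into the stack's label list.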
After a minute or two, the stack has been updated, and we can see the Lambda function has been updated. Going back to the application, we can start it again and try to get a book. I'm cheating. Remove the book. Let's look at another book. I'm definitely cheating. It's really easy to update this by redeploying the function that detects the objects. If you want to add more labels, that's all you need to know.
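The same stack update can in principle be scripted instead of clicked through. Here is a hedged Python sketch: the function name, the parameter key you pass in, the comma-separated label format, and the default capabilities are all assumptions, so check the template before relying on them. `describe_stacks` and `update_stack` are the standard boto3 CloudFormation calls, and `cfn` would be something like `boto3.client("cloudformation")`.

```python
from typing import Any, Dict, List, Sequence


def update_label_parameter(cfn: Any, stack_name: str, label_key: str,
                           labels: Sequence[str],
                           capabilities: Sequence[str] = ("CAPABILITY_IAM",)
                           ) -> List[Dict[str, Any]]:
    """Update one stack parameter and keep every other parameter unchanged.

    Hypothetical helper: label_key is whatever the template calls its label
    list, and joining labels with commas is an assumption about its format.
    """
    stack = cfn.describe_stacks(StackName=stack_name)["Stacks"][0]
    keys = [p["ParameterKey"] for p in stack.get("Parameters", [])]
    if label_key not in keys:
        raise KeyError(f"stack has no parameter named {label_key!r}")

    parameters = []
    for key in keys:
        if key == label_key:
            parameters.append({"ParameterKey": key,
                               "ParameterValue": ",".join(labels)})
        else:
            # Reuse the value currently stored in the stack.
            parameters.append({"ParameterKey": key, "UsePreviousValue": True})

    cfn.update_stack(
        StackName=stack_name,
        UsePreviousTemplate=True,  # same as "use the current template"
        Parameters=parameters,
        Capabilities=list(capabilities),
    )
    return parameters
```

Returning the parameter list makes it easy to print what was sent before waiting for the update to finish.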