Hi. My name is John Rotenstein, and I'm with AWS Training and Certification. I'm here today to tell you about Amazon Rekognition. I'll be doing it through a number of hands-on demonstrations, and at the end, there's even a self-paced lab where you can try it for yourself. My demonstration is divided into three sections. First of all, I'll be telling you how to use Rekognition with still images: take a photo, send it to Rekognition, and recognize faces and even objects within the picture. The second part will show you how you can use Rekognition with video files. So you can upload a video to S3, send it to Rekognition, and it will come back and say, "Hey, I found these various people within the video." In the third part, we'll be working with streaming video, where you can send streaming video to Rekognition, it can detect objects in that streaming video, and you can react to it in real time. So let's begin.

So in this first section, I'll be showing you how you can use still images with Rekognition. To demonstrate the features, I'll actually be sending a continuous stream of images to Rekognition to show you what it can do. For my first demonstration, I'll show you how Rekognition can detect faces within an image. So here I've got my computer capturing the video, sending it to Rekognition, and coming back and saying, "Hey, I can see a face here." It's drawing a border around my face, and if I smile, it turns green. If I frown, it turns red. Now, this also works for multiple people. Folks, do you want to come on here? So it can detect multiple faces within the same frame. So everyone smile. Look unhappy. Fantastic. Thank you.

So how did that demonstration work? Here is a sample of the code that I used for that. It's very simple. What it does is it captures an image from the built-in webcam on my computer and then shrinks it. The reason I shrink it is that Rekognition only needs about 100 pixels to be able to detect a face, and there's no need to send a big picture across all that bandwidth. So I shrink it down in size, send it to Rekognition, and call the DetectFaces command. DetectFaces can either point to an image in S3, or you can send it the actual image bytes from your code. Then Rekognition comes back with a whole load of information about the faces it saw in the image. Here is an example of the information that comes back. In my case, I used the bounding box to draw a rectangle around each of the faces, but you also get information back about whether the person was wearing glasses, whether they were smiling, whether they have a mustache, and what their gender is. So you can collect a lot of information from just a still image.

So for my second demonstration, I'd like to show you how you can recognize a face within a picture. You do this by building something called a face collection. Let me demonstrate. So here, I'm sending my picture to Rekognition, and it's saying who it thinks it detects in the picture. So in this case, it's saying John. I might now ask somebody else to come into the frame and see if we can detect them. It's saying, "Oh, I think that person is Sean." Fantastic. Art, do you want to hop in? It's saying Art. Now, if you have more than one person in the picture, it actually detects the largest face. So if I come closer to the camera, it's John. But if Art comes closer, it says, "No, the biggest person is Art." That's searching faces from the face collection.
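That sample code isn't reproduced in this transcript, but a minimal sketch of that first face-detection loop, written here as an illustration rather than the exact demo code, could look like this in Python. It assumes the opencv-python and boto3 packages and default AWS credentials and region:

    import cv2
    import boto3

    rekognition = boto3.client('rekognition')
    camera = cv2.VideoCapture(0)               # built-in webcam

    ret, frame = camera.read()                 # capture one frame
    small = cv2.resize(frame, (320, 240))      # shrink it; a face only needs to be ~100 pixels
    ok, jpeg = cv2.imencode('.jpg', small)     # JPEG-encode the frame in memory

    response = rekognition.detect_faces(
        Image={'Bytes': jpeg.tobytes()},       # or {'S3Object': {...}} for an image in S3
        Attributes=['ALL'])                    # return smile, glasses, mustache, gender, etc.

    for face in response['FaceDetails']:
        print(face['BoundingBox'], face['Smile'], face['Gender'])

    camera.release()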
So here's the code behind that face-search demonstration. What it does is it captures that picture once again, it then sends it across to Rekognition, and calls the SearchFacesByImage command. It looks through a face collection; I'll show you in a moment how to put a face into a collection. Rekognition then comes back with which face it saw and the name of that face, and I simply display on screen who it said it was. If you have multiple people within the same frame, it will only choose the person with the largest face. The information comes back like this, and it says, "Hey, I found John, and I'm 99 percent confident that it was John."

For my third demo, Rekognition has the ability to detect text within the image. So in this case, I'm going to show it an image and it will read the text that it can see. So now I'm sending the frame to Rekognition, and it will detect text in the picture. So here's my backpack. Let's see what it can see on there. So it's detecting the word Amazon on the backpack. Fantastic. Or here is a whiteboard marker I've got. It should be able to read the text on there as well. It says whiteboard marker, and as I move the pen around the screen, it will track and move the text with it. So in that demonstration, you saw how I showed some text on the screen that was coming back from Rekognition. It came back a bit like this: a JSON object that shows me the text that was there and where it was found on the screen. Rekognition not only returns each individual word, it can also return a line of text. So if you're not interested in where the text is placed, but just in everything that it's saying, then it will return that information to you as well.

In all, Rekognition has quite a few API calls you can make to find things in pictures: detecting faces, detecting text, and searching faces by image are the ones I showed you. It can also find objects in an image: find me a skateboard, find me a bottle. It can also recognize celebrities. So if you want to look through a video or look through a picture and find somebody who's famous, it can do that for you. It can also detect adult content. So if you have a picture and you want to make sure that it doesn't have anything that shouldn't be shown to the public, it can do that for you automatically.
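Before moving on, here is a similar minimal sketch of the two still-image calls behind those last demos, SearchFacesByImage and DetectText. The frame file name and the collection name 'my-face-collection' are placeholders, and it assumes a face collection has already been created and populated:

    import boto3

    rekognition = boto3.client('rekognition')

    # Load a captured frame from disk (in the demo it came straight from the webcam).
    with open('frame.jpg', 'rb') as f:
        jpeg_bytes = f.read()

    # Who is the largest face in the frame? Compare it against the face collection.
    result = rekognition.search_faces_by_image(
        CollectionId='my-face-collection',
        Image={'Bytes': jpeg_bytes},
        FaceMatchThreshold=90)
    for match in result['FaceMatches']:
        print(match['Face']['ExternalImageId'], match['Similarity'])   # e.g. John 99.x

    # What text is visible in the frame? Returns both LINE and WORD detections,
    # each with a bounding box showing where it was found.
    text = rekognition.detect_text(Image={'Bytes': jpeg_bytes})
    for detection in text['TextDetections']:
        print(detection['Type'], detection['DetectedText'], detection['Geometry']['BoundingBox'])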
So now, in Part 2, I'd like to show you how Rekognition can go through a video file and detect faces that it finds in that video. The use case I'm showing here is: I've got a very large video file, and the file has lots of different people in it. I would like to take that large video and automatically create a new video which shows just one particular person. So to do this, Rekognition will have to go through the video, find each person, output the information, and let me splice it together again. I'll show you what it looks like. Here, I have my source video. You can see that many different people's faces show up in the video. The output will be something like this: here is a video showing Eddie, all the scenes with Eddie appearing in that video.

So how was it done? Well, the first thing I did is I captured a picture of each of the people that were in the video, and I used this command called create-collection, which created a face collection. In this case, I called it trainers. Then I had a picture of each person, and I added it into the collection using this index-faces command, which points to a picture in S3, and I give it a name, in this case John, saying this is a picture of John, and Rekognition will associate that external image ID with that picture. It's also worth mentioning that Rekognition does not store that image. Rather, it looks at the image, finds the face in it, and looks at the attributes of the face: where are the eyes, where's the nose, where's the mouth. It builds a mathematical vector of the face, and that is the only thing that is stored in Rekognition.

Once Rekognition has those pictures, I can then tell it to please analyze the video. So I use this command, which is the start-face-search command. It says, "Please go off to S3, grab this video for me, and compare it to the face collection called trainers." It then goes off and looks for all of the faces within the video. Let me demonstrate. So the first step is to create a face collection, and I'll do it with this create-collection command. This is saying, please create me a collection called trainers. The next step is to take a face which is stored in S3 and say, please add it to the collection. So here I have a command that is saying, please take an image called John and load it into that trainers collection, and it comes back with a whole lot of information. In this case, it's saying, I found a face within that image, and here is a bounding box around it, here is the nose, here's the mouth, and also here is the pitch and yaw of the face, so it can tell if a face is looking left, looking right, looking up, and so on. This is very good for mapping attributes onto the face in case you want to make it look like somebody is wearing a mask or a funny face. So I have now loaded a number of faces. You can see here there is an image tagged as John, an image tagged as Edward, an image tagged as Karthik, and one tagged as MJ. So there are now four faces in my face collection.

So the next thing I have to do is ask Rekognition to please go through the video and find the faces, and I do that with this start-face-search command. This command tells Rekognition to start processing the video, but it can take a long time, so it comes back with a job ID, and I can go back to it later and say, "Please give me the output for this particular job." So Rekognition has now finished processing that video, and I can run this get-face-search command to retrieve the results of that search, and lots of information appears here on screen. What it's doing is returning, for various different timestamps, who it has seen. So here it's saying, at this particular timestamp, I can see that Edward appeared, and Rekognition is 100 percent confident that he is there. So it's now going to tell me, for each timestamp, who appeared in the frame. What it will give me once it's finished is output like this, showing me, for each timestamp within the video, who it saw. Now, there are lots and lots of timestamps in the video. It's several minutes long, and you can see from this output that it's giving me information every second or so. So it will give me an awful lot of information about who it saw in the video.

My challenge is to take all of this information about where people appear in the video and output another video showing just one particular person. To do that, I'll be using an Amazon product called Amazon Elastic Transcoder, which is a service that can transcode video files, and one of the capabilities it has is clip stitching. What this means is I can say, go to this video at this particular timestamp and grab x seconds of video, and keep doing that, and it will grab different bits of video and splice them together for me into one single video. My challenge in this case is that Rekognition outputs the timestamps when it sees a particular person, so it says John is here, here, and here, whereas Elastic Transcoder just wants to know when to begin and how long to record. So I've got to convert this format of timestamps into individual scenes. So I wrote a little Python program that says, hey, John appears at these particular timestamps, let's convert them into scenes. If he disappears for more than a second, let's end that scene and start it again when he reappears, or if he appears for less than a second, let's skip over that scene, because we don't want the video to be too jerky, cutting in and cutting out.

So the result of that was: we had an input video that had lots of people; we created a face collection which taught Rekognition the faces of the various people in that video; we then said, produce me a new video with just one specific person; and Elastic Transcoder gave me a video file, one for each of the people I requested. So it's a pretty nice demonstration of how you can recognize faces with Rekognition and then use Elastic Transcoder to create a new video from that source video. If you're interested in how this worked, there is a post available on our machine learning blog. It's called "Automated video editing with you as the star", and that will give you an overview of how this demonstration was produced. If you want to try it yourself, we also have a hands-on lab by the same name, where you can go off and try the exact same steps that I have demonstrated.
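Those CLI commands have equivalents in the AWS SDKs. As a rough sketch, the whole face-collection and video face-search workflow in Python with boto3 might look something like this; the bucket and object key names are placeholders for illustration:

    import time
    import boto3

    rekognition = boto3.client('rekognition')

    # 1. Create the face collection.
    rekognition.create_collection(CollectionId='trainers')

    # 2. Index a known face from S3 and tag it with an external image ID.
    rekognition.index_faces(
        CollectionId='trainers',
        Image={'S3Object': {'Bucket': 'my-demo-bucket', 'Name': 'faces/john.jpg'}},
        ExternalImageId='John')

    # 3. Start the asynchronous face search over a video stored in S3.
    job = rekognition.start_face_search(
        Video={'S3Object': {'Bucket': 'my-demo-bucket', 'Name': 'videos/source.mp4'}},
        CollectionId='trainers')

    # 4. Poll until the job finishes, then read the per-timestamp matches.
    while True:
        result = rekognition.get_face_search(JobId=job['JobId'])
        if result['JobStatus'] != 'IN_PROGRESS':
            break
        time.sleep(15)

    for person in result['Persons']:
        for match in person.get('FaceMatches', []):
            print(person['Timestamp'], match['Face']['ExternalImageId'], match['Similarity'])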
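And here is a sketch of those last two steps: turning Rekognition's per-timestamp matches into scenes and handing the scenes to Elastic Transcoder as clip-stitching inputs. The one-second gap and minimum-length rules follow the description above, while the pipeline ID, preset ID, and object keys are placeholders rather than the exact values from the demo:

    import boto3

    def timestamps_to_scenes(timestamps_ms, max_gap=1.0, min_length=1.0):
        # Group millisecond timestamps into (start, duration) scenes, in seconds.
        scenes = []
        start = prev = None
        for ts in sorted(t / 1000.0 for t in timestamps_ms):
            if start is None:
                start = prev = ts
            elif ts - prev > max_gap:               # person disappeared for over a second
                if prev - start >= min_length:      # skip scenes shorter than a second
                    scenes.append((start, prev - start))
                start = prev = ts
            else:
                prev = ts
        if start is not None and prev - start >= min_length:
            scenes.append((start, prev - start))
        return scenes

    scenes = timestamps_to_scenes([12000, 12500, 13000, 31000, 31500, 32000, 32500])

    # One Input per scene; Elastic Transcoder splices them into a single output video.
    transcoder = boto3.client('elastictranscoder')
    transcoder.create_job(
        PipelineId='1111111111111-abcd11',            # placeholder pipeline ID
        Inputs=[{'Key': 'videos/source.mp4',
                 'TimeSpan': {'StartTime': f'{start:.3f}', 'Duration': f'{length:.3f}'}}
                for start, length in scenes],
        Output={'Key': 'output/john.mp4',
                'PresetId': '1351620000001-000010'})  # a system preset (Generic 720p)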
Here is an overview of the commands you can use with Rekognition Video. I showed you the start-face-search command, where it looked for faces within the video. You can also recognize celebrities, look for objects, and do person tracking. That way it can say, hey, a person walked into the scene, they walked across the screen, they disappeared, and then they came back into the video again, and you can track them throughout that whole video.

So that is using Rekognition on video files stored in S3, and the third part that I would like to show you is using Rekognition with streaming video. Now, you don't stream the video to Rekognition; you actually stream the video to our Amazon Kinesis Video Streams product. Kinesis captures every video frame and then sends that frame to Rekognition, and Rekognition performs the same analysis that I've shown you so far. So it might pass back and say, hey, I've just seen John in that streaming video.

To show you how this works, I'd like to give you a bit of a demo. In this case, we set up a video camera showing people's faces. We passed it through Amazon Rekognition Video using Amazon Kinesis Video Streams, and used it to trigger a light. Now, why would you want to turn on a light? Imagine you might be opening the door to your home when you show your face in front of the video camera. In this case, I have my fellow trainer Navjot showing his face to the camera, and you'll see that the light turns on. I then sat down and showed my face, and unfortunately the light did not turn on. I can't gain access to the house. And just to prove it works, Navjot sat down once again, and the light turned on. So this gives you a bit of an idea of how you can use streaming video to trigger things, and there wasn't very much code at all to make this work.

The first step is we set up a live video stream into Amazon Kinesis Video Streams. It then sent each frame to Amazon Rekognition Video, which analyzed the frame and looked for certain people's faces. It compared those faces against the previously created face collection, in this case with Navjot's face. It then outputs that information into an Amazon Kinesis stream, and that stream has all that information: I saw a face at this location, and it was this particular person's face. Then we used a feature called Amazon Kinesis Analytics, which looks at the streaming information coming through the Amazon Kinesis stream, and we said, hey, look for the name Navjot being successfully recognized in the video image. When his face was detected, we triggered an AWS Lambda function that went out and said, hey, IoT, turn on the light bulb, or open my front door. Now, you might want to be a little bit careful, because you could theoretically open the front door by holding up a picture of somebody's face. So you might want to do it so they have to rotate their face, to make sure they're a live human being before you let them into your house.
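There isn't much code in that last step either. As a rough sketch, and assuming the Lambda function is configured as the output destination of the Kinesis Analytics application, it could look something like this; the record payload shape, the ExternalImageId field, and the MQTT topic are assumptions for illustration, not the exact demo code:

    import base64
    import json
    import boto3

    iot = boto3.client('iot-data')

    def lambda_handler(event, context):
        # Kinesis Analytics delivers matched records as base64-encoded JSON.
        for record in event['records']:
            payload = json.loads(base64.b64decode(record['data']))
            if payload.get('ExternalImageId') == 'Navjot':
                # Tell the light bulb (an AWS IoT thing) to turn on.
                iot.publish(topic='home/front-door/light',
                            qos=0,
                            payload=json.dumps({'state': 'on'}))
        # Kinesis Analytics expects a result for every record it delivered.
        return {'records': [{'recordId': r['recordId'], 'result': 'Ok'}
                            for r in event['records']]}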
So hopefully you've learned something about Amazon Rekognition today. First of all, we saw how you can use it with still images, then how you can use it with video files stored in Amazon S3, and finally how you can use it with streaming video. So hopefully you'll recognize me, John, as I walk down the street in the future. Thank you.