Hello. Now we're going to see in more detail how we can build this algorithm to do surveillance. If you remember, the last thing we did was construct a flowchart of what we need our algorithm to do, from inputting our video into the computer to getting a warning.

The first thing we really need to understand is what an image is and how the computer turns an image into numbers. An image of m by n pixels, if you have heard the term before, is represented mathematically by an m by n matrix. So if you have the picture on the screen, you will have n columns and m rows of numbers, and each one of those numbers represents a color. If you remember really old pictures on a computer, you would say they look so pixelated because you could see the little squares, and in each square there's only one color. That's what it means.

To turn the image into a vector, and you may have heard these terms before, you take each column of that matrix and stack one on top of the other. The first column goes first, then the next column, and the next. You turn it into what in mathematics is called a tall vector, and that vector will have m times n entries, because you have n columns of m rows each. That is what is called vectorizing an image.

Now a video becomes a matrix, because each of the frames is a very, very long vector, and if you put those vectors side by side, one column per frame, you get an (m times n) by K matrix, where K is the number of frames you have.

To put this in an example, what is the size of the matrix that represents what you see on your monitor? For example, for the monitor I have here in a screenshot, the resolution is 2560 by 1440 pixels, so m is 1440 and n is 2560. Each frame is then represented by a vector of size 1440 times 2560.
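As a minimal sketch of the vectorizing and stacking just described, here is a small example. It assumes plain Python lists instead of a numerical library, and the function names `vectorize_image` and `video_to_matrix` are made up for illustration.

```python
def vectorize_image(image):
    """Stack the columns of an m-by-n image (a list of rows) into one list of length m*n."""
    m, n = len(image), len(image[0])
    # Walk column by column, top to bottom, so column 0 comes first.
    return [image[i][j] for j in range(n) for i in range(m)]


def video_to_matrix(frames):
    """Return an (m*n)-by-K matrix whose k-th column is the k-th vectorized frame."""
    columns = [vectorize_image(frame) for frame in frames]
    # zip(*columns) turns the K column lists into m*n rows of length K.
    return [list(row) for row in zip(*columns)]


# A tiny 2-by-3 "image": m = 2 rows, n = 3 columns.
image = [[1, 2, 3],
         [4, 5, 6]]
print(vectorize_image(image))  # [1, 4, 2, 5, 3, 6]

# A toy "video" of K = 4 frames, each 2 by 3 (hypothetical pixel values).
frames = [[[value + k for value in row] for row in image] for k in range(4)]
video = video_to_matrix(frames)
print(len(video), len(video[0]))  # 6 4
```

For the 2560 by 1440 monitor, the same vectorizing would give a frame vector with 1440 × 2560 = 3,686,400 entries.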
So what is the size of the matrix that would represent a 5 minute video at 24 frames per second with this monitor resolution? If you compute the number of elements, we have close to 3.7 million elements in each frame. You have 24 frames per second, 60 seconds in each minute, and 5 minutes. If you multiply that, for 5 minutes you have 7,200 frames, which means the matrix representation of the video will be 3,686,400 rows by 7,200 columns. Those are just large numbers: for just a 5 minute video, you will have about 26.5 billion elements. And that's only for a 5 minute video at 24 frames per second on one monitor. So imagine you have 20 or 100 different monitors in an airport. You cannot rely on a single person, or even ten people, doing this job. You need a computer to help you go through all this information.

As we saw before, the first step we need to take is to separate the background. Put simply, what the GRASTA algorithm does is keep all the numbers in that matrix that are not moving, and then identify the places of the things that are moving, that is, where the colors are changing.

The second step we needed to take was to identify or characterize the moving elements. So how can you identify objects, animals, or people? A dog has a tail and four legs. People are actually easier to identify because we have eyes, a nose, a mouth, and two legs. And packages? Packages can be just anything, right? So even if you think that people will be harder to identify, for a computer it's actually harder to identify what is a package. But again, we can use artificial intelligence to do this: teach a computer how to differentiate between a package, an animal, and a person. In this case specifically, we are going to use what is called machine learning. There are a few examples you can try at home; there are many places on the Internet where you can see what machine learning can do.
One of these is to recognize handwritten numbers. If you go to this website, you can draw a number, say how you write an 8, and see if the computer can recognize that you were trying to write an 8 or any other number. Another place where you can play with what machine learning does is a tool called Face Recognition using Google Cloud Vision. Again, we won't explain exactly what it's doing, but it can give you an idea of what a computer can do these days. And finally, maybe you have heard about neural networks. Neural networks are a type of machine learning that enables a computer to learn from observational data. If you go to the TensorFlow Playground, you can play with different datasets and see how this algorithm works to identify different elements.

Finally, the third thing we needed our algorithm to do was to track objects and people. Again, for the computer the objects and the people are just numbers in the matrix. GRASTA can identify what is moving, and the machine learning algorithms will identify and track items as they enter or exit the frames of your video.

So, how can we use this for surveillance? To start the flowchart, at the very beginning our video just starts. The information you have is frames, so you have to do the following for every frame. This would be the time loop in our algorithm, right? The very first thing is to use GRASTA to separate the background, to decide what in that frame is moving and what is not. The second step could be to use machine learning to identify and categorize the moving items. In this step you need to have, or make, a list of items: whether they move or are stopped, how long they have moved or been stopped, and whether an object is accompanied by a person or not. Once you have the list, you have to identify which of these items are idle. And there will be a few options here, right?
If an item on the list was removed by GRASTA, meaning it became part of the background, it means it stopped moving. If the item left the frame, you can just remove it from the list, because somebody moved it out of the frame. If the item stopped moving, then you want to know if it's still with the human or humans that brought it in. Is it without its humans? If it's without its human, was it just for a short time, or was it too long? If it was too long, you have to give a warning. But as you see, at each of these steps you have to think: what is a short time, and what is too long? Who are the humans? Maybe a family came together, so the item came in with two people, until one of them left with the package. You would need to be able to compute that as well.

After you're done with a single item, you have to check if there are more items in the frame to analyze and what happened to them. If there are more items in that frame, you go back and go through this part of the algorithm again. If there are no more items in the frame, then you can go back and repeat for the next frame.

So with this flowchart we now have a better idea of how to communicate, or how to input, this information and this algorithm into our computer. Of course, if your video finishes, then your algorithm is over, so you go to the end.
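The flowchart can be sketched roughly in code. This is only an illustration under strong assumptions: the background separation (GRASTA) and the machine learning step are not implemented. Instead, each frame is given as a hypothetical dictionary that maps an item name to `(is_moving, has_owner_nearby)`, as if those two steps had already run. The names `Item`, `process_frame`, `run_surveillance`, and the threshold `max_idle_frames` are all made up for this example, and "too long" is simply counted in frames.

```python
from dataclasses import dataclass


@dataclass
class Item:
    name: str
    idle_frames: int = 0  # consecutive frames the item was still and unattended


def process_frame(tracked, detections, max_idle_frames):
    """Update the item list for one frame; return names of items needing a warning.

    `detections` stands in for the output of background separation and
    classification (an assumption): name -> (is_moving, has_owner_nearby).
    """
    warnings = []
    # Items that left the frame are removed from the list.
    for name in list(tracked):
        if name not in detections:
            del tracked[name]
    # Go through every item in the frame, one at a time.
    for name, (is_moving, has_owner_nearby) in detections.items():
        item = tracked.setdefault(name, Item(name))
        if is_moving or has_owner_nearby:
            item.idle_frames = 0  # not idle: moving, or still with its human
        else:
            item.idle_frames += 1
            if item.idle_frames > max_idle_frames:  # idle for "too long"
                warnings.append(name)
    return warnings


def run_surveillance(video, max_idle_frames):
    """Time loop over frames; returns a list of (frame_index, item_name) warnings."""
    tracked = {}
    log = []
    for frame_index, detections in enumerate(video):
        for name in process_frame(tracked, detections, max_idle_frames):
            log.append((frame_index, name))
    return log  # the video finished, so the algorithm ends


# Hypothetical six-frame video with a single bag.
video = [
    {"bag": (True, True)},    # carried in by a person
    {"bag": (False, True)},   # set down, person still nearby
    {"bag": (False, False)},  # person walks away
    {"bag": (False, False)},
    {"bag": (False, False)},
    {},                       # bag has left the frame
]
print(run_surveillance(video, max_idle_frames=2))  # [(4, 'bag')]
```

The questions raised in the flowchart, such as what counts as too long or who an item's humans are, are hidden here inside `max_idle_frames` and `has_owner_nearby`; a real system would have to compute them.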