Today we're going to talk about the concept of optical flow or 2D point correspondences. As we fly through space, looking through the views that's coming to us we see it continuous motions around us. Objects are moving, we're moving in space. To a computer haver it sees points, two dimensional feature points, and these points are matched from one time instance, to another time instance. And it creates a space time curve. From the set of correspondences, we have seen that we can triangulate them. And find out where those three dimensional points live in the space. As well as a reason about a camera rotation translation between them. The question is, how do we obtain this precise correspondences across different views? Then we want to identify a points in the two dimensional image and we would follow it in our field of view as it moves through different time. And this is called optical flow. So here we have two images taken at slightly different angle as we move into space. We show a zoom out view, zoom in view of the windows here. The movements is usually very tiny between one frame to another. As we mark with the red, that particular points in the window, we see that point is moved slightly in the next frame. And this positional shifts changes, gives a sensation of motion and it'll also estimate the camera translation rotation as well as the position at this point in 3D. To simplify this problem, imagine we're looking at two Lipsido dots. On the left we have an image taken at time zero, and the dots in the second time instance have moved slightly up, to the left. And the two images are related by a simple translation between the two. That is the say, that if I like it the image on the right Jx. When J is it now a function the map position your image pixel to the intensity. Again, can think of this case as a very simple image, where a two dimensional pixel is layed out on the grid, and the height, which is the brightest value, is simply is a bump. And so, shift image up and down. All right, so move in the space, and that appears moved. And that can be described as the function J taken different x value. X plus d, and that value is the same as xi. Meaning that for every pixel in the Js image, if I offset them a little bit it can be the same value as seen in pixel I. So it's also called the brightness constancy. So now our goal is trying to find this displaced vector d, such that we minimize the error between the image on the left and the image on the right. So, for different d amounts, to shift these images by different amount. We're seeking for this two dimensional shifts, d, such the brain image, i, and j in alignment. So, in our non linear squared problems, every pixel in this image provides the constraints. The constraint is measuring the intensity value at that two pixels. As you see in the background, there's no error because we couldn't see anything. But on the pixel which the two image has overlap and had different height values, the error shows up. What's showing here is if at location d equals zero, meaning I'm not shifting any pixels, the error for some of the pixels are quite large. And we want to minimize this error by finding this d vector, the shift, the two images such that align and such that the arrow we see here all go down to zero. This is a null linear square problem, because the function mapping from pixels to intensity is not a linear function. Here the mapping is just taking position to a high map and the high map in general could be quite bumpy and quite non linear. The solution to this, follows a three step solution, the first step is for the non-linear squared problems, take derivative function expected d set it to zero. As we know, this is a necessary condition because as I trust this energy function, one way to tell I'm at the global or local minima is that I'm at a location where cost function doesn't change anymore. At that point, I have nowhere to go and nowhere to find additional information, where to go next. The second step of this computation requires taking this non linear function, j, which match pixel to intensity value, and turned to a linear function of it. And this is done through a Taylor expansion. And the last step, is to find a close form solutions for this least squared problem, assuming a Taylor expansion. That allows to find a vector where we can move in a space of d such as the air is reduced as much as we can. And then what we do is we transform the image by this vector d and iterated process again. So let's start with step one, step one requires us to take a great deal of the error function respect to the variable we're looking for. In this case translation vector d. As we recall this great end consist a Jacobian matrix and the two images of functions, the difference between them. And this is set to zero. We can further derive this computation one more step. Recall the gradient with respect to d, of image i, it's the same as greater respect to x. What it's measuring is if I take the image and move left from right, how much intensity changes? And that's exactly the gradient of the image. Here, we show the gradient image as a color map, where red is large value, and blue is a negative large values, and green is zero. As we can see, the Jacobian matrix store with it for every pixel, how much intensity is changing, if I move left, and right An error function is shown on the right. The second step is called the Taylor expansion. The idea is that we take this image which is mapping from pixel to the intensity and we want to linearlize it. Another way to say it is that we want to take image x plus d, and written as a constant function i of x, which is the current pixel value plus a Jacobian times d. Now here I showed the picture form of that. The Jacobians are two pictures in the south. Dx and dy are the two variables in d each one a scalar. So, what we can think of is if I'm given a picture shown in j of x and give you a template which are positive, and negative, showing a green in x, and green in y. How much mixing I can do, such as the create another picture, and the only thing can do, in this process, is not moving picture, but start attracting values from the pixels to each other. It's sort of a painting process, as you can imagine if the pixel's moving quite a large amount we cannot do this but a pixel just move by little bit this simple linear combination or just simple painting process allow me to create a new picture. So now we have gone through the first two steps we take the gradient of the error function with respect to d set to zero. We take the Taylor expansion and we can substitute ourself the Taylor expansion version to the original cost function where it's highlighted here in blue is the Jacobean matrix. So a simple step of rearranging of the equations we obtained at the following equation where we have the Jacobean matrix transposed times the Jacobean matrix on the left. Times the unknown variable d, multiple right the Jacobean matrix times the error function between the two pictures. Here's the picture of the error. And given that we want to look for a two dimensional variable d such that we can reduce or soft the error as much as possible. So, we will have two equations for every pixel in the image, and I'm having two unknowns. A Jacobian matrix transposed time itself. In fact, as a name, it's also know as the second moment matrixfFor every pixel, this provides a two by two matrix. So for every pixel we can construct, its gradient which is two dimensional vector, transpose it, multiply by itself, obtains a two by two matrix on every pixel. So here we show collectively for every pixel these two by two matrix. So every pixel has s two by two array and they can visualize as images. So this is a Jacobian matrix for all the pixels in the image. I can see some of the values are positive, indicated by red, zero is green, and some of the values are negative, shown as blue or cyan. Review again, we have on the right hand side the difference between the two images, blue means negative, red means positive, green is zero, multiply by Jacobian, the two combined together obtains this image. Again, every pixels has two values, two dimensional vectors, one for x, one for y. And when you multiply the error with the Jacobian, you obtain the following pictures. It tells you if I move in a different direction combined with error function, how much things are changing. So now we have a set of constraints. Two constraints, dx, dy for every pixel. Now we can visualize this process of looking for dx, dy, as the following painting process. When given a total of six pictures, two of the pictures on the right is a Jacobian times error function in itself. And the four on the picture on the left is the matrix for every pixel. And what we want to do is mixing them by a skill factor such that they look like each other. So I can see from the top row we're mixing the two top pictures by a factor of dx, dy into the right hand image. As you can see because the first image is mostly positive, and the final image needs to be all negative in blue, the dx components is most likely to be negative. Now we look at the bottom image, we can see that the bottom center image is all positive, where the final images needs to be mostly negative. And that tells me the dy components also should be negative. Another interesting point about this is that in order to figure out the horizontal shift which is dx. I'm simply looking at the x component which is the top left corner image And then look at the vertical movements and mostly look at the Jacobian matrix times itself in the bottom middle image. Up to now, we had compute error function for every pixel. Now we can combine all the pixel together by adding a summation in their function itself. And then each to the same equation with summation in front of the second moment matrix as well the Jacobian matrix. Visually this is the taking the four picture and some other value together into a two by two matrix. Instead of a two by two matrix for every pixel, is a two by two matrix for the entire patch we're looking at. Similarly on the right hand side instead of having a two by one vector for every pixel, those values are sum across the entire patch obtain a combined two by one vector. So instead of painting pictures, now we're painting simply pixels. We're looking for the scale vector dx and dy. Such that we obtained the pixel value on the right. So now I have seen given image i and image j is shifted. I take the difference between those two images compute a second moment matrix invert it, we obtain estimate of how much I need to move. In image space dx and dy, such that i and j as close as possible. Now we have computed and we're going to take that value and shift the picture with it as shown here. So this is a picture of j shifted by the d which we just estimated. As you can see, this shift bring the dot closer to the center of the image where you started. And I can see also it's moving closer to the center but it's not quite on the target yet. And this we can see also from the error function many of the pixel error is driving down to zero but we still have many of the pixel have large error. So the third step is we iterate. As we take the new initialization of the j shift by the d vector that computed, I can take the same procedure one more time. Take the Jacobian matrix, take the second moment matrix, invert that, obtain the new translation back to dx, dy. That you can see moved the dots further close to the center. As such, the error reduce significantly. And in the final step of iterations, in fact, we have find the correct shift of the dots from the right image to the left image. And you can see the error function all reduce down to zero. Back to the window case. We have the left image i, right image j. We can start with the relative shift of zero, obtain the following arrow function. We can compute a Jacobian matrix. Jacobian matrix is the gradient of image, one for x, one for y. It tells me how much the image intensity is changing as move from the left and the right, as well from the top to bottom. For this window image, you can imagine the intensity function is a complicated function, in terms of x. In fact, illustrated here it created an axis along the simple functions needed as greater than y. We proceed. Compute a second moment matrix. Two by two matrix for every pixel and we here it displays as four pictures for every pixel together. Similarly we constructed the error function times the Jacobian shown on the right. We have a two by one vector for every pixel and we've shown collectively for all the pixel together. Given the two images, the previous step give us a best estimate of how much I need to shift the images from the right to the left, such the errors reduced. As we can see that the first estimation of the shift is making the window horizontally aligned. You still have a large error in the vertical direction, which is shown in the error map. We have to further iterate. You can see this third step is bringing the window further down such as it's more aligned with the first window. And the error function is reduced but not minimized completely. As is shown on the right. One more step, images are perfect aligned. Through this process, we identify feature points on the first image, So this step of iterative search We're bringing ourself closer and closer to the second image. And this allow us to track the feature in time. We repeat this process as new frame comes in. Obtain new position of the corner of the windows as move in space.