In this lecture, we are going to study how to extract three-dimensional velocities from optical flow. Until now we have studied structure from motion with discrete pictures from several viewpoints, where we did not have to use or assume any continuity between the pictures. But videos taken from quadrotors, like the video in this movie here, are taken from a continuous trajectory. There is no large baseline between these frames, and we can characterize the geometry of this problem not with point correspondences but with velocity vectors, with optical flow vectors.

Let's look at this video again and try to imagine in which direction the quadrotor is really moving. We see in this case that the direction is pretty much forward, and we can infer a lot from this video. Points at the bottom produce vectors and displacements which are much longer, while points in the background are hardly moving. We can also see that we can detect obstacles, like this tree in front of us. How can we infer the direction the camera is moving from the video, and from the optical flow that we can compute from this video?

This is another example, from a camera mounted on the windshield of a car, which is one of the very common setups these days in driverless cars. You see that the optical flow is really changing: when we are moving forward, everything to the left and right of us is moving quite fast, and it moves much faster still when the car is turning to the right.

Let us study this optical flow field, which we get just by corresponding pixels in subsequent frames, or just from the velocities of the pixels. If the camera is only translating, in any direction, then the vector field is radially expanding from a point we have marked in this vector field. We call this point the focus of expansion, and just by looking at these vectors we can infer in which direction the camera is moving: the camera is moving forward and slightly to the right, like this. This is very easy to see from this radial field.

Now let's say that the camera is only rotating; in this case, the camera is panning to the right. Then we see a more uniform vector field: most of the vectors are almost horizontal, they start curving at the left and the right, and the lengths of the vectors start getting larger on the left and the right. If the camera rotates only about the optical axis, we see a turbulent pattern, a curling vector field. Again, in this case it is quite easy to infer where the rotation axis is: it is right at the center of the turbulence. But if we combine translation and rotation, we get a really mixed vector field. There is no clear radial pattern and there is no clear rotation pattern. If we were asked in which direction we are moving in this vector field, we might guess somewhere to the right, but it is very difficult to estimate where. We have looked at the directions of the vectors until now, but what about their lengths?

Let us look at how we can algebraically compute this vector field. Let's assume a fixed point P in the scene, and a moving camera with linear velocity V and angular velocity omega. From physics we know that the velocity of this point, as observed from the moving camera, is P_dot = -omega x P - V. All of these are three-dimensional vectors. But we also know that our calibrated projection is p = (X/Z, Y/Z).
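To make these two relations concrete, here is a minimal numeric sketch in Python with NumPy; the specific point, velocities, and function names are made up for illustration. It evaluates the point velocity P_dot = -omega x P - V in the camera frame and approximates the optical flow by differentiating the calibrated projection p = (X/Z, Y/Z) over a small time step.

```python
import numpy as np

# Minimal numeric sketch (values and names invented for illustration):
# a fixed scene point P seen from a camera with linear velocity V and
# angular velocity omega.  Its velocity in the camera frame is
#   P_dot = -omega x P - V,
# and the calibrated projection is p = (X/Z, Y/Z).

def point_velocity(P, V, omega):
    """P_dot = -omega x P - V (all three-dimensional vectors)."""
    return -np.cross(omega, P) - V

def project(P):
    """Calibrated projection p = (X/Z, Y/Z)."""
    return P[:2] / P[2]

P     = np.array([1.0, 0.5, 4.0])    # fixed scene point
V     = np.array([0.2, 0.0, 1.0])    # camera moving mostly forward
omega = np.array([0.0, 0.05, 0.0])   # slight pan

# Optical flow = time derivative of the projection, approximated here by
# moving the point for a very small time step dt.
dt    = 1e-6
P_dot = point_velocity(P, V, omega)
flow  = (project(P + dt * P_dot) - project(P)) / dt
print(flow)   # the flow vector p_dot at this image location
```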
So we can write the calibrated projection as the small vector p and the scene point as the capital vector P. We then have on the left the derivative of the projection of the point, and on the right this derivative expressed through the 3D point and its velocity. If we combine all of these together, we get the equation for the optical flow field. This equation has two additive components: one is the translational flow, and we can see two examples of translational flow in the left column; the other is the rotational flow, and again we can see two examples of rotational flow on the right. Decomposing a mixed flow into these two components is the difficult situation.

What we can observe at the bottom, where we have plotted the vector field from the car, is that the vector field comes mainly from the ground. We see that in the rotational field on the right there is no difference in the lengths: whether the points are at infinity or very close to us, all of them are about the same. On the left we see that points at infinity are hardly moving, the horizon is really not moving, while points close to us on the ground are moving very, very fast. So what we can say very confidently is that the rotational flow is independent of depth, while the translational flow depends on depth. You can see this clearly in the equation: the rotational flow depends only on omega, the angular velocity, and the calibrated coordinates (x, y), while the translational term also depends on the depth.

If we look carefully at this equation, we observe that if Z is known, if some oracle gives us depth estimates, say a new sensor like the Kinect, then p_dot is a linear function of V and omega. So if we have a moving depth sensor, it is very easy to find the translational and angular 3D velocities. As a matter of fact, there are 3 unknowns in V and 3 unknowns in omega, so with 3 optical flow vectors we can recover these velocities if we know the depth (a small numeric sketch of this appears just below).

Let's see what we can recover if we do not know the depth. If we have only a rotational field, we are again very lucky: it is linear with respect to omega, and we do not have to solve for depth because it is independent of depth. So if we know that the camera is just rotating, we can estimate the angular velocity.

Now what about the translational flow field? The translational flow field, depicted here as it results from the relative motion of a ground plane, has this radial pattern. On this radial pattern, if I just take two vectors and intersect them, I find the focus of expansion, which is nothing else than (Vx/Vz, Vy/Vz). Obviously if Vz = 0, it means we are moving parallel to the image plane and the focus of expansion is at infinity. Thus, by intersecting the two lines that go through the optical flow vectors, we can find our translation direction.

Once we have found the translation direction, the FOE, we can take the distance of a point from the FOE, and if we divide the length of its flow vector by that distance, we find a ratio which is the translation in Z divided by the depth Z. If we take its inverse, the depth Z divided by the translation in Z, we can observe that its units are meters divided by meters per second, which is seconds. This is called the time to collision, and it is a very important estimate made by animals which do not possess a binocular system, animals with a monocular system, where the fields of view of the two eyes do not overlap.
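Going back to the observation that known depth makes the flow equation linear in V and omega, here is a hedged sketch in Python with NumPy. It uses the calibrated flow equation p_dot = (1/Z) A(p) V + B(p) omega, which follows from P_dot = -omega x P - V above; the synthetic points, depths, and velocities are invented for illustration and are not taken from the lecture.

```python
import numpy as np

# Sketch: if the depth Z of every tracked point is known, the flow equation
#   p_dot = (1/Z) * A(p) V + B(p) omega
# is linear in the six unknowns (V, omega), so a handful of flow vectors
# recovers both velocities by least squares.  All numbers are synthetic.

def A(x, y):   # translational part, to be scaled by 1/Z
    return np.array([[-1.0, 0.0, x],
                     [ 0.0, -1.0, y]])

def B(x, y):   # rotational part, independent of depth
    return np.array([[x * y,     -(1 + x * x),  y],
                     [1 + y * y, -x * y,        -x]])

# Ground-truth velocities and a few points with known calibrated
# coordinates (x, y) and known depth Z.
V_true = np.array([0.3, -0.1, 1.0])
w_true = np.array([0.02, 0.05, -0.01])
pts = [(0.1, 0.2, 5.0), (-0.4, 0.1, 2.0), (0.3, -0.3, 8.0), (0.0, 0.4, 3.0)]

# Stack two equations per point:  [A(p)/Z  B(p)] [V; omega] = p_dot.
M = np.vstack([np.hstack([A(x, y) / Z, B(x, y)]) for x, y, Z in pts])
b = np.hstack([A(x, y) @ V_true / Z + B(x, y) @ w_true for x, y, Z in pts])

V_est, w_est = np.split(np.linalg.lstsq(M, b, rcond=None)[0], [3])
print(V_est, w_est)   # matches V_true and w_true
```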
They rely only on this optical flow field to avoid bumping into things: they estimate this time to collision. Points at the same radial distance from the FOE have exactly the same time to collision if they lie at the same depth.

Now let's see how we can find the FOE a bit more formally, without assuming that the Z component of the linear velocity is zero. We can observe geometrically that our flow vector is perpendicular to the cross product of p and V. If we rewrite this equation, we see that V is perpendicular to p x p_dot. By writing this triple product for just two points, we can find the direction of V; it is only the direction, we cannot find the absolute speed. This is just a 3D geometric, more general formulation of intersecting the lines to find the FOE.

Now, if we have more points, we can write the equation V^T (p_i x p_dot_i) = 0 for all the points p_1, p_2, ..., p_n, and we obtain a matrix with n rows, as many as the points, and three columns. We know by construction, because all these flow vectors intersect at the focus of expansion, that this matrix has rank two. So by just taking the singular value decomposition and the null space of this matrix, we can find the translation direction (a short sketch of this appears below).

This is still a pure translational flow field, but what can we say about this flow field, for example? We see many vectors that look quite crazy here, and this is because it comes from a random point cloud in 3D: some points are very far and have very small vectors, and some points are very close and have very long vectors. These vectors are the addition of a rotational flow field, like the red one, and a translational flow field, the radial one, like the blue one. Each component on its own looks beautiful, but if we add them, the result looks quite messy. So the question is: if we have these orange vectors, how can we decompose them into these two components?

The trick lies again in the linear form of the equations. What we are going to do is group together the inverse depths and omega, which appear linearly. So we write the optical flow vector as a matrix, which depends on the translation and on (x, y), times a vector which contains the inverse depths and omega. If we have a full flow field of n vectors, this yields a left-hand side with the known flow vectors, and on the right a multiplication of a big matrix, which is a function of the translation, times all the inverse depths and omega. This means that if we knew the translation, we would be able to find the rotation and the depths Z; and if we knew omega and the Z's, there would be some way to find the translation, just by fitting a radial translational field. This looks like a chicken-and-egg problem, but we are going to do the following to really solve it in one step.

Look again at this equation. If we solve for the unknown vector of inverse depths and angular velocity in this overconstrained problem, we get the pseudo-inverse of this matrix, which is a function of V, times the vector of all optical flows. If we insert this vector back, we find a residual functional which is a function of V only. And V is a function of only two parameters, a position in the image plane or a direction on the sphere, because we can never infer the absolute speed of the camera. So if you look at it carefully, to build this functional we have to compute the pseudo-inverse for every candidate position of the focus of expansion V; everything else in this equation is known.
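Before moving on, here is the promised sketch of the pure-translation construction, in Python with NumPy and on synthetic data (the numbers are invented for illustration): one row V^T (p_i x p_dot_i) = 0 per point, stacked into an n-by-3 matrix whose null space, taken from the singular value decomposition, gives the translation direction up to scale.

```python
import numpy as np

# Pure translational flow: p_dot_i = (1/Z_i) * (x_i*Vz - Vx, y_i*Vz - Vy),
# and with p_i = (x_i, y_i, 1), p_dot_i = (xdot_i, ydot_i, 0) we have
# V^T (p_i x p_dot_i) = 0 for every point.  The stacked matrix has rank two,
# and its null space is the translation direction.  Synthetic data below.

V_true = np.array([0.4, -0.2, 1.0])
rng = np.random.default_rng(0)

rows = []
for _ in range(50):
    x, y = rng.uniform(-0.5, 0.5, 2)
    Z = rng.uniform(2.0, 20.0)
    p_dot = np.array([x * V_true[2] - V_true[0],
                      y * V_true[2] - V_true[1],
                      0.0]) / Z
    rows.append(np.cross([x, y, 1.0], p_dot))

# The right singular vector with the smallest singular value spans the
# null space of the rank-two matrix.
_, _, Vt = np.linalg.svd(np.array(rows))
direction = Vt[-1]
direction *= np.sign(direction[2])            # resolve the sign ambiguity
print(direction,                              # estimated direction ...
      V_true / np.linalg.norm(V_true))        # ... matches the true one
```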
Returning to the mixed case: we sample all possible directions of the FOE, as shown in this video, over the sphere of all directions, and we compute this residual for every direction. So we have the optical flow at the top, and we can compute the residual for every direction V. We can see, in the very dark blue area, that we have quite a clear minimum, which is the direction we are moving. And we can observe that when the car starts to turn to the right, the focus of expansion, the minimum, also moves exactly to the right. What we can also observe in this view of the error function is that sometimes the minimum is very, very sharp, and sometimes it becomes more ambiguous. These are the cases where the translation and the rotation are confounded, because there is not enough depth variation in the scene.

In this lecture, we have seen how to estimate the translation direction and the rotation from a continuous video. This can be used both for control of a vehicle like a quadrotor or a car, and for avoiding obstacles in front of us by computing the time to collision to all points in the scene.
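To close, here is a hedged sketch of the residual search described above, in Python with NumPy and on synthetic data (the sampling grid, point coordinates, and velocities are invented for illustration). For each candidate translation direction V we build the matrix that multiplies the stacked inverse depths and omega, solve for that unknown vector with the pseudo-inverse, and keep the direction with the smallest residual.

```python
import numpy as np

# Residual search over candidate FOE directions: stack the flow model
#   p_dot_i = (1/Z_i) * A(p_i) V + B(p_i) omega
# as  flows = M(V) q  with  q = (1/Z_1, ..., 1/Z_n, omega), solve q with the
# pseudo-inverse for each candidate V, and keep the V with smallest residual.

def A(x, y):
    return np.array([[-1.0, 0.0, x], [0.0, -1.0, y]])

def B(x, y):
    return np.array([[x * y, -(1 + x * x), y], [1 + y * y, -x * y, -x]])

def residual(V, pts, flows):
    n = len(pts)
    M = np.zeros((2 * n, n + 3))
    for i, (x, y) in enumerate(pts):
        M[2 * i:2 * i + 2, i] = A(x, y) @ V    # column for 1/Z_i
        M[2 * i:2 * i + 2, n:] = B(x, y)       # columns for omega
    q = np.linalg.pinv(M) @ flows              # best inverse depths and omega
    return np.linalg.norm(flows - M @ q)       # flow this V cannot explain

# Synthetic mixed flow field with unknown depths and rotation.
rng = np.random.default_rng(1)
V_true, w_true = np.array([0.3, 0.1, 1.0]), np.array([0.0, 0.03, 0.01])
pts, flows = [], []
for _ in range(30):
    x, y = rng.uniform(-0.5, 0.5, 2)
    Z = rng.uniform(2.0, 20.0)
    pts.append((x, y))
    flows.append(A(x, y) @ V_true / Z + B(x, y) @ w_true)
flows = np.hstack(flows)

# Coarse sampling of forward-pointing directions V = (a, b, 1); only the
# direction matters, since the scale is absorbed by the inverse depths.
best = min((residual(np.array([a, b, 1.0]), pts, flows), (a, b))
           for a in np.linspace(-1, 1, 21) for b in np.linspace(-1, 1, 21))
print(best)   # smallest residual near (a, b) = (Vx/Vz, Vy/Vz) = (0.3, 0.1)
```

Reducing the depth variation of the synthetic points should flatten this minimum, mirroring the translation-rotation ambiguity mentioned above.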