Now we have computed a fundamental matrix that relates points between two images of the same scene. Our final goal is this: given these two pictures, we want to compute the three-dimensional rotation and translation between them. In this set of lessons we will see how to recover that rotation and translation from the fundamental matrix that we computed.

Recall one of the properties of the fundamental matrix: it allows me to trace epipolar lines from one image to the other, and those epipolar lines intersect at the epipoles. The epipole corresponds to the location of the second camera as viewed from the first camera's point of view. For example, we have a left person and a right person, each taking a picture of the same building. Looking at the two views, you can estimate that the right image was taken in front of the left image: the person who took the right image would be standing in front of the left person. In fact, the epipole in the left image, the vanishing point of the epipolar lines, corresponds to the camera center of the right view. This is the property we use to help us recover the translation and rotation between the two views.

Now look back at the geometric relation the fundamental matrix gives us between a point on the left image and a point on the right image. It is a bilinear equation, x2^T F x1 = 0, and recall that the fundamental matrix itself in fact encapsulates the essential matrix, sandwiched between the calibration matrices of the left and right cameras. Assuming that we have calibrated the cameras ahead of time, so the K matrices are known, we can transform the F matrix back into the essential matrix. This is simply taking the 3x3 fundamental matrix and multiplying on the left by K2^T and on the right by K1: E = K2^T F K1.

The essential matrix, itself a 3x3 matrix, is composed of two components. The first component has to do with translation: it encodes the translation between the two cameras as the 3x3 cross-product matrix [t]x. The second component has to do with the orientation change, the rotation between the two cameras. These two 3x3 matrices multiply together to form the essential matrix, E = [t]x R. Now that we have computed the essential matrix, we want to go backwards and uncouple the translation and rotation from E itself, and we will see how to do this.
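To make the calibration step concrete, here is a minimal numpy sketch, assuming K1 and K2 are the known 3x3 calibration matrices of the two cameras (the function name is illustrative, not a library call; the final projection onto singular values (1, 1, 0) is a common cleanup, justified because E is only defined up to scale):

```python
import numpy as np

def essential_from_fundamental(F, K1, K2):
    """Map a fundamental matrix to an essential matrix via the
    calibration matrices: E = K2^T F K1 (a sketch, not a library call)."""
    E = K2.T @ F @ K1
    # A valid essential matrix has singular values (s, s, 0); since the
    # overall scale is arbitrary, project onto diag(1, 1, 0).
    U, _, Vt = np.linalg.svd(E)
    return U @ np.diag([1.0, 1.0, 0.0]) @ Vt
```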
To see how we recover the translation and rotation from the essential matrix, the first element we are going to identify is the translation. Recall the construction of the epipoles and epipolar lines: Bob and Mike are both pointing at the same point in the 3D scene. The rays to that point, plus the ray between Bob and Mike themselves, form a plane. As we sweep this plane across the scene, it slices through the images along the epipolar lines, and all the epipolar lines converge to a single epipole. The epipole is in fact the direction connecting Bob and Mike: it is the ray between Bob and Mike, penetrating through the image plane, that corresponds to the epipole. So certainly, if I know the epipole, I know the direction of translation between the two.

To make this more precise, we can identify the epipole on the right image as the image of the world origin, which is where Bob is, at (0, 0, 0), since the coordinate system is centered on him. We take those coordinates in homogeneous form, (0, 0, 0, 1), multiply by the camera projection matrix P2, and we obtain t. Therefore, the image of the world origin from Mike's point of view is in fact t, and that is the epipole in Mike's view. Another way to say this is that the epipole for the second camera satisfies t^T E = 0: t spans the left null space of the essential matrix.

Recall that we can take the SVD, the singular value decomposition, of the E matrix: a 3x3 matrix of this form can be decomposed into an orthonormal matrix U, a diagonal matrix with two elements on the diagonal equal to 1 and the third element equal to 0, and V^T. The U matrix is made of three columns, u1, u2, u3, and the third column spans the left null space of E. Therefore the translation vector t, which is the epipole, corresponds to the last column of U. Because of the ambiguity of homogeneous coordinates, which can be scaled by any factor, there is an ambiguity of plus or minus; we will see that in the next few examples. So t we can recover from the SVD of the E matrix by taking the third column of U, and we have two possibilities: one with t pointing forward, one with t pointing backward.

Now, given t, how do we reconstruct the R matrix? Again we go back to the singular value decomposition of E. We know the decomposition is the orthonormal U, times the diagonal matrix diag(1, 1, 0), times V^T. We also know that E is made of [t]x times R. The first thing we are going to do is take [t]x and write it out as a function of U. To recall, [t]x is the operation that takes the vector t and crosses it with another vector, forming a new three-dimensional vector; it is a linear transformation of an arbitrary vector, defined by the t vector itself. The way we can think about this is: given an arbitrary vector in space, we define a new coordinate frame whose axes are u1, u2, and t as the third dimension. Any vector is first mapped into this frame through U^T. Once we are in this frame, we delete the t component, so the remaining vector stays perpendicular to t, and then in the u1-u2 plane we rotate by 90 degrees. That result is then sent back to the original frame by the U matrix. Therefore, U times the matrix we see in the middle, times U^T, performs the operation [t]x.

Now we have re-expressed [t]x as a matrix product: the orthonormal matrix U, times a specially constructed matrix, times U^T. What remains to be seen is how we use that to recover the rotation R. We can guess that the rotation matrix R consists of three factors: U, the orthonormal matrix itself, on the left; V^T, again orthonormal, on the right; and an unknown rotation Y in the middle. We construct it this way because then, when we form the product [t]x R, the U^T and U in the middle cancel, giving us an identity, and the V^T on the right matches what we obtain from the singular value decomposition. Through this construction we see that U is made of u1, u2, and t, and the rotation is made of this particular U, times the unknown Y, times the V^T obtained from the singular value decomposition. Therefore we can compute Y from this constraint: what we construct must match what we obtained from the singular value decomposition, which is U on the left, V^T on the right, and in the middle the special diagonal singular value matrix with one, one on the diagonal and zero as the third element. On the other side we have the 90-degree rotation matrix that deletes the t component, followed by the Y matrix. So we want the two middle factors to be equal: what is shown in pink on the slide should be equal on both sides.
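As a numerical sanity check on this construction, here is a short sketch (Z below is one common sign convention for the "delete the t component, rotate 90 degrees" middle matrix, and Y is the 90-degree rotation from the lecture; the slide's exact signs may differ, and any difference is absorbed by the plus/minus ambiguity of t):

```python
import numpy as np

def skew(v):
    """Cross-product matrix [v]x, so that skew(v) @ w == np.cross(v, w)."""
    return np.array([[0.0, -v[2], v[1]],
                     [v[2], 0.0, -v[0]],
                     [-v[1], v[0], 0.0]])

# "Delete the t component, rotate 90 degrees" middle matrix, and the Y matrix.
Z = np.array([[0.0, 1.0, 0.0],
              [-1.0, 0.0, 0.0],
              [0.0, 0.0, 0.0]])
Y = np.array([[0.0, -1.0, 0.0],
              [1.0, 0.0, 0.0],
              [0.0, 0.0, 1.0]])

# The equality in pink: Z @ Y reproduces diag(1, 1, 0), the singular-value
# matrix of E, so that with t the third column of U (up to sign),
# [t]x R = (U Z U^T)(U Y V^T) = U diag(1, 1, 0) V^T.
print(Z @ Y)

# Check that U Z U^T acts as the cross product with t = u3
# (up to the plus/minus ambiguity of t discussed above).
U, _ = np.linalg.qr(np.random.randn(3, 3))
if np.linalg.det(U) < 0:
    U[:, 2] = -U[:, 2]          # make U a proper rotation for the check
t = U[:, 2]
print(np.allclose(U @ Z @ U.T, skew(-t)))   # True; flipping t flips the sign
```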
This allows us to compute Y, and there are only two solutions for Y, illustrated here. Once Y is identified through this procedure, we can go back and reconstruct the rotation matrix. As we have seen, the rotation matrix is made of the orthonormal matrix U on the left, V^T on the right, and the Y matrix in the middle. Since the Y matrix can take two forms, we have two possible rotation matrices from this procedure. And as we recall, the translation vector itself has a two-fold ambiguity: it could be the third column of the U matrix, or the negative of that vector. So together they form a four-solution space for this problem. Overall, we can think of the singular value decomposition of the essential matrix as what allows us to pull out the translation and the rotation.

One thing to recall is that a rotation matrix has the property that its determinant is one. So as we go through our calculation, constructing the orthonormal matrices U and V and multiplying by the Y matrix in the middle to get the rotation R, if the determinant of R comes out as negative one, we need to negate the whole thing: we set t to minus t and R to minus R, which leaves E unchanged and ensures the rotation has determinant equal to one.

So this summarizes our procedure. Given two images, compute the fundamental matrix between the two views. Transform the fundamental matrix, through the camera calibration, into the essential matrix. Then decompose the essential matrix into its translation vector and its rotation matrix: take the singular value decomposition of E to obtain the orthonormal matrices U and V^T; reconstruct the rotation matrix as U times an unknown Y matrix, with Y having two possibilities, times V^T; and the translation vector has two possibilities, plus or minus the third column of U. We further adjust the signs to ensure the determinant of R equals one. So these are our four possibilities.

The four possibilities correspond to the camera centers displaced relative to each other in the positive or negative direction. We show here that each camera configuration leads to a different possible location of the points: a point in 3D is projected into the image in each of the four cases, but sometimes it projects onto the image plane from the front, and sometimes from behind. This cannot be disambiguated using homogeneous coordinates, because any point along a homogeneous ray is equivalent to any other up to a scale factor, including a minus sign. The way we are going to figure out which configuration is correct is to physically triangulate the points into 3D space, and pick the one camera configuration such that the triangulated points stay in front of the camera, rather than behind it.

So this slide summarizes the overall procedure of recovering the translation vector t and the rotation R from the essential matrix. What we obtain is four possible solutions from this computation. In the following slides we will see how to pick one of them as the correct solution, and this requires understanding point triangulation.
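Before moving on to triangulation, here is a minimal numpy sketch of producing the four candidate (R, t) pairs just summarized (the function name is illustrative; Y is the same 90-degree rotation as above):

```python
import numpy as np

def four_motion_hypotheses(E):
    """Decompose an essential matrix into its four (R, t) candidates:
    two choices of Y (Y and Y^T) times two signs of t."""
    U, _, Vt = np.linalg.svd(E)
    Y = np.array([[0.0, -1.0, 0.0],
                  [1.0, 0.0, 0.0],
                  [0.0, 0.0, 1.0]])
    t = U[:, 2]                       # translation direction, up to sign
    hypotheses = []
    for R in (U @ Y @ Vt, U @ Y.T @ Vt):
        if np.linalg.det(R) < 0:
            R = -R                    # negate to enforce det(R) = +1
        hypotheses.append((R, t))
        hypotheses.append((R, -t))
    return hypotheses
```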
Triangulation starts from the fact that we have 2D correspondences: two images, or more than two images, with corresponding points marked in each of the views. We also assume that we know the cameras' relative translation and rotation. Then what we can do is reconstruct those two rays, or multiple rays, and find the intersection of the rays in 3D space, so that we can localize the point in 3D. So triangulation requires multiple two-dimensional measurements of the point, and it requires the relative camera rotations and translations between the different views. With that information we can estimate where the point is in 3D, and this requires solving a set of least squares problems. We will see how to set up this least squares problem through the following construction.

Again, we have our point (x, y, z) in 3D space that we want to compute. That point projects into the image through the camera projection equation, with lambda indicating the distance along the ray from the optical center to the 3D point. The image of that point, x1, with a 1 appended, forms a homogeneous ray in 3D space, and that ray meets the three-dimensional point through the camera transformation: lambda times (x1; 1) equals P1 times (X; 1), where X is the 3D point. Taking the cross product of both sides with the vector (x1; 1), we obtain the equation on the right, because any vector crossed with itself equals zero. Therefore we have (x, y, 1), the homogeneous coordinates of the point in the image, crossed with the projection P1 times the vector (x, y, z, 1) of the point in 3D, equal to zero. In matrix form, what is shown in purple consists of three rows and four columns, and the unknown vector (X; 1) is a four-dimensional vector. So we have a 3x4 matrix times a 4x1 vector equal to zero, and this is for one view.

We have another view, with a different camera rotation and translation and therefore a different projection matrix, P2, and a different image point, x2. Once again we can construct a 3x4 matrix, consisting of the cross-product matrix of (x2; 1) times P2, in purple, times the same unknown 4x1 vector (X; 1), and that also equals zero.

If you imagine tracing the two rays in 3D space, in general these two rays will not intersect exactly. What we are looking for is the one point in 3D space that is as close as possible to all the rays traced into the space. We can do this for two views, and for multiple views as well. Any time we have more views, we simply construct another purple matrix: take the image point measurement, form its three-dimensional vector of homogeneous coordinates, take the cross product with the camera projection matrix P, and together they again form a 3x4 matrix. We stack those 3x4 matrices into one matrix, which we will call A, and they all multiply the same unknown (X; 1), the homogeneous coordinates of the point as a four-dimensional vector, equal to zero.

As we can see, this forms the simple least squares problem we are familiar with, where the matrix A times X equals zero and we are looking for a non-zero X. Recall that even though we have a 3x4 matrix for every view, the rank of each such matrix is in fact 2. Therefore two views are already enough to obtain a solution, and if we have more than two views, we obtain the least squares solution through the SVD method.
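Here is a minimal sketch of this least squares triangulation (names are illustrative; the `skew` helper is repeated so the sketch is self-contained):

```python
import numpy as np

def skew(v):
    """Cross-product matrix [v]x, so that skew(v) @ w == np.cross(v, w)."""
    return np.array([[0.0, -v[2], v[1]],
                     [v[2], 0.0, -v[0]],
                     [-v[1], v[0], 0.0]])

def triangulate(image_points, projections):
    """Triangulate one 3D point from (x, y) measurements in several views.
    Each view contributes one rank-2 block skew([x, y, 1]) @ P; stacking
    the blocks into A and solving A X = 0 by SVD gives the least squares
    point."""
    blocks = [skew(np.array([x, y, 1.0])) @ P
              for (x, y), P in zip(image_points, projections)]
    A = np.vstack(blocks)
    _, _, Vt = np.linalg.svd(A)
    X = Vt[-1]             # right singular vector of the smallest singular value
    return X[:3] / X[3]    # divide by the fourth coordinate to get (x, y, z)
```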
Let's demonstrate this point triangulation on a pair of images taken in a stereo configuration, which means the two cameras are fronto-parallel to each other and merely translated left and right. We look at one particular point in the scene, and the second image of this point is simply moved to the right, while maintaining its vertical coordinate.

Again, we take the camera projection matrices such that the world coordinate center is at the left camera. So P1 equals K times the identity with a zero column attached. The second camera is simply moved to the right, so the translation between the two is (1, 0, 0), and we form the camera projection matrix P2 as K2 times the identity with minus C attached, where C is the second camera's center. Now we have two camera projection equations and one pair of corresponding points in the two views, and we can compute the 3D point associated with these two corresponding image points by tracing two rays that intersect in 3D. For this simple configuration you can in fact calculate the answer by hand, without using any linear algebra, and we will show that the linear algebra gives you the same solution.

Here are the steps. First, write down the calibration matrices for the two cameras, then construct the projection matrices. Given the two image points, write them in homogeneous coordinates, then construct the stacked 3x4 matrix A. From that, compute the least squares solution of A times X equals zero, which requires computing the SVD of A. We take the last column of V, and that gives our solution in four-dimensional space, the homogeneous coordinates of a point in 3D. Recall that to convert this point into physical space we need to divide out by the last element, the fourth dimension. Therefore we obtain a point in (x, y, z) space from this triangulation. And here we demonstrate the same procedure applied to all the points in the scene, reconstructing a three-dimensional point cloud with this method.

So here is the overall capability we have: given the set of camera centers and the set of camera rotations relative to each other, if you give me corresponding points in the 2D image planes from the different views, I can triangulate them in 3D. This matters if you are flying in space: we not only care about how the airplanes are oriented relative to each other, we also care about the points in 3D space, whether we are getting close to some obstacle or not.

Now, we have four possible configurations of rotation and translation, and each one satisfies the constraint that, if you trace the rays into 3D space, they intersect. But because of the homogeneous coordinate system we use, there is an ambiguity in whether the intersection happens in front of the camera or behind the camera. Now that we can triangulate points, we can figure out whether the triangulation leads to a point that is behind the camera or in front of the camera. This is done by looking at the z coordinate relative to the first camera: we want to make sure the z coordinates of all the points in our field of view are in fact positive in the first camera's view. This narrows our search down to one of the camera configurations.

So this slide shows the overall procedure of our method. First, use the least squares method to triangulate the points for each of the four possible camera configurations. Then, for each camera configuration, check whether the triangulated points land in front of the camera in the first camera's view, and take the configuration that passes as our final solution.
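A sketch of that selection step might look like the following (hypothetical names; the lecture checks the depth in the first camera's view, while this sketch also checks the second view, a common, slightly stricter variant; it assumes the `triangulate` and `four_motion_hypotheses` helpers sketched earlier are in scope):

```python
import numpy as np

def pick_configuration(hypotheses, matches, K1, K2):
    """Keep the (R, t) hypothesis under which the most triangulated
    points have positive depth, i.e. lie in front of the cameras.
    `matches` is a list of ((x1, y1), (x2, y2)) correspondences."""
    P1 = K1 @ np.hstack([np.eye(3), np.zeros((3, 1))])
    best = (-1, None, None)
    for R, t in hypotheses:
        P2 = K2 @ np.hstack([R, t.reshape(3, 1)])
        in_front = 0
        for p1, p2 in matches:
            X = triangulate([p1, p2], [P1, P2])
            # Depth in camera 1 is X's z coordinate; in camera 2 it is
            # the third entry of R @ X + t.
            if X[2] > 0 and (R @ X + t)[2] > 0:
                in_front += 1
        if in_front > best[0]:
            best = (in_front, R, t)
    return best[1], best[2]
```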
Finally, recall that the translation and rotation expressed in a camera projection matrix are different from the camera's location in the world coordinate system. We need to transform the translation t back into the camera location in the global world coordinate system. That transformation is minus the rotation transposed times t: the camera center in the world is C = -R^T t. We need to remember to do this in order to find the final location of the camera.
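As a one-line sketch of that final conversion (assuming R and t are the ones appearing in the projection matrix P = K [R | t]):

```python
import numpy as np

def camera_center(R, t):
    """World-coordinate camera center for a projection P = K [R | t]:
    the center C satisfies R C + t = 0, hence C = -R^T t."""
    return -R.T @ t
```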