So we'll continue on bundle adjustment today. The questions we're asking are: where am I? Where is the camera located as we move in space? And what is the map in 3D? We need to know that in order to fly through the space without running into objects, so we need to know where we are as well as where the 3D objects are in the space. Those unknowns are color coded in orange here: X, as in a 3-dimensional point (X, Y, Z); R, as in the rotation matrix; and t, as in the translation. What we are seeing is an image of the objects in 2D, and we have measured those objects as positions in the two-dimensional image, denoted here as m tilde. We also assume the camera calibration matrix K is given. Therefore, if we have the rotation matrix R and the translation t, we can obtain the camera projection matrix P. For convenience in this lecture, I'd like to decompose the projection matrix further as K times R times the identity matrix appended with -C, where C is the camera center of the observer in the world coordinate system. Imagine we have used the least squares methods that we talked about in the past lectures to obtain estimates of the camera pose R and C, as well as the three-dimensional point X in space. We can then take those estimates and project the point into the camera using the camera projection matrix, obtaining an image point (x, y) that we call m. Because, as we discussed last time, least squares is only concerned with error in the homogeneous coordinate system, and because there is noise in the measurements, this reprojected point m is not exactly the same as the measured m tilde. In the rest of this lecture, we talk about how to minimize this error as measured in the image coordinate system, not in the homogeneous coordinate system: we want to minimize this m tilde minus m.
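As a concrete sketch of this reprojection error, here is a minimal NumPy version, assuming the decomposition P = K R [I | -C] described above; the function and variable names are illustrative, not from the lecture.

```python
import numpy as np

def reprojection_residual(K, R, C, X, m_tilde):
    """Project a 3D point X into the image through P = K R [I | -C]
    and compare with the measured pixel m_tilde (illustrative names)."""
    uvw = K @ R @ (X - C)      # homogeneous image coordinates (u, v, w)
    m = uvw[:2] / uvw[2]       # dehomogenize: (u/w, v/w)
    return m_tilde - m         # error measured in image coordinates
```

The residual lives in the 2D image plane, which is exactly the error the lecture says we want to minimize, rather than an algebraic error in homogeneous coordinates.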
And recall that m is in fact derived from the homogeneous projection of the point X into the image through the camera projection matrix, and as such we always obtain a measurement in terms of the homogeneous coordinates of that point. The homogeneous coordinates of a point are a three-element vector denoting a ray going from the camera center through that point into space. Every time we need to measure that point in the image, we need to dehomogenize through the division u divided by w, v divided by w, where (u, v, w) is obtained from the camera projection equation: P, a 3 by 4 matrix, times X, which is denoted in homogeneous coordinates as a 4 by 1 vector. The quantities to be adjusted are the rotation, the camera center C, as well as the point X. So we want to minimize the reprojection error as shown here. The objective is to minimize the discrepancy with the measurements in the image, x tilde and y tilde, with respect to the big X (those 3-dimensional points), the rotation matrix R, and the camera center C, all in 3-dimensional space. What's non-linear about this equation is that we need to take the homogeneous coordinates (u, v, w) and divide u and v by w to obtain the image projection in the two-dimensional image plane. Further expanded out, and to denote it more clearly: we're minimizing over the camera center C, the points X in 3-dimensional space, and the rotation R, and the quantities we're minimizing go through a transformation that's non-linear, because we're taking a division of the u element by the w element, which is essentially the z direction, as well as v divided by w. The error function itself is a sum of squares, but f is non-linear; together they form a non-linear least squares problem. Furthermore, recall that rotation can be represented in many different forms, so we have to be explicit about how the rotation matrix R is represented. A rotation matrix is controlled by three degrees of freedom.
In the particular exercise we go through in this lecture, we'll focus on the quaternion representation of the matrix R, denoted R(q), where q is the quaternion vector itself. To summarize, what we have right now is a non-linear least squares problem, where we have a vector b, in this case a 2 by 1 vector made of x tilde and y tilde, and f, which is a function of the rotation matrix R, the camera center C, as well as the point in 3D, X. This is a non-linear relationship, because there is a division inside, and because the rotation matrix lives on a non-linear manifold. We want to minimize this b minus f in the least squares sense. Recall that the solution to a non-linear least squares problem essentially consists of two steps. In the first step we take the error function as we defined it, take its derivative with respect to the variable x, and set that to 0. This is a necessary condition for minimizing this non-linear least squares problem, because a local minimum will have the gradient with respect to the unknown x equal to 0. The second step requires taking the non-linear function f itself and taking a Taylor expansion of it. Essentially, standing at the point where we currently have a guess of the value of x, we Taylor expand, which means taking the Jacobian, or gradient, of the function at that point and doing a linear expansion around it. This allows us to figure out which direction to search further in the unknown variable space, and how far to go in that direction. So we're looking for a delta x from where we currently are, and that delta x needs to tell us exactly which direction to go in the high-dimensional search space as well as how far to go. This delta x is computed through the normal equations. As you recall, the normal equations essentially consist of two terms. One term is the Jacobian itself: the Jacobian is an m by n matrix where, for every unknown variable, we have one column.
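The two steps above amount to one Gauss-Newton update. A minimal sketch in NumPy, assuming a generic residual b - f(x) and a user-supplied Jacobian; the function names are illustrative, and solving the linearized system with a least squares solve is equivalent to the normal equations:

```python
import numpy as np

def gauss_newton_step(f, jac, x, b):
    """One Gauss-Newton update for min_x ||b - f(x)||^2.
    `jac(x)` returns the m-by-n Jacobian of f at the current guess x;
    solving J dx = b - f(x) in the least squares sense is the same as
    the normal equations J^T J dx = J^T (b - f(x))."""
    J = jac(x)
    r = b - f(x)                                # residual: b - f(x), not f(x) - b
    dx, *_ = np.linalg.lstsq(J, r, rcond=None)  # direction and step length
    return x + dx
```

Iterating this step from an initial guess follows delta x downhill to a local minimum, as described above.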
And for every constraint that we have, we have one row in the Jacobian matrix. So the number of rows corresponds to the number of constraints we have, and the number of columns corresponds to the number of variables we're searching over. The other element in the normal equations is simply the error function itself with a minus sign: b minus f(x) instead of f(x) minus b. This Jacobian, together with the normal equations, gives us an iterative solution: we start at one point and follow this delta x until we find a local minimum of the function. Now recall that our problem has the following f(X) function, which consists of u divided by w and v divided by w. Further, we can expand the (u, v, w) obtained from the camera projection matrix as the camera calibration matrix K times the rotation matrix R times (X - C), where X is the point in three-dimensional space and C is the camera center in the world coordinate system. Further expanding out the camera calibration matrix and rotation, we obtain the following equations, and combining them, we obtain a form that consists of the camera calibration matrix times the rotation matrix followed by (X - C). What we want to do in the next few slides is derive the Jacobian for this system. Once we have the Jacobian for this projection equation, we can plug it into the normal equations and compute the delta x in the search space. So we'll go through a sequence of slides that is numerically somewhat tedious, but the goal is to obtain the Jacobian matrix for our specific problem. Let's take one more step. We can write (u, v, w), the homogeneous coordinates of the ray in image space, in the following form, consisting of the rotation matrix elements R, the unknown point X, and the unknown camera center C. The question then is: what is the Jacobian? Let's first take the Jacobian and divide it into different groups of columns.
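Because the rows-and-columns bookkeeping in this derivation is easy to get wrong, a finite-difference Jacobian is a handy cross-check on the analytic formulas. A sketch using forward differences, with illustrative names; one column per unknown, one row per constraint, exactly as described above:

```python
import numpy as np

def numerical_jacobian(f, x, eps=1e-6):
    """Forward-difference Jacobian of f at x: row i is constraint i,
    column j is the partial derivative with respect to unknown x[j]."""
    fx = np.atleast_1d(f(x))
    J = np.zeros((fx.size, x.size))
    for j in range(x.size):
        xp = x.copy()
        xp[j] += eps
        J[:, j] = (np.atleast_1d(f(xp)) - fx) / eps
    return J
```

Comparing this numerical Jacobian against the symbolic one on the slides is a standard way to catch sign or indexing mistakes.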
As we mentioned, the columns of the Jacobian are associated with the unknown variables we are searching over. We group the unknowns in this case into three groups. The orange one is associated with the quaternion for the rotation matrix, which tells us how the camera is oriented; since the quaternion is a four-dimensional vector, we have four columns in orange. For C, the camera center, we need to know the camera center's x, y, z; this is an unknown variable, so we have three columns for this part of the Jacobian. We also have the unknown position of one point in 3D, denoted X in blue; again three columns, one for x, one for y, one for z. The constraints for the Jacobian, however, from one point projected to the image plane, give only two rows: one for little x and one for little y, the rows and columns of the pixel space. So we can see that this Jacobian is made of two rows and a total of ten columns, grouped into the rotation quaternion, the camera center, and the 3D point. We denote this symbolically as the derivatives of the function with respect to q, with respect to C, and with respect to X. Furthermore, we can write the Jacobian with respect to the rotation quaternion q more explicitly; it is computed through the chain rule. It requires taking first the derivative of the function with respect to the rotation matrix elements, in 3 by 3 matrix form, followed by the derivative of the rotation matrix R with respect to the quaternion, which is a 4 by 1 vector. So the orange block in fact decomposes into two pieces: one is a 2 by 9 matrix, the derivative of the function with respect to the rotation matrix itself, and the other is a 9 by 4 matrix, the derivative of the rotation matrix with respect to the quaternion representation. We carry this colored Jacobian through the next few slides, where we're going to expand the Jacobians and write them out in explicit forms.
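For reference, the quaternion-to-rotation map R(q) that the chain rule differentiates can be sketched as follows. This assumes the (w, x, y, z) component ordering, which the lecture does not specify; normalizing first keeps R a valid rotation even if q drifts off the unit sphere during optimization:

```python
import numpy as np

def quat_to_R(q):
    """Rotation matrix R(q) from a quaternion q = (w, x, y, z),
    assumed ordering; q is normalized before use."""
    w, x, y, z = q / np.linalg.norm(q)
    return np.array([
        [1 - 2 * (y * y + z * z), 2 * (x * y - w * z),     2 * (x * z + w * y)],
        [2 * (x * y + w * z),     1 - 2 * (x * x + z * z), 2 * (y * z - w * x)],
        [2 * (x * z - w * y),     2 * (y * z + w * x),     1 - 2 * (x * x + y * y)]])
```

The 9 by 4 block dR/dq in the chain rule is just the derivative of these nine entries with respect to the four quaternion components.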
We don't need to remember these formulas; they are just useful as reference material in case we need to do this. The first part we can expand out is the Jacobian with respect to the camera center, which is a 2 by 3 matrix, written out in the following form, where (u, v, w) are evaluated at the current point of the search space, followed by their derivatives with respect to the camera center C. The explicit form of the derivation is written on the slides here. What we need to do is plug in the values of (u, v, w) for the current estimate of the 3D point projected into the image, and then calculate the derivatives with respect to C as shown on the right. Similarly, for a point in 3-dimensional space, we can also compute its Jacobian: again a 2 by 3 matrix, and again a function of the current values of (u, v, w) multiplied by the gradients of u with respect to X, v with respect to X, and w with respect to X. Each one can be expanded in the following form. We don't need to remember this form itself; we simply use it as reference material to compute the Jacobian. The rotation part is somewhat more tedious. Again we need to think about how to take the quaternion representation, a vector of four dimensions, and represent a rotation matrix with it, which is shown at the top of the slide here. Then we take the Jacobian, which through the chain rule requires us to take the derivatives shown here, and obtain the Jacobian block, a 2 by 4 matrix, shown in orange. So through this set of slides, we have seen how to compute the Jacobian matrix at the current point in the estimate space. The current estimate consists of a guess of the camera center C, a current guess of the point X in 3D space, as well as a current guess of the orientation. So for one point we obtain this 2-row by 10-column Jacobian matrix.
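Putting the pieces together, the full 2 by 10 Jacobian for one point can be assembled numerically as a stand-in for the analytic blocks on the slides. This sketch assumes a (w, x, y, z) quaternion ordering and uses finite differences; all names are illustrative:

```python
import numpy as np

def quat_to_R(q):
    # Unit quaternion (w, x, y, z) -> 3x3 rotation matrix R(q).
    w, x, y, z = q / np.linalg.norm(q)
    return np.array([
        [1 - 2 * (y * y + z * z), 2 * (x * y - w * z),     2 * (x * z + w * y)],
        [2 * (x * y + w * z),     1 - 2 * (x * x + z * z), 2 * (y * z - w * x)],
        [2 * (x * z - w * y),     2 * (y * z + w * x),     1 - 2 * (x * x + y * y)]])

def point_jacobian(K, q, C, X, eps=1e-6):
    """2x10 Jacobian of the projection (u/w, v/w) with respect to the
    ten unknowns [q (4 cols), C (3 cols), X (3 cols)], by finite
    differences; a numerical stand-in for the analytic chain-rule blocks."""
    def project(p):
        q_, C_, X_ = p[:4], p[4:7], p[7:10]
        uvw = K @ quat_to_R(q_) @ (X_ - C_)
        return uvw[:2] / uvw[2]
    p0 = np.concatenate([q, C, X])
    f0 = project(p0)
    J = np.zeros((2, 10))
    for j in range(10):
        p = p0.copy()
        p[j] += eps
        J[:, j] = (project(p) - f0) / eps
    return J
```

The two rows are the x and y image constraints; the ten columns group into quaternion, camera center, and 3D point, matching the colored slide layout.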
And this Jacobian matrix can then be used in the normal equations to estimate the delta x: how much we need to adjust the camera center, the 3D point, as well as the rotation to obtain a lower error in the non-linear least squares problem. So that's for one single point projected into one single view. If we have multiple views, as in the case shown here with Bob and Mike, two people looking at the same 3D scene, or in the case of a camera simply moving through space and time, we obtain different views of the same 3-dimensional objects from different viewpoints. The Jacobian matrix, or the non-linear least squares problem we obtain, is slightly different, and this can be expressed as an update to the form of the Jacobian matrix. Essentially what we have is Bob's Jacobian and Mike's Jacobian stacked on top of each other. If we look at a single point in 3D space, we have a 2 by 10 matrix for Bob's Jacobian, and this can be spread out into the Jacobian with respect to Bob's camera orientation and camera center, followed by the Jacobian with respect to the 3D point. Because Bob's image coordinates are not affected by how Mike is moving, that part of the Jacobian is always 0. Similarly, we add on Mike's Jacobian matrix, which consists of the Jacobian with respect to his camera rotation and camera center as well as the 3-dimensional point in space. But it is not affected by Bob's camera, so that also results in a 0 Jacobian block. So the total matrix for this particular situation, one point in 3-dimensional space projected into two camera views, has 2 rows for Bob and 2 rows for Mike, a total of 4 rows. And we have Bob's camera block of 7 columns, followed by Mike's camera block of 7 columns, each consisting of his camera rotation and camera center, plus 3 columns for the 3-dimensional point. So a total of 7 plus 7 plus 3 columns: that's the total Jacobian for the combination of 2 views looking at a single point.
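The block structure just described, with its zero cross-blocks, can be sketched as follows. The 2 by 7 camera blocks and 2 by 3 point blocks would come from each view's own Jacobian; the names are illustrative:

```python
import numpy as np

def stack_two_view_jacobian(J_bob_cam, J_bob_pt, J_mike_cam, J_mike_pt):
    """Stack per-view Jacobians for one 3D point seen by two cameras.
    Each camera block is 2x7 (4 quaternion + 3 center columns), the
    shared point block is 2x3; the cross blocks are zero because one
    observer's image does not depend on the other observer's pose."""
    top = np.hstack([J_bob_cam, np.zeros((2, 7)), J_bob_pt])   # Bob's 2 rows
    bot = np.hstack([np.zeros((2, 7)), J_mike_cam, J_mike_pt])  # Mike's 2 rows
    return np.vstack([top, bot])                                # 4 x 17 total
```

This zero-block sparsity is exactly what large bundle adjustment solvers exploit as more cameras and points are added.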