[MUSIC] In the last video, we talked about the basic operating principles of LIDAR, one of the most popular sensor choices for self-driving cars. In the next two videos, we'll learn how we can use the point cloud generated by LIDAR sensors to do state estimation for our self-driving car. By the end of this video, you'll be able to describe the basic point cloud data structure used to store LIDAR scans, describe common spatial operations on point clouds such as rotation and scaling, and use the method of least squares to fit a plane to a point cloud in order to detect the road or other surfaces.

To start us off, recall that a 3D LIDAR sensor returns measurements of range, elevation angle, and azimuth angle for every point that it scans. We know how to convert these spherical coordinates into Cartesian x, y, zed coordinates using the inverse sensor model, so we can build up a large point cloud using all the measurements from a LIDAR scan. For some LIDAR setups, it's not uncommon for these point clouds to contain upwards of a million points. So what can we do with these massive point clouds?

Let's consider an example of a point cloud we might encounter in the real world. Let's say our LIDAR scans a nearby tree off on the side of the road and produces a point cloud that looks like this. We only see points on the part of the tree that's facing us, because the tree and the leaves reflect infrared light. The first question you might ask is, how do we keep track of all of these points? What kinds of data structures should we use to work with them? One common solution is to assign an index to each of the points, say point 1 through point n, and store the x, y, zed coordinates of each point as a 3-by-1 column vector. From there, you could think about storing each of these vectors in a list, or you could stack them side by side into a matrix that we'll call big P. Doing it this way makes it easier to work with standard linear algebra libraries, like the Python NumPy library, which lets us take advantage of fast matrix operations rather than iterating over a list and treating each vector independently.
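To make that concrete, here's a minimal NumPy sketch of how the big P matrix might be assembled from raw LIDAR returns. The measurement values and variable names are made up for illustration, and the exact form of the inverse sensor model depends on your sensor's angle conventions.

```python
import numpy as np

# Hypothetical LIDAR returns: range r (metres), elevation angle eps and
# azimuth angle alpha (radians), one entry per scanned point.
r = np.array([10.2, 10.5, 11.1])
eps = np.array([0.05, 0.06, 0.04])
alpha = np.array([1.20, 1.22, 1.25])

# Inverse sensor model: spherical measurements -> Cartesian x, y, z.
x = r * np.cos(eps) * np.cos(alpha)
y = r * np.cos(eps) * np.sin(alpha)
z = r * np.sin(eps)

# Stack the coordinate vectors side by side into the 3-by-n matrix big P.
P = np.vstack((x, y, z))
print(P.shape)  # (3, 3) here; (3, n) in general
```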
So what kind of operations are we talking about? There are three basic spatial operations that are important for carrying out state estimation with point clouds: translation, rotation, and scaling. We'll talk about each of these in turn.

When we think about spatial operations on point clouds, our intuition might be to think in terms of physically manipulating the point cloud while our reference frame stays fixed. But for state estimation, it's more useful to think about things the other way around. Objects in the world mostly stay put while the reference frame attached to the vehicle moves and observes the world from different perspectives. So let's think about how translating our reference frame, say, by driving forward ten meters, will affect our perception of a single point in the point cloud. We can start by drawing the vector from the origin of our sensor frame, S, to a point, P. Now, consider a second frame, S-prime, whose origin has been translated relative to S due to the motion of the vehicle. Note that the basis vectors of frame S-prime are the same as the basis vectors of frame S; only the origin has moved. We can draw another vector from the origin of S-prime to the point P. Immediately, we notice the resulting vector, indicated here, is just the tip-to-tail sum of the other two vectors. And these vectors are just geometric objects until we express them in a coordinate system. What we're after are the coordinates of the point P in frame S-prime. We can get these easily by just subtracting the frame-to-frame translation vector from the coordinates of P in frame S. This extends easily to a batch operation on the full point cloud by simply tiling the frame-to-frame translation in a big matrix R and subtracting it from the point cloud matrix. Depending on the language or linear algebra library you're using, you probably won't need to build this R matrix explicitly. In Python, for example, the NumPy library is smart enough to repeat the frame-to-frame translation implicitly using broadcasting semantics.

Now, let's think about what happens if we rotate our reference frame instead of translating it. Again, keep in mind that we're not changing the physical point P, only our view of it. So in this case, we only have to think about one vector, from the origin of frame S to P. What does change in this case is the set of basis vectors we use to express the coordinates of the vector from S to P. Remember that the rotation matrix C tells us how to find the coordinates of a vector in a rotated frame from the coordinates of the vector in the original frame. So if we know the rotation matrix from frame S to frame S-prime, all we have to do is multiply it against the coordinates of P in frame S to get the coordinates of P in frame S-prime. To determine the coordinates of the entire rotated point cloud, the operation is exactly the same, thanks to the properties of matrix multiplication.

The last spatial operation to think about is scaling, which works very similarly to rotation. But instead of changing the direction of the basis vectors in our coordinate system, we're changing their lengths. Mathematically, this just means pre-multiplying the coordinates of each point by a diagonal matrix S whose non-zero elements are simply the desired scaling factors along each dimension. Often, but not always, these scaling factors are the same, and the matrix multiplication is equivalent to multiplying by a scalar. In these cases, we say that the scaling is isotropic, or equal in every direction. We can use the same matrix multiplication for individual points or for the entire point cloud, just like we did for rotations.

Usually, the transformations we're interested in are a combination of translation and rotation, and sometimes scaling. For example, we're often interested in estimating the translation and rotation that best align two point clouds so that we can estimate the motion of our self-driving car. We'll talk about how we can do this in the next video. Fortunately for us, it's easy to combine all three operations into a single equation by first translating each vector, then rotating into the new frame, and finally applying any scaling. And of course, this operation extends to the batch case as well.
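As a rough illustration of that batch case, here's a minimal NumPy sketch that applies translation, rotation, and scaling to a whole point cloud in one shot. The point cloud, translation, rotation angle, and scaling values are all made up, and the rotation matrix below assumes the frame-to-frame convention described above (coordinates in S-prime equal C times coordinates in S).

```python
import numpy as np

# Hypothetical 3-by-n point cloud expressed in frame S (columns are points).
P_s = np.array([[4.0, 5.0, 6.0],
                [1.0, 1.5, 2.0],
                [0.2, 0.3, 0.1]])

# Assumed frame-to-frame translation from S to S-prime, as a 3-by-1 column.
r = np.array([[10.0], [0.0], [0.0]])

# Assumed rotation matrix C from frame S to frame S-prime
# (a small rotation about the z-axis, for illustration).
theta = 0.1
C = np.array([[ np.cos(theta), np.sin(theta), 0.0],
              [-np.sin(theta), np.cos(theta), 0.0],
              [ 0.0,           0.0,           1.0]])

# Optional scaling matrix (isotropic and equal to one here).
S = np.diag([1.0, 1.0, 1.0])

# Translate (broadcasting repeats r across all columns), rotate, then scale.
P_sprime = S @ (C @ (P_s - r))
print(P_sprime.shape)  # still (3, n)
```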
So we've seen how to apply basic spatial operations to point clouds, and we'll see in the next video how we can use these concepts to do state estimation for self-driving cars. But before we get there, there's one more important operation to discuss, and that's plane fitting. One of the most common and important applications of plane fitting for self-driving cars is figuring out where the road surface is and predicting where it's going to be as the car continues driving. If you think back to your high school geometry classes, you might remember the equation of a plane in 3D: zed equals a plus b-x plus c-y. This equation tells you how the height of the plane, zed, changes as you move around in the x and y directions, and it depends on three parameters, a, b, and c, which tell you the slope of the plane in each direction and where the zed axis intersects the plane.

So in our case, we have a bunch of measurements of x, y, and zed from our LIDAR point cloud, and we want to find values for the parameters a, b, and c that give us the plane of best fit through these points. To do this, we're going to reach back to Module One for our tool of choice, least squares estimation. We'll start by defining a measurement error e for each point in the point cloud. And e is just going to be the difference between the predicted value of our dependent variable, zed-hat, and the actual observed value of zed. We get zed-hat simply by plugging in our current guess for the parameters, a-hat, b-hat, and c-hat, along with the actual values of x and y. In this case, the error e we're considering accounts for something like a bumpy road surface, that is, a surface which is not exactly planar. For the moment, we're ignoring the actual errors in the LIDAR measurements themselves, which also have an effect. We can stack all of these error terms into matrix form, so we have a big matrix of coefficients called A, multiplied by our parameter vector x, minus our stacked measurements b. You can work out the matrix multiplication yourself to see that we get back the same measurement error equations we started out with.

Now, all we have to do is minimize the square of this error and we'll have our solution. This works exactly the same way as the resistor problem we worked through in Module One. We can start by multiplying out the square to get a matrix polynomial in the parameter vector x. From there, we take the partial derivative of the squared error function with respect to the parameter vector x and set it to zero to find the minimum. This gives us the linear system we'll need to solve to obtain the final least squares estimate. We can solve this linear system using an efficient numerical solver, like Python NumPy's solve function, or just use the pseudo-inverse to get our final answer for the plane parameters.

One important thing to notice here is that we did not account for sensor noise in our x, y, zed measurements. All we've done is find the plane of best fit through a set of points. It's certainly possible to set this problem up in a more sophisticated way that does account for sensor noise. You could use a batch approach similar to what we just discussed, or you could even think about including the road parameters in the Kalman filter to estimate them on the fly as the sensor data comes in. The best solution for your self-driving application will depend on how much you trust your LIDAR data and how much thought you want to give to uncertainty in the road surface.

Now, although all of the operations we've described here can be easily implemented with NumPy or any other linear algebra library, there is a fantastic open source tool called the Point Cloud Library, or PCL, that provides all sorts of useful functions for working with point clouds. In fact, it's so useful that you'll find it everywhere in industry. The core library is built with C++, but there are unofficial Python bindings available as well. If you want to learn more about the features of PCL, I highly recommend visiting pointclouds.org and having a look around.
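Before the recap, here's a minimal NumPy sketch of the least squares plane fit we just walked through. The data points below are made up for illustration, and in practice you'd first need to select the subset of points you believe actually belong to the road surface.

```python
import numpy as np

# Hypothetical LIDAR points assumed to lie roughly on the road surface.
x = np.array([0.0, 1.0, 2.0, 0.5, 1.5, 2.5])
y = np.array([0.0, 0.5, 1.0, 2.0, 1.5, 0.5])
z = np.array([0.10, 0.12, 0.16, 0.13, 0.15, 0.17])

# Each row of A is [1, x_i, y_i], so that A @ [a, b, c] predicts z_i
# under the plane model z = a + b*x + c*y. b_vec stacks the measured heights.
A = np.column_stack((np.ones_like(x), x, y))
b_vec = z

# Solve the normal equations A^T A p = A^T b for the plane parameters.
p = np.linalg.solve(A.T @ A, A.T @ b_vec)
a_hat, b_hat, c_hat = p

# Equivalently, use the pseudo-inverse (or np.linalg.lstsq) directly.
p_pinv = np.linalg.pinv(A) @ b_vec
```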
So to recap, we've seen that point clouds are a way of capturing all of the measurements from a LIDAR scan, and they are often stored as a big matrix. We saw how we can use linear algebra to do useful operations on point clouds, like translating, rotating, and scaling. And we also saw how we can use the least squares algorithm to fit a 3D plane to a point cloud to find the road surface. The Point Cloud Library, or PCL, implements a bunch of useful tools for working with point clouds in C++. One of the most useful algorithms in PCL is called the iterative closest point algorithm, or ICP, which is a common method for estimating the motion of a self-driving car using two LIDAR point clouds. We'll talk about how ICP works in the next video. [MUSIC]