Let's first start with state estimation. Again, the goal here is to obtain reliable estimates of the position and the velocity as the vehicle moves through a three-dimensional environment. In the lab, we have motion capture cameras that allow the robot to measure its position. What essentially happens is that, through the reflective markers mounted on the robot, the cameras can estimate the position of each reflective marker, and they can do this at rates of 100 to 200 times a second. These measurements are also incredibly precise: the errors can be well below one millimeter. Of course, this only happens in a laboratory setting.

What happens when you go outdoors? Well, larger vehicles, like this X-47B made by Northrop Grumman, which is capable of landing on a ship deck autonomously, use GPS and other kinds of communication to determine where they are relative to the ship deck, and are able to operate autonomously. But outdoors, in general, you might not have GPS, or your estimates of position from GPS can be very inaccurate. This is especially true next to tall buildings. Certainly when you go indoors, it's hard to get GPS. And we'd like to be able to operate both indoors and outdoors. Indoors there's no GPS, and because these vehicles are small and maneuverable, they can find themselves in settings where it's very difficult to communicate directly with the vehicle. We also want these vehicles to move very quickly and maneuver through these complex environments.

So how do we navigate without GPS, without external motion capture cameras or any other kinds of external sensors? Well, imagine that the vehicle is equipped with sensors such as cameras, or color-plus-depth cameras as you see on the bottom, or laser range finders as you see on the top right. These sensors allow the vehicle to infer information about the environment, and from this information to localize itself.

How does that work? Well, let's look at this cartoon. Imagine you have a robot, and imagine there are three pillars in the environment. Let's imagine the robot has range sensors that allow it to detect these obstacles, the pillars, and that these sensors give you estimates of where the pillars are: d1, d2, and d3. Now let's assume that the robot has something like an inertial measurement unit that allows it to estimate its movement as it goes from one position to another. So you have some estimate of delta x, and when the robot gets to this new position the range finder again estimates the positions of the pillars that it had measured previously. Except now these range estimates, d1 prime, d2 prime, and d3 prime, are different from the original estimates d1, d2, and d3. So the question we want to ask ourselves is: is it possible for the robot to concurrently estimate the locations of the pillars and the displacement delta x? If you think about it, you're trying to estimate seven variables: three pairs of x, y coordinates for the pillars, plus delta x. This problem is referred to as Simultaneous Localization And Mapping, or simply SLAM. The idea is that you're trying to localize yourself, in other words estimate delta x, while mapping the pillars: x1, y1, x2, y2, and x3, y3.
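To make the pillar example concrete, here is a minimal sketch of how the seven unknowns could be recovered by posing the problem as a nonlinear least-squares fit. This is not the lecture's implementation: it assumes the sensor returns the full 2-D position of each pillar relative to the robot, that the robot starts at the origin and moves only along x, and all numbers, noise levels, and names are illustrative placeholders.

```python
# A minimal SLAM-as-least-squares sketch for the three-pillar cartoon.
# Assumptions (not from the lecture): the sensor measures each pillar's 2-D
# position relative to the robot, the robot starts at the origin, and the
# motion is purely along x. All values below are made up for illustration.
import numpy as np
from scipy.optimize import least_squares

# "True" world, used only to simulate the measurements.
true_pillars = np.array([[4.0, 1.0], [5.0, -2.0], [6.0, 3.0]])
true_dx = 1.5

rng = np.random.default_rng(0)
noise = lambda: rng.normal(0.0, 0.05, size=(3, 2))

d_first = true_pillars + noise()                              # d1, d2, d3 from pose 0
d_second = true_pillars - np.array([true_dx, 0.0]) + noise()  # d1', d2', d3' from pose 1
imu_dx = true_dx + rng.normal(0.0, 0.2)                       # noisy IMU odometry

def residuals(params):
    """params = [x1, y1, x2, y2, x3, y3, dx] -- the seven unknowns."""
    pillars = params[:6].reshape(3, 2)
    dx = params[6]
    r_first = (pillars - d_first).ravel()                          # pose-0 measurements
    r_second = (pillars - np.array([dx, 0.0]) - d_second).ravel()  # pose-1 measurements
    r_imu = [dx - imu_dx]                                          # IMU prior on the motion
    return np.concatenate([r_first, r_second, r_imu])

# Initial guess: pillars where we first saw them, motion from the IMU.
x0 = np.concatenate([d_first.ravel(), [imu_dx]])
sol = least_squares(residuals, x0)
print("estimated pillars:\n", sol.x[:6].reshape(3, 2))
print("estimated delta x:", sol.x[6])
```

The IMU estimate enters as just one more residual, so the optimizer balances the odometry against the pillar measurements; a full SLAM system makes the same trade-off, only with many more poses and landmarks.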
In the next video, you will see a robot entering a building that it hasn't seen before. It uses the SLAM methodology to map the three-dimensional building while estimating its location relative to the features in the building. The map that it's building you can see on the central screen: the blue colors are the ground floor and the red colors are the top floor. You will also see intermediate points where the vehicle actually plans its trajectory; look at the red snaking curve that emanates from the vehicle toward the gold coins. These gold coins have been designated by an operator who is viewing this map and tasking the vehicle, and the vehicle does everything else. In other words, you can click and point your way through this building without entering it, while getting information about what's inside.

In our lab we've built many different types of vehicles. Here are four examples. On the top left you see a vehicle that's equipped with laser scanners and a set of cameras; it has a GPS unit as well as an inertial measurement unit. On the bottom left you see another vehicle that carries only two cameras and an inertial measurement unit. On the top right, a smartphone drives the robot. On the bottom right, the vehicle is instrumented with an RGB-D camera, a red, green, blue, and depth camera, which you can now get as part of an Xbox video entertainment system; it also has a laser scanner on it. You can see that each vehicle has a different mass and a different size, and the reason for that is very simple. As you put more hardware on the vehicle in terms of sensors and processors, the vehicle has to become bigger. To support this weight you have to have bigger motors, to support those motors you have to have bigger props, which in turn require bigger batteries.

Here's another vehicle we use for instruction in our classroom, and this vehicle has just a single camera and an IMU. Because it only has a single camera, we actually instrument the room with beacons. These are AprilTags. As you'll see in this video, the vehicle is able to localize itself with respect to these beacons and hover. So it's estimating its position and velocity relative to the beacon and hovering autonomously. You can switch between small markers and large markers, which then allows the vehicle to control its height. And, of course, if you have lots of these markers, then you can actually navigate over larger distances. So in our laboratory this is an inexpensive replacement for the motion capture camera system. Here it's a single off-the-shelf camera, which is inexpensive, but we do instrument the environment with beacons. And because these beacons, the AprilTags you see on the carpet, are known to the robot, it's able to recognize them and estimate its position and orientation relative to the tags, and therefore the environment.
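To illustrate that last idea, here is a minimal sketch of how a single camera can estimate its pose from one known tag. It assumes the tag corners have already been detected in the image (for example by an AprilTag detector) and that the camera intrinsics are known, and it uses OpenCV's solvePnP; the tag size, corner pixel coordinates, and intrinsics below are made-up placeholders, not values from the lecture.

```python
# A minimal sketch of tag-based localization via the Perspective-n-Point problem.
# Assumptions (not from the lecture): tag corners are already detected, camera
# intrinsics are known, and all numeric values are illustrative placeholders.
import numpy as np
import cv2

TAG_SIZE = 0.16  # tag edge length in meters (assumed)

# Tag corners expressed in the tag's own frame (the z = 0 plane),
# ordered consistently with the detector's output.
object_points = np.array([
    [-TAG_SIZE / 2,  TAG_SIZE / 2, 0.0],
    [ TAG_SIZE / 2,  TAG_SIZE / 2, 0.0],
    [ TAG_SIZE / 2, -TAG_SIZE / 2, 0.0],
    [-TAG_SIZE / 2, -TAG_SIZE / 2, 0.0],
], dtype=np.float64)

# Pixel coordinates of the same corners, as a tag detector might report them.
image_points = np.array([
    [310.0, 220.0],
    [370.0, 222.0],
    [368.0, 280.0],
    [308.0, 278.0],
], dtype=np.float64)

# Assumed pinhole camera intrinsics (focal lengths and principal point).
K = np.array([[600.0,   0.0, 320.0],
              [  0.0, 600.0, 240.0],
              [  0.0,   0.0,   1.0]])
dist_coeffs = np.zeros(5)  # assume negligible lens distortion

# Solve PnP: pose of the tag in the camera frame.
ok, rvec, tvec = cv2.solvePnP(object_points, image_points, K, dist_coeffs)

# Invert the transform to get the camera (robot) pose in the tag's frame.
R, _ = cv2.Rodrigues(rvec)
camera_position_in_tag_frame = -R.T @ tvec
print("camera position relative to tag:", camera_position_in_tag_frame.ravel())
```

Because the tags' geometry and locations are known to the robot, the same computation gives the robot's position and orientation in the environment, which is the information the motion capture system would otherwise provide in the lab.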