We started out in this module with the aim of matching a catalog of radio galaxies with a catalog of optical galaxies to find out which were associated with the same physical object in space. We did this by implementing a very naive cross-matching program, but even after making some improvements, we found that our own solution was still too slow. We implemented our own cross-matching to demonstrate the issues, but since this is something astronomers have to do all the time, we can assume that someone else has already thought about it and come up with a better solution. And sure enough, they have. The astropy package has a cross-matching function that makes it extremely easy to calculate angular distances and cross-match two catalogs. So how fast will this run? When I repeated the same experiment using astropy's cross-matching, it took only 25 seconds for two input catalogs of a million sources each. Wow. We started at 24 days, got down to 12 days, and now we're at an amazing 25 seconds. What clever magic is astropy doing to achieve this? The answer is a data structure called the k-d tree. A k-d tree, or k-dimensional tree, is a way of representing points in space in a recursive structure. Here k is the number of dimensions, which in our case is two: the right ascension and declination of our coordinate system. To construct a k-d tree, you recursively partition the space, splitting at the median point each time. Let's work through an example. The median point here in the x dimension is A, so we split the plane at that point, and A becomes the root node of the tree. We then consider the points to the left of A and split them in the y dimension, again at the median point, which is E. We repeat this process, alternating between the x and y dimensions, until the left-hand side of the tree is complete.
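The median-splitting construction just described can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not astropy's implementation: `Node` and `build_kdtree` are hypothetical names, points are plain (x, y) tuples, and RA and Dec are treated as flat x/y coordinates, which ignores the curvature of the sky and the wrap-around at RA = 0°/360°. The recursion builds both sides of every split, and it re-sorts at each level for brevity rather than speed.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Node:
    point: tuple                     # (x, y); x ~ RA, y ~ Dec (flat-sky simplification)
    axis: int                        # 0 = this node splits on x, 1 = splits on y
    left: Optional["Node"] = None    # points sorted before the median on this axis
    right: Optional["Node"] = None   # points sorted after the median on this axis

def build_kdtree(points, depth=0):
    """Hypothetical helper: build a 2-d tree by recursive median splits."""
    if not points:
        return None
    axis = depth % 2                 # alternate x, y, x, y, ... with depth
    ordered = sorted(points, key=lambda p: p[axis])
    mid = len(ordered) // 2          # the median point becomes this node
    return Node(
        point=ordered[mid],
        axis=axis,
        left=build_kdtree(ordered[:mid], depth + 1),
        right=build_kdtree(ordered[mid + 1:], depth + 1),
    )

points = [(2, 3), (5, 4), (9, 6), (4, 7), (8, 1), (7, 2)]
root = build_kdtree(points)
print(root.point, root.left.point, root.right.point)  # (7, 2) (5, 4) (9, 6)
```

Here (7, 2) plays the role of A: it is the median in x and becomes the root, and each child is then the median of its half in y.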
Finally, the process is repeated for the points on the other side of A, until every point in our original dataset is either a node or a leaf in the tree. Once the tree is constructed, you can use it for fast nearest-neighbor searching. Say we have a galaxy at point T, and we want to find its closest neighbor. We first calculate the distance from T to the root node A, which gives us an initial best match. Since T is smaller than A in the x dimension, we first search the left children of A, starting at node E. E is much closer to T than A is, which substantially reduces the best-match radius; the circle now intersects only four partitions. We then repeat this process, except that this time T's value in the y dimension is larger than E's, so we consider B, the right child of E. B is even closer to T, reducing the best-match radius even further, so now we only need to explore the two boxes in the top-left corner. Finally, we visit child D, discover that it's outside the search radius and can be discarded, and find that there are no other partitions that intersect our best-match radius. The great thing about k-d trees is that we found T's nearest neighbor without having to consider any of the nodes marked in yellow. For large datasets, the savings are enormous. Not only is our program much faster, its time complexity is also improved: instead of comparing each source against every source in the other catalog, a typical nearest-neighbor query only has to visit a number of nodes that grows roughly logarithmically with the catalog size. By changing the core algorithm, we've made a huge improvement to our running time and scalability. If you need to cross-match really large datasets, there are plenty of other issues you should consider. For example, maybe a database would be a better way of storing the data, since it would avoid spending valuable CPU cycles reading in your input catalog each time. Another important issue is how to evaluate whether your matches are just chance coincidences, or whether the galaxies have a real, physical association. To answer this question, you may need a combination of astronomy problem solving and computational thinking.
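The nearest-neighbor search walked through above, and how it can drive a catalog cross-match, can be sketched as follows. This is a hypothetical pure-Python illustration, not what astropy does internally: `build_kdtree`, `nearest`, `to_unit_vector` and `crossmatch` are names made up for this sketch. The builder cycles through all k axes, so the same code handles 2-D points and 3-D vectors. For sky positions it assumes a conversion of each (RA, Dec) to a 3-D unit vector, chosen here because the straight-line (chord) distance c between unit vectors grows with angular separation and has no wrap-around at RA = 0°/360°; the chord is turned back into an angle with θ = 2 arcsin(c/2).

```python
import math
from dataclasses import dataclass
from typing import Optional

@dataclass
class Node:
    point: tuple
    axis: int
    left: Optional["Node"] = None
    right: Optional["Node"] = None

def build_kdtree(points, depth=0):
    """Median-split construction, cycling through all k axes with depth."""
    if not points:
        return None
    axis = depth % len(points[0])
    ordered = sorted(points, key=lambda p: p[axis])
    mid = len(ordered) // 2
    return Node(ordered[mid], axis,
                build_kdtree(ordered[:mid], depth + 1),
                build_kdtree(ordered[mid + 1:], depth + 1))

def nearest(node, target, best=None):
    """Return (point, distance) for the tree point closest to target."""
    if node is None:
        return best
    d = math.dist(node.point, target)
    if best is None or d < best[1]:
        best = (node.point, d)                 # closer match: shrink the search radius
    diff = target[node.axis] - node.point[node.axis]
    near, far = (node.left, node.right) if diff < 0 else (node.right, node.left)
    best = nearest(near, target, best)         # search the target's side first
    if abs(diff) < best[1]:                    # does the radius cross the split?
        best = nearest(far, target, best)      # only then visit the other side
    return best

def to_unit_vector(ra_deg, dec_deg):
    """(RA, Dec) in degrees -> Cartesian point on the unit sphere."""
    ra, dec = math.radians(ra_deg), math.radians(dec_deg)
    return (math.cos(dec) * math.cos(ra), math.cos(dec) * math.sin(ra), math.sin(dec))

def crossmatch(cat1, cat2):
    """For each (RA, Dec) in cat1, find its closest source in a non-empty cat2.
    Returns a list of (index_in_cat1, index_in_cat2, separation_deg)."""
    vectors = [to_unit_vector(ra, dec) for ra, dec in cat2]
    index_of = {v: j for j, v in enumerate(vectors)}  # map vectors back to cat2 rows
    tree = build_kdtree(vectors)
    matches = []
    for i, (ra, dec) in enumerate(cat1):
        point, chord = nearest(tree, to_unit_vector(ra, dec))
        # A chord of length c on the unit sphere subtends an angle of 2*asin(c/2).
        sep_deg = math.degrees(2 * math.asin(min(1.0, chord / 2)))
        matches.append((i, index_of[point], sep_deg))
    return matches

radio = [(10.0, 20.0), (359.9995, 0.0)]
optical = [(0.0005, 0.0), (10.0, 20.001), (100.0, 0.0)]
for i, j, sep in crossmatch(radio, optical):
    print(i, j, round(sep * 3600, 2))          # prints "0 1 3.6" then "1 0 3.6" (arcsec)
```

The second radio source sits just below RA = 360° and its counterpart just above RA = 0°, which the unit-vector representation handles without special cases. For catalogs of a million sources you would rely on a compiled implementation such as astropy's `SkyCoord.match_to_catalog_sky` rather than this recursive pure-Python version; the sketch only shows how the tree lets most of the catalog be skipped.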
On the astronomy side, if you can measure redshifts for your objects, you can establish whether they're at the same distance, in which case it's much more likely that they're physically associated. You could also do a statistical analysis to calculate the likelihood of a chance coincidence given the spatial density of objects in the two surveys. So now we've done our cross-matching successfully. What do the results actually mean? Nearly all of our radio sources have an optical counterpart, which means we can classify them into two different categories. Most of our radio galaxies are associated with quasars, where we're looking towards the central black hole and can see the very energetic accretion disk. The radiation from the accretion disk is so bright that it outshines all of the stars in the galaxy, so the object looks just like a bright star: hence the name quasi-stellar object, or quasar. The rest of our radio galaxies sit inside normal galaxies, where we can see a cloud of many stars grouped together. This could mean that the supermassive black hole has stopped accreting material and the radio jets are remnants of past activity. Because we've found optical identifications for most of our radio sources, we can also get redshifts for them, which tell us the distance to each galaxy or quasar. The redshifts of the galaxies range from 0.02 to 0.5, whereas those of the quasars range from 0.2 to 3. This tells us that most of the galaxies are in the relatively local universe, whereas most of the quasars are much further away. By matching the optical and radio catalogs, we've been able to see the different types of galaxies that can host supermassive black holes and measure the distances to them. This is just one example of the additional science that can be done by combining information from different wavelengths, and it illustrates why most of modern astronomy takes a multi-wavelength approach.
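The statistical check on chance coincidences mentioned above can be made concrete with a simple model. A minimal sketch, assuming unrelated optical sources are scattered randomly across the sky (a Poisson distribution) with surface density ρ: the expected number of unrelated sources inside a match circle of radius r is πr²ρ, so the probability of finding at least one by chance is P = 1 − exp(−πr²ρ). The function name and the example numbers below are hypothetical, chosen only for illustration, and the model ignores positional errors and any clustering of sources.

```python
import math

def chance_match_probability(radius_arcsec, density_per_sq_deg):
    """Hypothetical helper: probability that at least one unrelated source
    falls within radius_arcsec of a position, assuming sources are scattered
    at random (Poisson) with the given surface density."""
    radius_deg = radius_arcsec / 3600.0
    expected = math.pi * radius_deg ** 2 * density_per_sq_deg  # mean count in the circle
    return 1.0 - math.exp(-expected)

# Illustrative numbers only (not from the lecture): a 5 arcsec match radius
# and 10,000 optical sources per square degree.
p = chance_match_probability(radius_arcsec=5, density_per_sq_deg=10_000)
print(f"{p:.3f}")  # 0.059
```

Multiplying P by the number of radio sources gives a rough estimate of how many matches in the whole catalog could be coincidences rather than physical associations.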