So, in the previous module, we talked about what hierarchical clustering is and we mentioned that there's two main ways to go about taking data and doing a hierarchical data analysis. We said the most common way is agglomerative clustering. In this module, we're going to discuss how to apply agglomerative clustering to a data set to extract this hierarchical data structure, which we could then use to explore and visualize. So again, agglomerative clustering is perhaps the most popular hierarchical clustering technique, and remember agglomerative clustering is where you start with a bunch of data elements. So these are data elements. So you can imagine that these are baseball players like we had before or cars or movies or books, and each element A has some vector associated with a bunch of properties, and we're trying to decide how far away element A is from element B, from C, from D, and so forth. Essentially, what we're gonna do, is we're going to find the two closest elements out of all of these and the two closest ones are going to be the ones that are grouped together first, and then we're going to start merging these clusters to form a tree. So the basic algorithm is we're going to compute a distance matrix between the input data points. So that means in our tree, if we have 2-4-6-7 different elements, a distance matrix is going to look like this' A-B-C-D-E-F-G. So, we're going to have a seven by seven matrix. So this becomes our distance matrix. We don't care about the diagonal because everything in the diagonal A is already closest to A and remember each of these individual points are already it's own cluster when we start. So, now, we use our distance metric of our choice. So, for example, we talked about in previous modules euclidean distance. So, if let's say A is a two-dimensional vector, so, if A has elements X1, X2 and B has elements Y1 and Y2, then the distance from A to B for example, could equal the square root of X1 minus Y1 squared, plus X2 minus Y2 squared. So, we can do some euclidean distance, we could do Manhattan distance, we can do all sorts of things but essentially for each of these pairs this would be the distance from A to B. This is the distance from B to A. So, we only really care about filling in the lower half of the diagonal here, because the distance from A to B should be the same as the distance from B to A. So, we compute the distance matrix between the point, we're going to let each data point be a cluster, and then, once I have all these filled in with numbers, I'm going to find the smallest number in there and I'm going to merge the two closest clusters. So, let's say in this case that A and B were the two closest. Once I merge those, then I have to update the distance matrix and it reduces in size. So, since A and B merge, my new distance matrix is going to be A-B-C-D-E-F-G by AB Through D-F-G, and now the real trick becomes what's the distance between AB and C? There's a whole lot of different methods for calculating the distance between these clusters of multiple points, and those are some of the things we're going to talk about in agglomerative clustering. So, the key operation is this computation of distance between two clusters, and it's quite simple when the clusters are individual points. We use our distance algorithm of choice generally, like I said, Euclidian, Manhattan, whatever sort of one you're interested in. If you're not sure about those distance metrics, please refer to the earlier modules where Dr. Cannon talked about those. The different definitions of distance between clusters lead to a whole bunch of different agglomerative clustering algorithms. So, first, let's just talk about how we start this again Just to make sure we're clear. So, in this example, we've got a whole bunch of individual points, and so we create our distance proximity matrix. So remember the size of this matrix is going to depend on how many points we have, so if we have 12 points, this is going to be 12 by 12 matrix. Then, we begin merging some of these points and as we merge these, we begin having clusters. So point one and point two might've been the closest, then three and four and so forth. As we merge these, these clusters form and form and on every step the distance proximity matrix becomes smaller and smaller, until now we only have a one by one matrice left. Basically, at each intermediate state, we're going to merge the two closest clusters and update our distance matrix. So, we're always going to be looking at the diagonal identifying which elements are the closest and trying to group those together in our agglomerative clustering algorithm. Now, the question though is after we merge, how do we update the distance matrix? The answer to this question results in a variety of different agglomerative clustering algorithms. The problem is each cluster now is a set of points. We know how to define distance between two points, we talked about this. But if the cluster is a set of points, so I've got ABC and a cluster, and I've got DEF and a cluster, what's the distance between these two? There's lots of alternatives and it's not an easy task. So as a preview, what are some of the things that we can quickly think of? What's the distance between this picture? Well, I could say this cluster has a centroid and this has a centroid, so let's just say the distance is the distance from the center. Okay. But does that make sense? Should it really be the closest distance here? Or should it be the farthest distance out here? So again, we can immediately start thinking about a lot of different methods for doing this. We could even say it could be the average distance between all points. It could be A to D, A to E, A to F, B to D, B to E, B to F, C to D, C to E, C to F and take the average of all of those individual distances. So we can quickly start seeing that there's a lot of alternatives and it's not an easy task, and what we're going to find is that different choices of these alternatives have different pros and cons and can result in completely different types of clusters. So perhaps, the first one to talk about is what we'll call single-linkage agglomerative clustering. So in single-link distance, what we're doing when we merge clusters between Ci and j is the minimum distance between any object in Ci and any object in Cj. So this distance is defined by the two most similar objects. So, let's say that we have five elements: ABCDE. So we make our five by five distance matrix, okay? We know the distance on the diagonal because the distance from A to A, A to B et cetera is all zero. So, we only care about this element here. So, we can calculate the distance from A to B. So, lets say we've done some Euclidean distance algorithm sets two, let's say this is 3-1-4-7-10-11-12 and 5. So this is the distance between all the points. So, what we do is we find the smallest distance here, and it looks like C to B was the smallest. So that means in our tree, we've got C and we've got B. So these two are the closest, they get merged. So now we need to create our next distance matrix. So remember, our next step is we merged C and B. So, we've got A, we've got CB, we've got D, and we'd get E. Then we have A-B-C-D and E. So notice again before our distance matrix was five by five, now it's four by four, and again along the diagonal. So now, in single link, we have to decide the distance between CB and A, and this is defined as the minimum distance between any object in Ci and any object in Cj. So, the distance from CB to A, is the minimum of the distance AB and the distance AC because this is the minimum distance from A to any element in CB. So, we can look at our previous distance matrix A to B was two, A to C was three. So we can fill this in, this goes to the mean of two, three. So which one is smaller? Is two, so this becomes two. A to D didn't change, so it's four, A to E didn't change so it's 10. Let's skip this column for a second then D to E. We forgot to fill something in, so, let's just fill this in with 20, will make that really big. So, D to E was 20, so now let's think about C to B. So, we've got CB to D, so what's the distance here? It again becomes the minimum of all points. So, we've got C to D and B to D, and we can look in our distance matrix and this becomes mean of, so CD was five and BD was seven. So, we get five here, and the same for CB to E. So, C to E was 12 and B to E was 11, so this becomes 11. So then our smallest one is two, so A gets merged into this as well, and then we will continue on. So, our next distance matrix would be ACB, DE to ACB, DE. So now we've got our three by three distance matrix, and we continue filling this out and we continue merging until all the steps are complete. So in the next module, we're going to cover a couple of different distance metrics from single link, we will talk about complete-link and these things as well as show what some of the different results are by choosing these different distance metrics. But what happens with this is by getting this hierarchical structure, we can now start reasoning about why this might have occurred, what's going on in our data. We can start visualizing these hierarchical structures and again part of this is to help us build hypotheses, understand what's going on in the data and begin exploring how similar things might be in some high dimensional space. All right. Thank you