So, we're going to have a seven by seven matrix.

So this becomes our distance matrix.

We don't care about the diagonal, because every diagonal entry is the distance from a point to itself: A is always

closest to A. And remember, each of

these individual points is its own cluster when we start.

So, now, we use the distance metric of our choice.

For example, in previous modules we talked about Euclidean distance.

So, let's say A is a two-dimensional vector:

A has elements x1 and x2,

and B has elements y1 and y2.

Then the distance from A to B, for example,

could equal the square root of x1 minus y1 squared,

plus x2 minus y2 squared.

So, we could use Euclidean distance,

we could use Manhattan distance,

we can do all sorts of things, but essentially, for

each of these pairs, this entry would be the distance from A to B.
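As a quick sketch, here's what those two metrics look like in Python for two-dimensional points A and B; the coordinates are made up purely for illustration:

```python
import math

# Hypothetical 2-D points, following the lecture's A and B example
A = (1.0, 2.0)   # (x1, x2)
B = (4.0, 6.0)   # (y1, y2)

def euclidean(p, q):
    """Square root of (x1 - y1)^2 + (x2 - y2)^2, as in the formula above."""
    return math.sqrt(sum((pi - qi) ** 2 for pi, qi in zip(p, q)))

def manhattan(p, q):
    """Sum of the absolute coordinate differences."""
    return sum(abs(pi - qi) for pi, qi in zip(p, q))

print(euclidean(A, B))  # 5.0
print(manhattan(A, B))  # 7.0
```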

This is the distance from B to A.

So, we only really care about filling in the lower half of the matrix, below the diagonal, here,

because the distance from A to B should be the same as the distance from B to A.

So, we compute the distance matrix between the points,

we let each data point be its own cluster, and then,

once I have all these filled in with numbers,

I'm going to find the smallest number in

there and I'm going to merge the two closest clusters.
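Putting those first steps together, here's a rough Python sketch that builds the lower half of the distance matrix for seven hypothetical points A through G and finds the closest pair to merge; the coordinates are invented for illustration:

```python
import math
from itertools import combinations

# Hypothetical coordinates for the seven points A..G
points = {"A": (0, 0), "B": (1, 0), "C": (4, 2),
          "D": (5, 5), "E": (8, 1), "F": (2, 7), "G": (9, 9)}

def euclidean(p, q):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

# Only one half of the matrix matters: d(A, B) == d(B, A),
# and the diagonal is always zero.
dist = {(i, j): euclidean(points[i], points[j])
        for i, j in combinations(points, 2)}

# Merge step: find the pair of clusters with the smallest distance.
closest = min(dist, key=dist.get)
print(closest, dist[closest])  # ('A', 'B') 1.0
```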

So, let's say in this case that A and B were the two closest.

Once I merge those,

then I have to update the distance matrix and it reduces in size.

So, since A and B merge,

my new distance matrix is going to be

AB, C, D, E, F, G by AB, C, D, E, F, G (a six-by-six matrix),

and now the real trick becomes: what's the distance between AB and C?

There's a whole lot of different methods for

calculating the distance between these clusters of multiple points,

and those are some of the things we're going to talk about in agglomerative clustering.

So, the key operation is this computation of distance between two clusters,

and it's quite simple when the clusters are individual points.

We use our distance metric of choice, like I said,

Euclidean, Manhattan, whatever sort of one you're interested in.

If you're not sure about those distance metrics,

please refer to the earlier modules where Dr. Cannon talked about those.

The different definitions of distance between clusters lead to

a whole bunch of different agglomerative clustering algorithms.

So, first, let's just talk about how we start this again, just to make sure we're clear.

So, in this example,

we've got a whole bunch of individual points,

and so we create our distance proximity matrix.

So remember, the size of this matrix is going to depend on how many points we have;

if we have 12 points,

this is going to be a 12-by-12 matrix.

Then, we begin merging some of these points and as we merge these,

we begin having clusters.

So point one and point two might've been the closest,

then three and four and so forth.

As we merge these,

these clusters grow step by step, and at every step

the distance matrix becomes smaller and smaller,

until we only have a one-by-one matrix left.

Basically, at each intermediate state,

we're going to merge the two closest clusters and update our distance matrix.

So, we're always going to be looking at the off-diagonal entries, identifying which elements are the

closest, and trying to group those together in our agglomerative clustering algorithm.
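That merge-and-update loop can be sketched in Python. The coordinates below are made up, and single-link (minimum) distance is used here only as a placeholder; the choice of cluster distance is exactly what we discuss next:

```python
import math

def euclidean(p, q):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

# Placeholder linkage: distance between the two closest members.
def cluster_dist(c1, c2):
    return min(euclidean(p, q) for p in c1 for q in c2)

# Hypothetical data: every point starts as its own cluster.
clusters = [[(0, 0)], [(1, 0)], [(5, 5)], [(6, 5)], [(9, 9)]]

while len(clusters) > 1:
    # Find the two closest clusters ...
    i, j = min(((a, b) for a in range(len(clusters))
                for b in range(a + 1, len(clusters))),
               key=lambda ab: cluster_dist(clusters[ab[0]], clusters[ab[1]]))
    # ... merge them; the distance matrix shrinks by one each step.
    clusters[i] = clusters[i] + clusters[j]
    del clusters[j]
    print(len(clusters), "clusters remain")
```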

Now, the question though is after we merge,

how do we update the distance matrix?

The answer to this question results in

a variety of different agglomerative clustering algorithms.

The problem is that each cluster is now a set of points.

We know how to define distance between two points,

we talked about this.

But if the cluster is a set of points,

so I've got A, B, and C in a cluster,

and I've got D, E, and F in a cluster,

what's the distance between these two?

There's lots of alternatives and it's not an easy task.

So as a preview,

what are some of the things that we can quickly think of?

What's the distance between the two clusters in this picture?

Well, I could say this cluster has a centroid and this one has a centroid,

so let's just say the distance is the distance from center to center.

Okay. But does that make sense?

Should it really be the closest distance here?

Or should it be the farthest distance out here?

So again, we can immediately start thinking

about a lot of different methods for doing this.

We could even say it could be the average distance between all points.

It could be A to D,

A to E, A to F,

B to D, B to E,

B to F, C to D, C to E,

C to F and take the average of all of those individual distances.
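All four of those candidate definitions (closest pair, farthest pair, centroid-to-centroid, and average over all pairs) can be computed side by side. The two clusters below use invented coordinates purely for illustration:

```python
import math

def euclidean(p, q):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

# Hypothetical clusters {A, B, C} and {D, E, F}
c1 = [(0, 0), (1, 0), (0, 1)]
c2 = [(5, 5), (6, 5), (5, 6)]

# All nine pairwise distances: A-D, A-E, A-F, B-D, ..., C-F
pairs = [euclidean(p, q) for p in c1 for q in c2]

closest  = min(pairs)              # the two most similar members
farthest = max(pairs)              # the two most dissimilar members
average  = sum(pairs) / len(pairs) # average of all nine distances

def centroid(c):
    return tuple(sum(coord) / len(c) for coord in zip(*c))

center_to_center = euclidean(centroid(c1), centroid(c2))
print(closest, farthest, average, center_to_center)
```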

So we can quickly start seeing that there's a lot of

alternatives and it's not an easy task,

and what we're going to find is that different choices of these alternatives have

different pros and cons and can result in completely different types of clusters.

So perhaps, the first one to talk about is what we'll

call single-linkage agglomerative clustering.

So in single-link distance,

the distance between clusters Ci and Cj is

the minimum distance between any object in Ci and any object in Cj.

So this distance is defined by the two most similar objects.
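As a minimal sketch, that single-link definition translates directly into code; the clusters Ci and Cj below are hypothetical:

```python
import math

def euclidean(p, q):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

# Single-link distance: the minimum over every (x in Ci, y in Cj)
# pair, i.e. the distance defined by the two most similar objects.
def single_link(ci, cj):
    return min(euclidean(x, y) for x in ci for y in cj)

# Hypothetical clusters Ci and Cj
ci = [(0, 0), (2, 0)]
cj = [(3, 0), (7, 7)]
print(single_link(ci, cj))  # (2, 0) and (3, 0) are closest -> 1.0
```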

So, let's say that we have five elements: A, B, C, D, and E.