Hello and welcome.
In this video,
we'll be covering Hierarchical Clustering.
So, let's get started.
Let's look at this chart.
An international team of scientists led by UCLA biologists used
this dendrogram to report genetic data from more than 900 dogs from 85 breeds,
and more than 200 wild gray wolves worldwide,
including populations from North America,
Europe, the Middle East and East Asia.
They used molecular genetic techniques to analyze more than 48,000 genetic markers.
This diagram shows hierarchical clustering of
these animals based on the similarity in their genetic data.
Hierarchical clustering algorithms build a hierarchy of
clusters where each node is a cluster consisting of the clusters of its daughter nodes.
Strategies for hierarchical clustering generally fall into
two types, divisive and agglomerative.
Divisive is top down,
so you start with all observations in one large cluster and break it
down into smaller pieces. Think of divisive as dividing the cluster.
Agglomerative is the opposite of divisive.
So it is bottom up,
where each observation starts in its own cluster and
pairs of clusters are merged together as they move up the hierarchy.
Agglomeration means to amass or collect things,
which is exactly what this approach does with the clusters.
The agglomerative approach is more popular among
data scientists and so it is the main subject of this video.
Let's look at a sample of agglomerative clustering.
This method builds the hierarchy from
the individual elements by progressively merging clusters.
In our example, let's say we want to cluster
six cities in Canada based on their distances from one another.
They are: Toronto, Ottawa,
Vancouver, Montreal, Winnipeg and Edmonton.
We construct a distance matrix at this stage,
where the number in row i, column j is the distance between the i-th and j-th cities.
In other words, this table shows the distance between each pair of cities.
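To make this concrete, here is a minimal sketch of such a distance matrix in Python. The numbers are made-up placeholder distances, not the actual values shown on the slide:

```python
import numpy as np

cities = ["Toronto", "Ottawa", "Vancouver", "Montreal", "Winnipeg", "Edmonton"]

# Symmetric matrix of pairwise distances; D[i, j] is the distance
# between city i and city j, and the diagonal is zero.
# These numbers are illustrative placeholders only.
D = np.array([
    [   0,  400, 3400,  500, 1500, 2700],  # Toronto
    [ 400,    0, 3500,  200, 1600, 2800],  # Ottawa
    [3400, 3500,    0, 3700, 1900,  800],  # Vancouver
    [ 500,  200, 3700,    0, 1800, 3000],  # Montreal
    [1500, 1600, 1900, 1800,    0, 1300],  # Winnipeg
    [2700, 2800,  800, 3000, 1300,    0],  # Edmonton
], dtype=float)

print(D[1, 3])  # distance between Ottawa and Montreal
```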
The algorithm starts by assigning each city to its own cluster.
So if we have six cities,
we have six clusters each containing just one city.
Let's denote each city by the first two characters of its name.
The first step is to determine which cities,
let's call them clusters from now on,
to merge into a cluster.
Usually, we want to take the two closest clusters according to the chosen distance.
Looking at the distance matrix,
Montreal and Ottawa are the closest clusters so we make a cluster out of them.
Please notice that we just use a simple one-dimensional distance feature here,
but our objects can be multidimensional, and the distance measure can be Euclidean,
Pearson correlation, average distance, or many others, depending on the data type and domain knowledge.
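Continuing the sketch above, finding the closest pair just means locating the smallest off-diagonal entry in the matrix D; for multidimensional objects you could instead build D with scipy.spatial.distance.pdist and a metric of your choice:

```python
import numpy as np

# For multidimensional objects, D could instead be computed with
# scipy.spatial.distance.pdist using a metric such as 'euclidean'
# or 'correlation' (Pearson-based), then squareform().

masked = D.copy()
np.fill_diagonal(masked, np.inf)   # ignore the zero self-distances
i, j = np.unravel_index(np.argmin(masked), masked.shape)
print(cities[i], cities[j])        # -> Ottawa Montreal, the first pair to merge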
Anyhow, we have to merge these two closest cities in the distance matrix as well.
As you can see in the distance matrix,
the rows and columns for Montreal and Ottawa
are merged as the cluster is constructed.
Then, the distances from all cities to this new merged cluster get updated. But how?
For example, how do we calculate the distance
from Winnipeg to the Ottawa/Montreal cluster?
Well, there are different approaches but let's assume for example,
we just select the distance from the center of the Ottawa/Montreal cluster to Winnipeg.
Updating the distance matrix,
we now have one less cluster.
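As one rough illustration of this update step, the sketch below replaces the two merged rows and columns with a single new one whose distances are simply the average of the old ones. This is only a stand-in for the "center of the cluster" idea; single, complete, or centroid linkage would update the matrix differently.

```python
import numpy as np

def merge(D, labels, i, j):
    """Merge clusters i and j; the new cluster's distance to every other
    cluster is the average of the two old rows (a crude proxy for the
    'center of the cluster' used in the example)."""
    keep = [k for k in range(len(labels)) if k not in (i, j)]
    new_row = (D[i, keep] + D[j, keep]) / 2.0
    size = len(keep) + 1
    D_new = np.zeros((size, size))
    D_new[:-1, :-1] = D[np.ix_(keep, keep)]
    D_new[-1, :-1] = new_row
    D_new[:-1, -1] = new_row
    new_labels = [labels[k] for k in keep] + [labels[i] + "/" + labels[j]]
    return D_new, new_labels

D2, cities2 = merge(D, cities, i, j)   # merge Ottawa and Montreal
print(cities2[-1], D2[-1, :-1])        # distances from the new cluster
```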
Next, we look for the closest clusters once again.
In this case, the Ottawa/Montreal cluster and Toronto
are the closest, so merging them creates another cluster.
In the next step, the closest distance is between
the Vancouver cluster and the Edmonton cluster.
Forming a new cluster,
the data in the distance matrix gets updated.
Essentially, the rows and columns are merged
as the clusters are merged, and the distances are updated.
This is a common way to implement this type of
clustering and has the benefit of caching distances between clusters.
In the same way,
the agglomerative algorithm proceeds by merging clusters,
and we repeat this until all clusters are merged and the tree is complete.
That is, until all cities are grouped into a single cluster of size six.
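In practice, you would rarely code this merging loop by hand; the whole procedure is available in SciPy. The sketch below assumes 'average' linkage, which is not exactly the center-of-cluster rule used in the example, and reuses the placeholder matrix D from above:

```python
import numpy as np
from scipy.cluster.hierarchy import linkage
from scipy.spatial.distance import squareform

condensed = squareform(D)                 # condensed (upper-triangle) form of D
Z = linkage(condensed, method='average')  # one row of Z per merge
print(Z)   # columns: cluster a, cluster b, merge distance, new cluster size
```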
Hierarchical clustering is typically visualized as a dendrogram as shown on this slide.
Each merge is represented by a horizontal line.
The y-coordinate of the horizontal line is the similarity of
the two clusters that were merged, where cities are viewed as singleton clusters.
By moving up from the bottom layer to the top node,
a dendrogram allows us to reconstruct the history
of merges that resulted in the depicted clustering.
Essentially, hierarchical clustering does not require a prespecified number of clusters.
However, in some applications,
we want a partition of disjoint clusters just as in flat clustering.
In those cases, the hierarchy needs to be cut at some point.
For example here, cutting at a specific level of similarity,
we create three clusters of similar cities.
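Here is a short sketch of how such a dendrogram and a three-cluster cut could be produced with SciPy, using the linkage matrix Z computed above (note that SciPy plots merge distances rather than similarities on the y-axis):

```python
import matplotlib.pyplot as plt
from scipy.cluster.hierarchy import dendrogram, fcluster

dendrogram(Z, labels=cities)       # each merge drawn as a horizontal link
plt.show()

flat = fcluster(Z, t=3, criterion='maxclust')   # cut so at most 3 clusters remain
print(dict(zip(cities, flat)))     # city -> flat cluster id
```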
This concludes this video. Thanks for watching.