Unsupervised learning deals with the discovery of patterns derived from the feature matrix itself. Clustering analysis is one of the main subject areas of unsupervised learning, and it will be the focus of this lesson. In short, it is the family of methods used to partition observations, sometimes probabilistically, into groups such that the groupings minimize pairwise dissimilarity or represent inherent patterns. In addition to clustering algorithms, we will also demonstrate model comparison. It is expected that you are familiar with many of the clustering methods mentioned here. There are other types of clustering, like fuzzy clustering and techniques specific to categorical data, but we can only cover a few. Here, we will demonstrate at least one example from each category.

This is the function we used to simulate the four datasets used as examples here. We simulate data for this topic because not all clustering algorithms are created equal, and when it comes to understanding their strengths and weaknesses, visualizing cluster assignments can go a long way toward that goal. Clustering methods like spectral clustering were developed with data like the first two subplots in mind, whereas Gaussian mixture models and their variants have long been developed with plots like the last two in mind.

To evaluate our cluster assignments, we will use the silhouette score. It produces results that are similar to other available methods, but they are scaled from -1 to 1, which is very intuitive. We will see in the coming examples that silhouette scores and similar methods are useful, but they are not optimized for datasets that fall outside of what is traditionally thought of as a cluster.

The k-means clustering algorithm creates cluster assignments by trying to separate samples into k groups of equal variance, minimizing a criterion known as inertia, or the within-cluster sum of squares. This is a measure that makes sense for k-means, but it has a few drawbacks that make it a poor choice for model selection in general, the most significant being that it does not meet the criteria for being a valid metric. The k-means algorithm is very fast and has a large number of use cases, but as we can see, it does not work well for all of these datasets. Considering how simple the algorithm is, though, it performs rather well.

Hierarchical clustering does not require choosing the number of clusters before obtaining results. Putting off this choice is possible because the algorithm takes a bottom-up approach, successively fusing clusters together until all of the data are joined. Like k-means, hierarchical clustering commonly uses Euclidean distances to group the data, though other measures may be used. The first step of the algorithm involves identifying the two points separated by the smallest distance and grouping them together as a cluster. This pair of points is then treated as a single unit, and the structure is built from there by adding more observations into the hierarchy. Under this parameterization, shown in the code above, it does well for plots A and B, but has a harder time with C and D. These results can be visualized intuitively with a dendrogram, which can be useful for EDA.
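As a rough sketch of this part of the workflow, and not the actual helper functions used in the lesson, the following assumes scikit-learn and uses make_blobs and make_moons as stand-ins for the simulated datasets, then scores k-means and agglomerative (hierarchical) clustering with the silhouette score; the dataset sizes and parameter choices are illustrative assumptions.

from sklearn.datasets import make_blobs, make_moons
from sklearn.preprocessing import StandardScaler
from sklearn.cluster import KMeans, AgglomerativeClustering
from sklearn.metrics import silhouette_score

# Two toy datasets standing in for the simulated examples: well-separated
# Gaussian blobs and the non-convex "moons" shape, each paired with a chosen k.
datasets = {
    "blobs": (make_blobs(n_samples=500, centers=3, random_state=42)[0], 3),
    "moons": (make_moons(n_samples=500, noise=0.08, random_state=42)[0], 2),
}

for name, (X, k) in datasets.items():
    X = StandardScaler().fit_transform(X)  # scale before distance-based clustering
    models = {
        "k-means": KMeans(n_clusters=k, n_init=10, random_state=42),
        "hierarchical": AgglomerativeClustering(n_clusters=k, linkage="ward"),
    }
    for label, model in models.items():
        assignments = model.fit_predict(X)
        # The silhouette score ranges from -1 to 1; higher values indicate
        # denser, better-separated clusters.
        score = silhouette_score(X, assignments)
        print(f"{name:6s} | {label:12s} | silhouette = {score:.2f}")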
Spectral clustering represents another way of grouping the data. There are several different methods that fall under the heading of spectral clustering, but the canonical version involves applying k-means to the data after it has been projected into another space using a Laplacian matrix. The first steps treat the data as an undirected graph of connected points and represent this graph as an adjacency matrix. This means that the spectral clustering algorithm here can be readily applied to graphs. Spectral clustering does well with the plots on the right. You have probably noticed that we are estimating the number of clusters in the run functions themselves. Let's see what happens when we actually give it the number of clusters. We see that the algorithm does fairly well with all four plots. Spectral clustering is an exceptionally flexible algorithm.

Here we use a Bayesian version of a Gaussian mixture model, or GMM. GMMs are a family of methods that use probabilistic assignments to create clusters. They are very flexible; it is often said that any distribution can be fit if you add enough Gaussians. Notice that the GMM used here employs a predict function, unlike the clustering algorithms. Also, it is not necessary to scale the data with a GMM. We see that GMMs do well with the subplots on the right-hand side, as expected. There are variants of GMMs that do not require an explicit setting for the number of clusters k, because they use a Dirichlet process prior that helps determine k from the data.

These are the results when we use the versions of the clustering functions that estimate k. The first thing to notice is that hierarchical clustering performed perfectly for A, but still received a lower score. The numbers for datasets C and D play out as expected, but for the first two plots, the shape of the clusters does not work as we might expect with the silhouette metric. The results are the same for the other metrics we tried.

Estimating k is a normal part of the workflow. However, if we assume that we knew the true k for every clustering and reran all of the results, this table is what we would get. An adjusted index would be more appropriate here, since we know the true cluster assignments, but it is still interesting to see how well the silhouette value explains the known clusterings, since we do not normally have labels. Spectral and hierarchical clustering are perfect on datasets A and B, but the silhouette values do not reflect this. For datasets C and D, the evaluation metric performs as expected. Overall, this experiment shows that there is no one clustering method that outperforms all the rest, and that caution is required when relying on evaluation metrics.
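To make the spectral clustering and Bayesian GMM steps concrete, here is a minimal sketch assuming scikit-learn's SpectralClustering and BayesianGaussianMixture; the toy dataset, the nearest-neighbors affinity, and the upper bound of ten mixture components are illustrative choices, not the settings used in the lesson.

from sklearn.datasets import make_moons
from sklearn.cluster import SpectralClustering
from sklearn.mixture import BayesianGaussianMixture
from sklearn.metrics import silhouette_score, adjusted_rand_score

# Toy non-convex dataset with known labels (illustrative assumption).
X, y_true = make_moons(n_samples=500, noise=0.08, random_state=0)

# Spectral clustering: k-means applied after projecting the data into the
# space defined by the graph Laplacian of a nearest-neighbors graph.
spectral = SpectralClustering(n_clusters=2, affinity="nearest_neighbors", random_state=0)
labels_spec = spectral.fit_predict(X)

# Bayesian GMM: with a Dirichlet process prior, unused components shrink
# toward zero weight, so n_components acts as an upper bound on k rather
# than a fixed choice.
bgmm = BayesianGaussianMixture(
    n_components=10,
    weight_concentration_prior_type="dirichlet_process",
    covariance_type="full",
    max_iter=500,
    random_state=0,
)
bgmm.fit(X)
labels_gmm = bgmm.predict(X)  # GMMs expose predict(), which SpectralClustering and AgglomerativeClustering do not

for name, labels in [("spectral", labels_spec), ("bayesian gmm", labels_gmm)]:
    print(
        f"{name:12s} | silhouette = {silhouette_score(X, labels):.2f}"
        f" | adjusted Rand = {adjusted_rand_score(y_true, labels):.2f}"
    )

The adjusted Rand index compares assignments against the known labels, while the silhouette score only looks at the geometry of the clusters, which is why the two can disagree on non-convex shapes like these.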