[SOUND] In this session, we are going to introduce another kind of measures for clustering validation called the Relative Measures. We already introduced external measure when we have [INAUDIBLE]. We also introduced internal measure when we try to measure the intro cluster, they are very compact, and inter cluster, they are far apart. Now, we want to look at what is relative measure. Relative measure actually is trying to direct compare different clusters. Especially for those obtained using the same algorithm but different parameter setting. For example, like if we all use k-means to cluster a group of things but in that case, we want to measure for different cluster instantiation. Which one gave you the best clustering or a different case for k from one to certain number, which k gives you the best clustering. In that case, we were measured as using relative measure. And, one interesting measure called silhouette coefficient can be used for relative measure, okay? But, actually, silhouette coefficient also popularly used as internal measure. That means we will be able to use it to check cluster cohesion and separation. So, I first introduce it as an internal measure before I introduce it as an external measure. So for every point, x sub i, we can calculate its silhouette coefficient, s sub i, okay? How we calculate it? We use this formula this formula actually, mu sub n, is the mean distance to the point of its own cluster. Simply says, if we find this point in the blue cluster, okay? We're going to see this x sub i, actually to all the points in it's own cluster that distance we want to reach them. We get a mean distance that's mu sub n. Then what is mu sub out's mean, okay? That means we want to see this blue point to its closest cluster which is this orange one, to its closest cluster. All the points its closest cluster, we calculate their mean distance, that's this value, mu sub out mean, okay? Then what we want to see is their distance, you probably can clearly see for any point x sub i. You want that the internal distance, its rhythm is small but even to its closest cluster, this average reason be big. So, to that extent, this value usually should be positive and the more positive the better, okay? Then we, we normalize it, we divide by the maximum value between this mu out mean. And the mu n, and if we pick up the two, so it simply says that this division must be smaller than one. But on the other hand, if this cluster to one, simply says, it's better because the internal is very compact to the outside is the reasonably big. This is only for one point, what is for the whole clustering? Okay, so silhouette coefficient for the whole clustering is you take all the end points. You calculate every one's silhouette coefficient s sub i. I from 1 to n you calculate all the s sub i, you normalize it, you divide it by the number of points n. You get the whole clusterings silhouette coefficient. Of course, these silhouette coefficient, if it's closer to plus 1, it, it implies good clustering. Because the internal one to its own points is rather close to the other cluster, even to the closest one, is really far. So we also can use this silhouette coefficient as a relative measure. For example, we can use it to estimate the number of clusters in the data, say, what is the best of k. We can pick a value that yield the best at clustering, that means yielding the high value for SC and SC sub i, okay? For example, you can, thinking about, you did a lot of clustering, okay? For all the i's, you want to see which one give you the best clustering? That means which one gives you the highest silhouette coefficient? Then you can pick up that instantiation, pick that k. So that's the usage of silhouette coefficient as a relative measure. [MUSIC]