In this session we're going to introduce one of the most popular clustering methods, the K-Means clustering method. You may wonder who first proposed K-Means. Some people credit MacQueen, because he published a paper on the K-Means clustering algorithm in 1967. Others point out that Lloyd actually described it in an internal company report back in 1957. Since that report was only circulated internally, most people did not even know about it, so he republished the work in a journal in 1982. That is why some people say Lloyd got the algorithm first. Either way, the common idea in this algorithm is that every cluster is represented by its center, which we call the centroid.

The K-Means algorithm can essentially be outlined as follows. Given K, the number of clusters, we first select K points as the initial centroids. You can select them randomly, which is what the original K-Means does, or you can use a smarter selection method that we will introduce later. Then we repeat the following: form K clusters by assigning each point to its closest centroid, then recompute the centroid, that is, the mean point, of each cluster. The initially selected centroids may not be very good ones, so once the centroids are recomputed, an object that was first assigned to, say, centroid A may now be closer to centroid B, because the centroids have moved. That means we need to go back and form the K clusters again by re-assigning each point to its now-closest centroid, and then recompute the new centroids. That is why we stay in this repeat-until loop until some convergence criterion is met. Different distance measures can be used for assigning each point to its closest centroid: the Manhattan distance (L1 norm), the Euclidean distance (L2 norm), or even cosine similarity, but the original algorithm typically uses the Euclidean distance.

Now let's look at an example of how the K-Means clustering algorithm executes. Suppose we have data points in 2-D space, initially shown as black points. Following the first line of the algorithm, we select K points as the initial centroids; here K is two, so we select two initial centroids. Once we have them, we form K clusters by assigning each point to its closest centroid: each black point is assigned either to the red centroid or to the blue centroid, and that gives the current assignment. Once the two clusters are formed, we recompute the centroids; you can see the red centroid moves down, and the blue one also moves. After recomputing the centroids, we go back and re-assign each point to its closest centroid. In this case some blue points get re-assigned to the red cluster, and a red point gets re-assigned to the blue cluster. Then we compute the centroids again, and the loop repeats until the assignment becomes stable, or until the change between successive assignments becomes very small.
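To make the loop concrete, here is a minimal sketch of the procedure just described, written in Python with NumPy. The function name, the random initialization, and the tolerance-based stopping rule are illustrative assumptions, not details from the lecture.

```python
import numpy as np

def kmeans(X, k, max_iters=100, tol=1e-6, seed=0):
    """Minimal K-Means sketch: X is an (n, d) array of points, k is the number of clusters."""
    rng = np.random.default_rng(seed)
    # Select K points as the initial centroids (random selection here).
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(max_iters):
        # Form K clusters by assigning each point to its closest centroid (Euclidean / L2 distance).
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Recompute each centroid as the mean point of its cluster.
        new_centroids = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        # Convergence criterion: stop when the centroids barely move.
        if np.linalg.norm(new_centroids - centroids) < tol:
            centroids = new_centroids
            break
        centroids = new_centroids
    return labels, centroids
```

Calling `kmeans(X, 2)` on the 2-D example above would reproduce the assign / recompute / re-assign behavior described in the walkthrough.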
Now, the first important thing we should know is that this is a pretty efficient algorithm for clustering. Suppose we have n objects, we want to find K clusters, and the algorithm becomes stable after t iterations. In each iteration, every object must be compared against the K centroids to find the one it is closest to, so each iteration costs n times K. With t iterations in total, the computational complexity is O(tKn). Usually K and t, the number of clusters and the number of iterations, are far smaller than the number of objects, so for a large n the complexity is essentially linear in the size of the data. That makes it an efficient algorithm.

However, K-Means clustering often terminates at a local optimum. Initialization therefore becomes important if you want to find high-quality clusters: we need a smart initialization, or we need to randomly initialize many times and try to find a good clustering. Another important issue is how to specify K, the number of clusters, which must be given in advance for the K-Means algorithm. There are many ways to automatically determine the best K, and a lot of research effort has gone into this. In practice you can also run a range of values for K and then select whichever works best.

The K-Means method is also quite sensitive to noisy data and outliers, so there are variations like the K-Medians or K-Medoids algorithms that try to overcome this problem. Another limitation is that K-Means works only for objects in a continuous, numerical n-dimensional space. To handle clustering on categorical data, people invented the K-Modes algorithm. One more important thing we should know: because we use the sum of squared distances, K-Means is not a good choice for finding clusters with non-convex shapes; what it finds is something closer to a ball. For non-convex shapes we can use density-based clustering or the Kernel K-Means algorithm, which we will introduce later.

Because K-Means came quite early and is very popularly used, there have been many studies on different variations of the K-Means method. One aspect is how to choose the initial centroids; there are methods called K-Means++, Intelligent K-Means, and Genetic K-Means, and we are going to introduce K-Means++ in the next discussion. Another aspect is how to choose different representative prototypes for the clusters; instead of the mean point we may use K-Medoids, K-Medians, or K-Modes, which we will introduce in a subsequent discussion. Another aspect is how to apply feature transformation techniques, so that we may be able to cluster things even when they are not of convex shape; for example, Weighted K-Means and Kernel K-Means have been developed, and we are going to introduce the Kernel K-Means method.
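As one hedged illustration of the "run a range of K values" idea mentioned above, the snippet below uses scikit-learn's KMeans (an assumption; the lecture does not name a library) and compares the within-cluster sum of squared distances across K. The toy data and the range 1 to 6 are illustrative choices; in practice one looks for the point where the improvement levels off.

```python
import numpy as np
from sklearn.cluster import KMeans

# Toy 2-D data: two loose blobs (illustrative only).
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (100, 2)), rng.normal(5, 1, (100, 2))])

# Run K-Means for a range of K values. n_init restarts the random
# initialization several times to reduce the risk of a bad local optimum.
for k in range(1, 7):
    km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X)
    # inertia_ is the within-cluster sum of squared distances to the centroids.
    print(k, km.inertia_)
```

Restarting with several random initializations (n_init) is one simple way to address the local-optimum issue; K-Means++-style seeding, discussed next, is the smarter alternative.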