I know you're wondering what coin flipping has to do with k-means clustering. I promise, I will explain in this section.

In the previous section, we learned that if the hidden vector is known, we can derive the parameters from it. But in reality, neither the hidden vector nor the parameters are known, and therefore it is unclear how to derive the biases of the coins.

Let's start from a simpler problem and imagine for a second that the parameters are actually known: 0.35 for the blue coin and 0.80 for the green coin. Can we derive the hidden vector? Well, what is the most likely coin to have generated the first sequence of flips, where the fraction of heads is 0.4? We can compute the probability that this sequence was generated by the blue coin: 0.35^4 × 0.65^6 ≈ 0.00113. We can also compute the probability that it was generated by the green coin, which turns out to be a much smaller number: 0.80^4 × 0.20^6 ≈ 0.00003. Which coin is more likely? Of course, the blue coin is more likely to have generated the first sequence. Which coin is more likely to have generated the second sequence, where we have nine heads? We do the same calculation, and it turns out that the probability of the second series of flips being generated by the blue coin is much smaller than the probability of it being generated by the green coin; therefore, the most likely coin for the second series of flips is green. By continuing in this way, we can derive the hidden vector. (The sketches at the end of this section illustrate these calculations in code.)

This brings us to the following idea. Yes, we know neither Parameters nor HiddenVector when we start evaluating the biases of the coins. But let's start from randomly chosen parameters. As soon as the parameters are chosen, we can derive the HiddenVector. As soon as the HiddenVector is derived, we question our wisdom in selecting the initial parameters and re-estimate the parameters from the data and the hidden vector, as I described before, through the dot product. In this way we arrive at new parameters. As soon as the new parameters are derived, we once again evaluate the hidden vector, and we iterate. We have come up with an approximation algorithm for evaluating the parameters of the coin flips.

Let me show you how this algorithm works. We start from randomly selected parameters. As soon as the parameters are selected, we form a HiddenVector; that is, we assign each series of flips to either the blue coin or the green coin. As soon as the HiddenVector is derived, we can use it to derive new parameters. As soon as the new parameters are derived, we construct a new HiddenVector, and we iterate until convergence.

Does this algorithm remind you of something? Of course: it is the k-means algorithm, or something very similar to the Lloyd algorithm, in disguise. So what are the data, the hidden vector, and the parameters in k-means clustering? The data is the set of data points. The parameters are, of course, the positions of the centers. And the HiddenVector is simply the assignment of the data points to the k centers: an n-dimensional vector whose coordinates take values from 1 to k. This assignment HiddenVector is shown on the slide by the green coordinates.

So we have figured out how to solve the coin flipping problem and what relationship it has to k-means clustering. Next, we will talk about a more general approach called Expectation Maximization.
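To make the likelihood comparison concrete, here is a minimal Python sketch. It assumes each series consists of ten flips and scores the probability of one specific sequence of heads and tails (the function name sequence_probability is mine, not from the lecture); it reproduces the numbers quoted above:

```python
def sequence_probability(num_heads, num_flips, bias):
    # Probability that a coin with the given heads bias produces one
    # specific sequence containing num_heads heads in num_flips flips.
    return bias ** num_heads * (1 - bias) ** (num_flips - num_heads)

# First sequence: fraction of heads 0.4, i.e. 4 heads in 10 flips.
print(sequence_probability(4, 10, 0.35))   # ~0.00113  (blue coin)
print(sequence_probability(4, 10, 0.80))   # ~0.00003  (green coin)
# The blue coin is the more likely generator of the first sequence.

# Second sequence: 9 heads in 10 flips.
print(sequence_probability(9, 10, 0.35))   # ~0.00005  (blue coin)
print(sequence_probability(9, 10, 0.80))   # ~0.02684  (green coin)
# Here the green coin is far more likely.
```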
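The alternating algorithm itself can be sketched under the same assumptions. Here the data is given as the fraction of heads in each ten-flip series, the hidden vector uses 1 for blue and 0 for green, and all names (estimate_biases and its arguments) are mine rather than the lecture's:

```python
import random

def estimate_biases(data, num_flips=10, iterations=100, seed=0):
    # data: fraction of heads observed in each series of num_flips flips.
    rng = random.Random(seed)
    theta_blue, theta_green = rng.random(), rng.random()
    for _ in range(iterations):
        # Step 1: derive the HiddenVector from the current parameters by
        # assigning each series to its most likely coin (1 = blue, 0 = green).
        hidden = []
        for frac in data:
            heads = round(frac * num_flips)
            tails = num_flips - heads
            p_blue = theta_blue ** heads * (1 - theta_blue) ** tails
            p_green = theta_green ** heads * (1 - theta_green) ** tails
            hidden.append(1 if p_blue >= p_green else 0)
        # Step 2: re-estimate the parameters from the data and HiddenVector.
        # Each bias becomes the average fraction of heads over the series
        # assigned to that coin -- for a binary HiddenVector this is the
        # dot-product formula HiddenVector . Data / HiddenVector . (1,...,1),
        # and the same with the complemented vector for the green coin.
        blue = [f for f, h in zip(data, hidden) if h == 1]
        green = [f for f, h in zip(data, hidden) if h == 0]
        if blue:
            theta_blue = sum(blue) / len(blue)
        if green:
            theta_green = sum(green) / len(green)
    return theta_blue, theta_green

# Example: five series of ten flips with the given fractions of heads.
print(estimate_biases([0.4, 0.9, 0.8, 0.3, 0.7]))
```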
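The same two-step structure gives a Lloyd-style k-means, which makes the correspondence explicit: the data is the list of points, the parameters are the center positions, and the hidden vector is the assignment of points to centers. A bare-bones sketch, with points as coordinate tuples and the names squared_dist, centroid, and lloyd being mine:

```python
import random

def squared_dist(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))

def centroid(cluster):
    return tuple(sum(c) / len(cluster) for c in zip(*cluster))

def lloyd(points, k, iterations=20, seed=0):
    rng = random.Random(seed)
    # Parameters: the positions of the k centers, initialized by
    # picking k of the data points at random.
    centers = rng.sample(points, k)
    for _ in range(iterations):
        # HiddenVector: for each point, the index of its nearest center
        # (0-based here, whereas the lecture numbers the centers 1 to k).
        hidden = [min(range(k), key=lambda j: squared_dist(p, centers[j]))
                  for p in points]
        # Re-estimate the parameters: each center moves to the centroid
        # of the points currently assigned to it.
        for j in range(k):
            cluster = [p for p, h in zip(points, hidden) if h == j]
            if cluster:
                centers[j] = centroid(cluster)
    return centers, hidden
```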