In this lecture, you're going to learn how to compute an estimate of the Gaussian mixture model parameters from observed data. We will first try to follow what we did before to get the maximum likelihood estimate for single Gaussians. But as I will show, we are going to fail to obtain a nice analytic solution. Instead, I will introduce a new way to get a locally optimal solution, using an iterative algorithm called Expectation Maximization, or EM for short.
While a single Gaussian has only two parameters, mu and sigma, a GMM has multiple mus and sigmas, plus weights, and the number of Gaussian components, denoted here as K. As I mentioned in the previous lecture, we will use uniform weights of 1 over K and focus on understanding how to estimate the mean and covariance matrix parameters.
Let's begin by trying to find the maximum likelihood estimate of the GMM parameters, as we did for single Gaussians. To briefly review, maximum likelihood estimation means we want to find the parameters of the model that are most likely to have produced the observed data. As before, instead of maximizing the joint probability directly, we can assume independence of all observations and maximize the product of the individual probabilities. Next, we take the log and maximize the sum of the log probabilities instead of the product of probabilities. So far, we get the same results as before. But when we plug the specific probability model of the GMM, which is a sum of Gaussians, into the equation, we get this. It turns out that we cannot simplify this formula any further analytically, because a summation of Gaussians appears inside the log function. This implies we can estimate the parameters only via iterative computation, and any solution found might not be globally optimal. But don't be too disappointed. We are going to learn the EM algorithm shortly, and it gives a reasonable solution under certain conditions.
The way we are going to learn the EM algorithm is to first see how we compute the GMM parameters as a special case, and then move to the general ideas of EM for interested advanced learners. Having said that, let us first talk about EM for computing GMM parameters specifically. We need two additional things: an initial guess of mu and sigma, and a new variable Z that we are going to call a latent variable. It might sound strange that we need a guess of the very parameters we want to estimate. However, in this class of complicated problems, called non-convex optimization, there exist many suboptimal solutions, called local optima, and the initial guess affects which solution is found. While the initial guess can be important in this problem, it will not be the focus of this lecture.
So now let's turn to the new variable, Z. The latent variable Z of the ith point with respect to the kth Gaussian is defined as the relative ratio of the kth Gaussian density at that point. Essentially, Z indicates the probability that the ith point was generated by the kth Gaussian component. Let's look at a 1D example with K equal to 2. We have a point Xi, and for the moment, we have some prior guess of two Gaussians, G1 and G2. If the value of G1 at Xi is P1 and the value of G2 at Xi is P2, then Zi1 is computed as P1 over (P1 + P2), and Zi2 as P2 over (P1 + P2). In this particular case, P1 is larger than P2, so it is more probable that Xi was generated by G1 than by G2. Now, given the latent variable values for all data points and all Gaussian components, we can redefine the mean and covariance matrices weighted by Z.
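To make the latent variable Z and the Z-weighted updates concrete, here is a minimal NumPy/SciPy sketch. It is not the lecture's own code; the function names e_step and m_step, and the array layout of X, mus, and sigmas, are illustrative assumptions. The e_step computes exactly the relative ratio described above (P1 over P1 + P2 in the 1D, K equals 2 example), assuming uniform weights of 1 over K, and the m_step recomputes each mean and covariance with every point weighted by its Z value.

```python
import numpy as np
from scipy.stats import multivariate_normal

def e_step(X, mus, sigmas):
    """Compute latent variables Z (responsibilities), assuming uniform weights 1/K.
    X: (N, D) data; mus: list of K mean vectors; sigmas: list of K covariance matrices."""
    K = len(mus)
    # Density of each Gaussian component at every point: shape (N, K)
    dens = np.column_stack([
        multivariate_normal.pdf(X, mean=mus[k], cov=sigmas[k]) for k in range(K)
    ])
    # Z[i, k] = relative ratio of the k-th density at point i
    return dens / dens.sum(axis=1, keepdims=True)

def m_step(X, Z):
    """Update each mean and covariance, weighting every point by its Z value."""
    K = Z.shape[1]
    mus, sigmas = [], []
    for k in range(K):
        w = Z[:, k]                                   # weight of each point for component k
        mu_k = (w[:, None] * X).sum(axis=0) / w.sum() # Z-weighted mean
        diff = X - mu_k
        # Z-weighted covariance: sum of w_i * (x_i - mu)(x_i - mu)^T, normalized by sum of w_i
        sigma_k = (w[:, None, None] * np.einsum('ni,nj->nij', diff, diff)).sum(axis=0) / w.sum()
        mus.append(mu_k)
        sigmas.append(sigma_k)
    return mus, sigmas
```

A point with a small Z value for the kth component barely moves that component's mean and covariance, which is exactly the behavior described next.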
If the probability that a point belongs to the kth Gaussian is small, then that point contributes less to computing the parameters of that particular kth Gaussian. Putting everything we have discussed so far together, you can compute the parameters and the latent variables iteratively until the values converge. First, we initialize the parameters. Then we use the initial values of mu and sigma to compute Z, the latent variables. Once Z is updated, we fix it and update mu and sigma. You alternate between these two steps until the change in the parameters becomes very small. We are not going to prove it here, but it can be shown that this iteration converges to a local optimum. I have introduced the use of the EM algorithm for parameter estimation of GMMs, which works through iterations. We have seen that with an initial setting of the mean and covariance, and by introducing a latent variable, we have, perhaps surprisingly, made the problem tractable in a locally optimal sense. In the next lecture, I will introduce the EM algorithm more generally.
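Continuing the earlier sketch (it reuses the hypothetical e_step and m_step defined there), the full alternation might look like the following. The initialization scheme, iteration cap, and convergence tolerance are illustrative choices on my part, not prescriptions from the lecture.

```python
import numpy as np

def fit_gmm_em(X, K, n_iters=100, tol=1e-6, seed=0):
    """Alternate E and M steps until the change in the means is very small."""
    rng = np.random.default_rng(seed)
    N, D = X.shape
    # Initial guess: K randomly chosen data points as means, identity covariances
    mus = [X[i] for i in rng.choice(N, size=K, replace=False)]
    sigmas = [np.eye(D) for _ in range(K)]
    for _ in range(n_iters):
        Z = e_step(X, mus, sigmas)            # fix mu, sigma; update Z
        new_mus, new_sigmas = m_step(X, Z)    # fix Z; update mu, sigma
        shift = max(np.linalg.norm(m_new - m_old)
                    for m_new, m_old in zip(new_mus, mus))
        mus, sigmas = new_mus, new_sigmas
        if shift < tol:                       # parameters have stopped changing
            break
    return mus, sigmas, Z
```

Different random seeds give different initial guesses and can therefore converge to different local optima, which is the sensitivity to initialization mentioned earlier in the lecture.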