Hello and welcome back. In this video, I want to talk about Gaussian mixture models and their connection to sparse modeling. In particular, this is going to be useful for introducing the concept of structured sparse models. Let's get into that. As examples, we are going to use the same type of image processing tasks that we have been using for regular sparse modeling in the previous videos. From an observation y, we want to recover the image f. The image f has been degraded by an operator U and has additive Gaussian noise, as we have seen before. What type of operators do we have in mind? The same types that we have seen before. For example, U can be a masking: we let some pixels go through and block the others. That gives our observation, and we want to go back to f via inpainting. U can be a convolution, so we blur the image; once again, this is our observation y, and we want to go back to f, which is deblurring. Or U can be sub-sampling: we observe a small image and we want to recover a larger one, which is zooming. These are the types of operations that we are going to have in mind.

Now, let us describe the type of model we are going to use. The basic idea is that, for the moment, we are not thinking about sparse modeling for f; we are going to think about Gaussian mixture models for f. As we have been doing for sparse modeling, we don't work with the whole image, we work with patches; once again, with all the overlapping patches, all possible patches in the image. The idea is that we are going to model each one of these patches, f sub i. Again, each has been degraded; sometimes the operator is the same across the whole image, sometimes it is not, which is why I write it as U sub i, and this is our observation. The noise is, as before, Gaussian. Now, we are going to model these signals with K different Gaussians. If the patch is eight by eight, we again have a 64-dimensional vector, as before, and we are going to model it with K 64-dimensional Gaussians. Just to have a number in our heads, let us take K = 10; we are going to see that we only need a few Gaussians. For each one of the K Gaussians, we need to compute the average of this 64-dimensional vector and its covariance. Now, having a Gaussian model is equivalent to having a PCA, or Principal Component Analysis, model. The basic idea is that from the covariance matrix we can compute the eigenvectors and the eigenvalues, and the eigenvectors give us a dictionary: 64 atoms per Gaussian in this dimension. So each one of the, say, 10 Gaussians gives us 64 atoms, and we end up with a dictionary of 640 atoms in this case, but in a very particular form, because each block of 64 atoms consists of the eigenvectors of one covariance matrix. That is our structure right now. The basic idea is that each one of the patches is modeled by one and only one of these Gaussians. And what we have to do, from the noisy, blurred, or sub-sampled observation, is estimate everything.
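Before going into the estimation, let me make the Gaussian-to-PCA connection concrete. The sketch below is a minimal NumPy illustration, assuming the 8x8 patch size and K = 10 from above; the covariance matrices are random placeholders standing in for learned ones, not the actual model from the lecture.

```python
# Minimal sketch: from K Gaussian covariances to a block-structured dictionary.
# The covariances here are random placeholders; in practice they are estimated
# from the degraded image itself.
import numpy as np

patch_dim = 64   # 8x8 patches, flattened to 64-dimensional vectors
K = 10           # number of Gaussians in the mixture

rng = np.random.default_rng(0)
covs = []
for _ in range(K):
    A = rng.standard_normal((patch_dim, patch_dim))
    covs.append(A @ A.T)                 # symmetric positive semi-definite

# Each Gaussian is equivalent to a PCA model: the eigenvectors of its
# covariance form one orthogonal block of 64 atoms.
blocks = []
for Sigma in covs:
    eigvals, eigvecs = np.linalg.eigh(Sigma)   # ascending eigenvalues
    order = np.argsort(eigvals)[::-1]          # reorder by decreasing energy
    blocks.append(eigvecs[:, order])

# Concatenating the K blocks gives a 64 x 640 dictionary with a very
# particular structure: each group of 64 columns belongs to one Gaussian.
D = np.concatenate(blocks, axis=1)
print(D.shape)   # (64, 640)
```

Keep this block structure in mind; it is exactly what will connect this model to structured sparsity later in the video.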
If we are talking about inpainting, we are going to have to estimate the Gaussians, the 10 Gaussians, meaning we are going to have to estimate their averages and their covariances. That is one thing we have to estimate. We are also going to have to estimate, for each patch, which one of these K, or 10, Gaussians fits that patch best. And then, once we do that, we are going to have to reconstruct the actual image from the observation. So there are a lot of things to do: estimate the K Gaussians, estimate the identity of every patch, and then estimate the reconstruction. It turns out that, although this looks like a very complicated problem, it is not. It is a very simple problem, and it can be efficiently solved with what is called maximum a posteriori expectation maximization, or MAP-EM. The basic idea is that if we know, for every patch, the best Gaussian for it, we have to do nothing but linear operations, with filters of this type, to do the reconstruction. So it is piecewise linear: for every Gaussian that we use to fit a particular patch, we just need a linear operator, and this is a very, very simple algorithm. Once again, we are approximating every one of the patches with just one Gaussian, which amounts to selecting, from this very large dictionary, the one block of 64 atoms that corresponds to the Gaussian we selected.

Now, MAP-EM alternates between the estimation of the Gaussians and the estimation of the appropriate Gaussian for a given signal, for a given patch. So we are doing this maximum a posteriori expectation maximization, and it does more or less the following: it estimates all of the Gaussians, then it asks which patches like Gaussian number one more than any other Gaussian, and then, with all those patches, it re-estimates Gaussian number one. It is very similar to what we discussed for K-SVD, but instead of estimating individual dictionary atoms, we estimate a whole Gaussian, meaning an average and a covariance matrix, or a PCA basis. And then we keep iterating; normally we do that for three or four iterations until we converge. What we observe here is that we have color-coded this image according to the Gaussian that a given pixel, meaning the patch around it, is selecting. This is the initialization, or after one or two iterations, and this is after that. What we see is that similar regions of the image end up selecting the same Gaussian. Once again, in this type of approach we only allow a single Gaussian to be selected for every patch, meaning for every pixel. And again, we are going to do the recovery by averaging all the patches that touch a pixel, as we have been doing for sparse modeling.
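To make the alternation more concrete, here is a rough NumPy sketch of one MAP-EM pass. It assumes the degradation operators U_i, the noise variance, and an initial set of means and covariances are given as Python lists; the function names and the exact selection score are simplifications for illustration, not the method's actual implementation.

```python
import numpy as np

def map_estimate(y, U, mu, Sigma, sigma2):
    """Linear (Wiener-like) MAP estimate of one patch under one Gaussian."""
    S = U @ Sigma @ U.T + sigma2 * np.eye(U.shape[0])
    return mu + Sigma @ U.T @ np.linalg.solve(S, y - U @ mu)

def selection_score(y, U, f_hat, mu, Sigma, sigma2):
    """Negative log-posterior style score used to pick the best Gaussian."""
    r = y - U @ f_hat                      # data misfit
    d = f_hat - mu                         # deviation from the Gaussian mean
    _, logdet = np.linalg.slogdet(Sigma)
    return (r @ r) / sigma2 + d @ np.linalg.solve(Sigma, d) + logdet

def map_em_step(ys, Us, means, covs, sigma2, eps=1e-6):
    """One E-step (assign and estimate patches) and one M-step (update Gaussians)."""
    K = len(means)
    labels, estimates = [], []
    # E-step: try every Gaussian for every patch and keep the best fit.
    for y, U in zip(ys, Us):
        candidates = [map_estimate(y, U, means[k], covs[k], sigma2) for k in range(K)]
        scores = [selection_score(y, U, candidates[k], means[k], covs[k], sigma2)
                  for k in range(K)]
        k_best = int(np.argmin(scores))
        labels.append(k_best)
        estimates.append(candidates[k_best])
    # M-step: re-estimate each Gaussian from the patches assigned to it.
    for k in range(K):
        members = [f for f, lab in zip(estimates, labels) if lab == k]
        if len(members) > 1:
            X = np.stack(members)
            means[k] = X.mean(axis=0)
            covs[k] = np.cov(X, rowvar=False) + eps * np.eye(X.shape[1])
    return estimates, labels
```

Notice that, once a Gaussian is chosen for a patch, everything is a linear operation, which is exactly the piecewise-linear behavior described above.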
So now it looks like we are not doing sparse modeling; it looks like what we are doing is representing every patch with one out of K Gaussians. But let's see that what we are actually doing is a very particular case of sparse modeling. Why is that? In ordinary sparse modeling, we have a large dictionary, normally overcomplete, with K atoms, and we are allowed to select L of them, in this case three. And remember, how many options do we have? We have K choose L, and that is normally a very large number. If we take the regular sizes that we use for image processing, patches which are 5x5 or 8x8, a dictionary with 256, 512, or 1000 atoms, and a sparsity on the order of five to ten, this number is enormous: a huge number of subspaces that we can pick from. That has the advantage of being a very rich model, but on the other hand, it makes the model very unstable. There is so much to choose from that if we make a small change in the image, or in the patch, we might end up selecting something completely different.

The Gaussian mixture model, on the other hand, is, as we said, equivalent to principal component analysis: we take each covariance matrix, decompose it, get the principal components from its eigenvectors, and then we concatenate them. So this is one Gaussian, another Gaussian, another Gaussian, and so on; we have K Gaussians, let's say 10. Here are the eigenvectors corresponding to the covariance of the first Gaussian, the second, the third, the fourth, and so on; here I show five of them. Now, when you pick a Gaussian, you pick the entire block, and then you fit that Gaussian as best you can, using its PCA basis as the atoms of your dictionary. But look what happened: we went from this huge number to only about 10 or 20 options, because once you pick the Gaussian, as I mentioned, the estimation is a linear operation; you just project onto it, and you already have the atoms, you already have everything. So in both cases you learn the dictionary. But in ordinary sparse coding you learn a dictionary where every atom is independent, and the coding with the atoms is independent; here we learn a dictionary that has a block structure, and when you are representing, when you are projecting, you pick entire blocks. This is a concept called structured sparsity, where the atoms have correlations and are either picked together or not picked at all. What we have here is a very particular case called block structure. There is no overlapping, because you are only allowed to pick one of the Gaussians, and, in principle, no atom belongs to more than one of the Gaussians if the Gaussians are different. So you pick only one block, which is a particular case of structured sparsity. The overall concept of structured sparsity is, again, that atoms come and go together: they are learned together, they are used to encode the signals together, and that stabilizes your system much more, because the number of options is much lower. You either pick one block and not the others, or you pick another block, but you are not allowed to pick one atom from here, another from here, and another from here; you pick in blocks. Now, of course, as before, there is a lot of very interesting theory here, but it is also very important to know whether this is actually useful. If I need 10 million Gaussians, I am not gaining much. But it turns out that with 10 or 20 Gaussians you can get basically some of the same results that we saw for ordinary sparse coding, with a much simpler and much more stable algorithm. So it is, again, sparse coding, sparse modeling, but in a structured fashion.
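To put rough numbers on the reduction in options, here is a quick back-of-the-envelope comparison, assuming a 512-atom dictionary with sparsity L = 8 for ordinary sparse coding versus K = 10 Gaussians for the block-structured model; the specific values are illustrative choices within the ranges just mentioned.

```python
# Number of choices available to each model.
from math import comb

atoms, L, gaussians = 512, 8, 10
print(comb(atoms, L))   # possible L-atom subsets, on the order of 10^17
print(gaussians)        # blocks to choose from in the block-structured model
```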
Let me show you a few examples of this. This is one of the examples that we have seen before, but now with this Gaussian mixture model, with this structured sparsity, or, basically, with this piecewise-linear model, because once we pick the Gaussian we are in the linear regime. Again, we have the original here, a zoom-in to the original, only 20% of the pixels available, and here is the reconstruction. We have seen a similar example with sparse coding; now we are seeing it with Gaussian mixture models. Another example is zooming: we start from a very small image and we make it larger, and we get really nice results with this technique, with very sharp edges. So this is another type of sparsity, with structure incorporated, and we also get a very strong connection with a classical model, the Gaussian mixture model.

With this, we are actually concluding the week on sparse models. We could spend a whole year teaching about sparse models, but we have provided the key concepts. We have presented sparse coding, we have presented how to learn the dictionaries, we have talked about some of the theory and some of the computational techniques, and we have talked about the relationship between sparse modeling and more classical techniques like Gaussian mixture models, and, from that, the new concept of structured sparsity. This, as I said, is a topic of very active research in the image and signal processing community. I hope you have enjoyed this week, and I am looking forward to seeing you next week. Thank you very much.