Hello, and welcome back. We are now ready for the last fundamental step in sparse modeling, and that is dictionary learning. Let's go into that.

Remember what we are trying to solve. We are solving this equation: this is the dictionary, this is the data, and we have already talked a lot about alpha, the sparse representation of the data using the dictionary. So what should the dictionary be? The assumption in image processing is that images behave in a sparse fashion, and we have already seen examples of that, for example with JPEG, or even the exaggerated example of putting all the images themselves into the dictionary. So how do we design a dictionary? The goal is this: if we believe in sparsity, we should design a dictionary that encourages sparsity. The sparser the vector alpha, the better off we are, because we are trying to create sparse models. So the idea is that we are going to learn, or select, the dictionary that helps us sparsify the signal.

There are basically two directions for doing that. One is to take off-the-shelf dictionaries. There are very good dictionaries out there; we have seen that JPEG is an outstanding algorithm, so we could use the cosine basis as a dictionary. We could use Fourier, we could use wavelets. There are many, many off-the-shelf dictionaries. The issue with those dictionaries is that they are not adapted to the signal. They are, in a sense, universal: they will be quite good for the signals we are going to work with, but maybe not the best possible. The other direction is to learn the dictionary: start with a lot of examples of data, learn the dictionary from them, and then use that dictionary, for example, on other images. And that's what we're going to discuss now. We are going to be in the dictionary-learning regime, and not in the regime of classical signal processing, which is where we were when we did image compression and described JPEG. Now it's our turn to present a dictionary that will help us sparsify the signal.

So we are going to design D. The basic idea is very, very simple in concept, and also not very difficult to implement, as we're going to see. Now, instead of having one signal, we have multiple signals; each one of them is a column here. Each column is, for example, an image patch that is 64-dimensional, eight by eight. We have one dictionary that we are going to train for all these images, and then every one of these images, patches, or signals has its own sparse representation: the first one has this one, the second one has the next one, and so on. So we have moved from a single column here and a single column here to matrices. What are the dimensions of this matrix? N, the signal size we had before, by P, the number of signals. What is D? Again, N by K: N here and K in this direction. There is nothing new here. And here we have K in this direction and P in this direction, one column per signal. The goal now is to design this dictionary. What do we want? We want a sparse representation for all the signals, so we sum the representation errors over all P signals; see that j runs from one to P. We have a single dictionary for all the signals, and all the representations have to be good and sparse at the same time.
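In symbols, a standard way to write the objective just described, using the same N, K, and P as above and with $k_0$ denoting the sparsity level, is

$$\min_{D,\,\alpha_1,\dots,\alpha_P} \; \sum_{j=1}^{P} \big\| x_j - D\,\alpha_j \big\|_2^2 \quad \text{subject to} \quad \|\alpha_j\|_0 \le k_0, \quad j = 1,\dots,P,$$

where $X = [x_1, \dots, x_P] \in \mathbb{R}^{N \times P}$ holds the signals, $D \in \mathbb{R}^{N \times K}$ is the dictionary, and $A = [\alpha_1, \dots, \alpha_P] \in \mathbb{R}^{K \times P}$ holds the sparse codes.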
Before, we didn't have this summation; we were just talking about one signal. Now we have a summation, and we want a dictionary that is good for all of them.

Let me make one comment. I am going to describe how to do this when we have already collected the signals and we learn the dictionary from them. As the number of signals increases, we can do a similar type of learning online. Online means that we learn and adapt the dictionary as the images are coming in, so we don't need a huge memory to store all the images; we process the signals as they arrive. I am not going to describe that this week, but the implementations I mentioned that you can download from the web do work in an online fashion, so they can deal with millions, zillions of images with absolutely no problem.

So now my goal is to show you how we do this learning. Look here: we have to optimize not only over the sparse codes, as we were doing before, but also over the dictionary. We are going to learn the dictionary and the sparse representations of the signals at the same time. There are a number of techniques; I am going to explain what is called the K-SVD algorithm, which is a kind of extension of K-means clustering. The basic idea is very, very simple to illustrate. We have all the signals; I wrote them as x here. We initialize the dictionary any way we want. One way of initializing is to randomly pick some of these data points and put them as columns here. Remember, as I mentioned, here is K: we are going to learn K atoms. There are P signals, and P is larger than K; otherwise, as we discussed before, we would just put the signals themselves in the dictionary. So we randomly pick K of the signals and use them as the dictionary. That gives us an initialization.

The next step is to do sparse coding with that dictionary, which we already know how to do. We go to every signal and solve a sparse coding problem: this is the code for the first signal, this the code for the second. Again, red means active, part of the active set, a nonzero coefficient. So we did sparse coding for every one of them. It's one of these blocks; I am going to explain it a bit more in the next slide, I'm just giving you the general idea.

The next part is that we have to update the dictionary, and the basic idea is that we go in the other direction. For the sparse coding stage we went in this direction, sparsely coding every signal; for the dictionary, we are going to update one atom at a time, using the codes that we have already produced. After we have updated the dictionary, we let the signals be encoded again and reiterate: we start with a dictionary, we do sparse coding, we update the dictionary, we do sparse coding again, we update the dictionary, and either we converge or we do this for a prefixed number of iterations, depending on our computational capabilities.

Now what I have to do is explain each one of these blocks, and you are going to see how simple they are. This one we basically already know: it is sparse coding, which is very simple because we have discussed it a number of times.
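To make the alternation concrete, here is a minimal sketch in Python/NumPy, assuming scikit-learn's orthogonal_mp for the pursuit stage; the names and parameters (ksvd, k0, and the update_atom helper sketched at the end of this video) are illustrative, not a reference implementation:

```python
import numpy as np
from sklearn.linear_model import orthogonal_mp  # OMP solver for the sparse coding stage

def ksvd(X, K, n_iter=10, k0=5, seed=0):
    """Sketch of the K-SVD alternation. X: (N, P) data, one signal per column.
    Returns the dictionary D (N, K) and the sparse codes A (K, P)."""
    N, P = X.shape
    rng = np.random.default_rng(seed)
    cols = rng.choice(P, size=K, replace=False)          # init: K random signals
    D = X[:, cols] / np.linalg.norm(X[:, cols], axis=0)  # atoms normalized to unit norm
    for _ in range(n_iter):                              # or iterate until convergence
        A = orthogonal_mp(D, X, n_nonzero_coefs=k0)      # sparse-code every signal
        for k in range(K):
            update_atom(X, D, A, k)                      # one SVD per atom, sketched below
    return D, A
```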
And look, I put p here, because we either do p equal zero, with matching pursuit or orthogonal matching pursuit, or we do the relaxation, p equal one. So, given the dictionary, we do the sparse coding: we go in this direction, and for every signal we get its sparse code. Okay? This is just one example. For every signal, we solve a sparse coding problem for that signal and absolutely nothing else, and that, again, is what is written here. We don't have the sum, because we do one at a time. We don't have to compute the whole matrix A at once; we just go one signal at a time, as we have been doing before, and that gives us the whole matrix A. Again, we solve this with any pursuit algorithm, as I mentioned before: either the L0 with matching pursuit, or the L1 with any of the convex solvers.

So now we have the sparse codes for this dictionary, and it's time to update the dictionary, which is the new part. Let us pick one of the atoms; as I said, we are going to update one atom at a time. How do we update this atom? The concept is very simple; let me run you through it. We are going to pick all the signals that have used that atom: this signal is using it, this signal is using it, this signal is using it, and so on. In other words, we ask which signals have a nonzero entry in their alpha vector at the position corresponding to that atom, and we are going to make that atom even better for those signals. So, we pick one atom and look at all the signals that are using it, okay? Only those that are using that atom: we have one here, another here, another here, another here, and so on. Remember, this is our data, and this is our matrix of sparse codes. We take all of them, and we forget about everybody else. We say: you signals didn't use this atom, so I am not going to pay attention to you at this moment; I will pay attention to you later on. Remember, we iterate this.

And now the goal is, as I said, to make this atom even better. Remember, please: this atom has been used by these signals, but it is not the only atom those signals use. For example, the first one uses this and this; the second one uses this and this. So I keep all the signals that have used that atom, knowing they have also used other atoms, but I am not going to touch those. The next thing I do is remove the influence of those other atoms. Let me explain that once again. The first signal, for example, was a combination of this atom and this atom, so I go and subtract the contribution of the other atom. The same for the second signal: I go and subtract the contribution of its other atom. The same for the third, the same for the fourth. So, for every signal, I subtract the contribution of all the atoms except the one I am trying to change, and then I am left with an error, because I am not considering this atom: I have subtracted the contribution of everybody else and kept only what this atom has to explain. That gives us a kind of error, and we are going to redesign the atom to minimize it. What is the contribution of this atom when it is used by those signals? Let's see if we can change the atom to make that contribution even better, meaning that error even smaller.
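In code, gathering the signals that use atom k and removing every other atom's contribution might look like this minimal NumPy sketch (the helper atom_residual and its conventions are illustrative, matching the hypothetical ksvd sketch above):

```python
import numpy as np

def atom_residual(X, D, A, k):
    """The signals that use atom k, and the error that remains after
    subtracting every OTHER atom's contribution from those signals."""
    omega = np.nonzero(A[k, :])[0]   # columns whose code has a nonzero entry for atom k
    if omega.size == 0:
        return None, omega           # nobody used this atom
    # Subtract all atoms' contributions, then add atom k's back in,
    # so only the other atoms are removed:
    E_k = X[:, omega] - D @ A[:, omega] + np.outer(D[:, k], A[k, omega])
    return E_k, omega
```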
So what we have here is the error, the residual: this is what the contribution of this atom was. Now I want to change it, I want to redesign it, and I also want to redesign the coefficients. This I have: it is the amount of information, of energy, that this atom was contributing, and now I am going to try to redesign it to make it even better. So this is what we have to optimize, and it is very easy: it is what's called a singular value decomposition, a standard tool in linear algebra, and it has a closed-form solution. So, once again: we pick an atom (we can go one atom at a time in order, or pick at random), we pick all the signals that have used that atom, we subtract the contribution of every other atom, and then we ask, how can I improve this atom in such a way that its contribution to those signals is even better? And that is an SVD (a code sketch of this step appears at the end of this video). So we run an SVD for every one of the atoms, and we have updated the dictionary. And now, as we explained in one of the previous slides, we let everything be encoded again with the new dictionary, then we update the dictionary, and we run a few iterations. So it is very simple: sparse coding, SVD; sparse coding, SVD. That's all we have to do. Multiple of those: as many sparse coding problems as signals, as many SVDs as atoms in the dictionary. We do that, and we have a new dictionary.

So, what we have so far is this: we started from the need for signal modeling, which led us to propose sparse modeling as a technique. We had to discuss some theory, why this is possible at all, and then we had to discuss how we do the sparse coding and how we train the dictionary; that was the topic of the last two presentations. Now: will this work? Let us try with real images and see.

Let me just draw attention to one more point, in order to close this video about dictionary learning. Some people have decided that, instead of training the dictionary as we have just described, they will make the dictionary a subset of X, of the data itself. I mentioned that we often initialize the dictionary by randomly sampling some of the signals. But you could say: instead of randomly sampling, why don't I pick the best ones, the ones such that everybody else has a sparse representation in terms of those selected ones? There are optimization techniques to do that as well. That is an alternative; it is still dictionary learning, but we restrict every atom of the dictionary to be one of the signals. It is just a different way of learning a dictionary in order to get the sparsest possible representation of our signals.

So now it's time to show you some examples, and we are going to do that in the next video. Thank you very much.
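As promised above, here is a minimal sketch of that SVD step, completing the hypothetical ksvd and atom_residual sketches from earlier: the residual is replaced by its best rank-1 approximation, so the new atom is the leading left singular vector (already unit-norm) and the coefficients of the signals that use it are rescaled accordingly.

```python
import numpy as np

def update_atom(X, D, A, k):
    """One K-SVD atom update: best rank-1 fit of the residual E_k,
    updating D[:, k] and the row A[k, :] in place (illustrative sketch)."""
    E_k, omega = atom_residual(X, D, A, k)   # helper sketched earlier
    if E_k is None:
        return                               # unused atom: leave it as-is
    U, s, Vt = np.linalg.svd(E_k, full_matrices=False)
    D[:, k] = U[:, 0]                        # updated atom, unit norm by construction
    A[k, omega] = s[0] * Vt[0, :]            # updated coefficients for those signals
```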