Hello, and welcome back. It's time now to present examples of sparse modeling in image processing. Before we do that, we have to do one more thing. Remember what we have: a dictionary of K elements, K atoms, and every atom has N dimensions. Now, what are the signals that we are going to process? That's what we need to discuss for just one minute, and the basic idea is very simple. We're going to be talking about images. This is going to be our data, the noisy image, the whole image. And this is what we are looking for. Now, we are not going to work on entire images, we are going to work on patches. And this is the R. The R is just a simple binary matrix that extracts the patch around ij. So, if we have an image, for every pixel we construct a patch around it. Or we could consider ij the upper left corner. Anything that works as a coordinate system is fine, and we extract patches, for example eight by eight patches, as we have been discussing. Now, it's very important that we do not take separate blocks as we did in JPEG. We take all the patches, and they are fully overlapping. So, we have an image, and we take a patch, and we take the patch next to it, and the patch next to that. So, for every pixel, let's say as the upper left corner, we construct the eight by eight patch. This is a binary matrix that extracts that patch, and it's the patch that we want a sparse code for, with a learned dictionary. So here is the sparsity. We're not working on the whole image at once, we're working on all the overlapping patches of the image. And, as we discussed before, the model is for the individual patches. We're going to construct a dictionary that works, for example, on all the overlapping patches of a given image at the same time. And note that here we are doing the optimization over the signal, this is the whole image. We are also looking for the sparse code, this is the second part.
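To make the patch extraction concrete, here is a minimal Python sketch using NumPy. It takes ij to be the upper left corner, as suggested above, and stacks every fully overlapping patch as a column, which has the same effect as applying the binary extraction matrix at every position. The function name `extract_patches` and the small test image are only for illustration, they don't come from the lecture.

```python
import numpy as np

def extract_patches(image, patch_size=8):
    """Return every fully overlapping patch of a 2-D image as a column.

    Patches are ordered row by row by their upper-left corner (i, j), and
    each one is flattened to patch_size * patch_size values.
    """
    # Shape (H - p + 1, W - p + 1, p, p): one window per valid corner.
    windows = np.lib.stride_tricks.sliding_window_view(
        image, (patch_size, patch_size))
    n_rows, n_cols = windows.shape[:2]
    return windows.reshape(n_rows * n_cols, patch_size * patch_size).T

# Hypothetical 10 x 10 image, just to check the bookkeeping.
image = np.arange(100, dtype=float).reshape(10, 10)
patches = extract_patches(image, patch_size=8)
print(patches.shape)  # (64, 9): 64-dimensional patches at 3 x 3 corners
```

Note that an image of H by W pixels gives (H - 7)(W - 7) patches of size eight by eight, so even a single image provides a lot of data.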
And as we're going to see in a second, we're also going to learn the dictionary. We could learn the dictionary offline, or we could learn it and adapt it to the image, as we're going to explain. Once again, this is nothing more than a binary matrix that extracts the patch. So think of it like this: you have the image, you pick a patch, you do sparse coding. You pick the next patch, you do sparse coding. Now, what about the dictionary? The first option is to train on a large database, and you do that offline. The second option, and these are not contradictory, as we're going to see in a second, is to use the image itself. You travel across all the patches of the image, and that already gives you a lot of training data. Normally we combine both of them. That's one possibility. So, you learn a dictionary offline with a bunch of off-the-shelf images, you use that as the initialization of the dictionary for K-SVD, and then you get the new image. Even though it's noisy, you adapt the dictionary to the image. So, you run a few iterations of K-SVD for the particular image: sparse coding, dictionary adaptation, sparse coding, dictionary adaptation. That's one option. The other is to forget completely about the database, start with any initialization, and only run it here, on the image. So, there are many options, and which one you pick depends on the application. Now, if you are in a rush and you don't have the computational time when you do the denoising, you don't have to update the dictionary. You train it with a database. Let's say that you want to do faces: you train it with a database of faces and you use that dictionary. If you do have the computational time, it's always better to adapt the dictionary to the particular image. And we have tons of data for that, because we can use all the patches.
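The formulation the lecture keeps pointing to is on a slide and isn't captured in this transcript. One way it is commonly written for patch-based denoising, given here as an assumption and not as a quote from the slide, is the following. Here y is the noisy image, x is the image we are looking for, R_ij is the binary matrix that extracts the patch at ij, D is the single shared dictionary, and alpha_ij is the sparse code of that patch. The weights lambda and mu_ij are assumed symbols.

```latex
\{\hat{x}, \hat{D}, \hat{\alpha}_{ij}\}
  = \arg\min_{x,\, D,\, \alpha_{ij}}
    \lambda \,\| x - y \|_2^2
    \;+\; \sum_{ij} \mu_{ij} \,\| \alpha_{ij} \|_0
    \;+\; \sum_{ij} \| D \alpha_{ij} - R_{ij}\, x \|_2^2
```

The first term keeps x close to the data, the second asks for sparse codes, and the third says that every patch of x should be well represented by the dictionary.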
So, once again, this is our general formulation, but now we have D, either because we are starting from scratch for the particular image or because we are adapting the pre-learned dictionary to the current image. This is our data. And we have to reconstruct the image, we have to compute the code, and we have to compute the common dictionary, one dictionary for all the patches in the image. We are not allowed to use a dictionary per patch, that would be ridiculous. So, we have three things to optimize for, and the basic idea is very simple. You fix two and you optimize for the third one. For example, you assume that x is fixed, so you ignore this term, and you fix D, and you do sparse coding. That's the first part of K-SVD. Then, you fix the code, and you update the dictionary. And this iteration is basically K-SVD. So, you have the image, and you go over the patches. Every patch becomes a column of the large X matrix that we saw in the previous video. And then you do K-SVD for all of them, okay? So you are encoding every patch with sparse coding. When you're finished, you update the dictionary, you encode again, you update the dictionary. When you are done with the dictionary, all that's left is to compute x. So, you don't touch the code anymore, and you don't touch the dictionary anymore. And you have to solve this problem when D and alpha are already fixed. Of course, this is gone. x only shows up here and here, and it is very easy to show that the result is a weighted average of the patches. What happens is that every pixel is touched by multiple patches. So let me just draw that here, for example. We have an image, and we have a pixel. Now, there are multiple patches that touch that pixel. For example, this one, which has the pixel in its bottom left corner. There is also this one.
There is also, let me use a different color, a patch here, for example. How many are there? If we have eight by eight, we have 64 patches that are touching that pixel, unless the pixel is at the border of the image. Let's ignore those for a second. And for every one of those patches, we have done sparse coding with the dictionary that we have learned from all the patches in the image. So, with that sparse coding, we can reconstruct every single one of these patches as D alpha, with the optimal alpha that we have computed for each patch ij. Every single one has been reconstructed. How do we reconstruct the pixel? We average all those patches. You can add weights to the average. That's what the theory tells you: if the pixel is in the center of the patch, then that patch is more important than if the pixel is in the corner of the patch. But that's just a technicality. You can just do the reconstruction of every patch with the sparse code and the dictionary that we have learned, and average those 64 if you're using an eight by eight block. So this is all you do: sparse coding of every patch, and K-SVD to learn the dictionary. So, I think we are ready now to see examples. Let's start with an image in gray values, no color. This is the source image, and this is a noisy image. We have added Gaussian noise with this variance. Now, we can initialize the dictionary with random patches from this image. We don't have this image anymore, this is just for reference. This is all we have, so we can initialize with patches from here. We can initialize with a dictionary that we have learned offline, or we can initialize with something that we know is good, a Discrete Cosine Transform dictionary. That's the one used in JPEG, so we know it's a reasonably good dictionary. In this case, we have initialized with that.
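The per-pixel averaging described above can be sketched in a few lines of Python. Each reconstructed patch is added back at its position, and every pixel is divided by the number of patches that touch it. The optional blending with the noisy image through a weight `lam` is an assumption, taken from the common quadratic formulation in which the solution is (lam * y + sum of patch reconstructions) / (lam + count) at each pixel. The function name and the demo image are hypothetical.

```python
import numpy as np

def average_patches(patches, image_shape, patch_size=8, noisy=None, lam=0.0):
    """Average overlapping patch reconstructions into one image.

    patches: array of shape (patch_size**2, n_patches), one reconstructed
    patch per column, ordered row by row by upper-left corner.
    Returns the averaged image and the per-pixel patch counts.
    """
    height, width = image_shape
    p = patch_size
    acc = np.zeros(image_shape)
    counts = np.zeros(image_shape)
    k = 0
    for i in range(height - p + 1):
        for j in range(width - p + 1):
            acc[i:i + p, j:j + p] += patches[:, k].reshape(p, p)
            counts[i:i + p, j:j + p] += 1
            k += 1
    weights = counts.copy()
    if noisy is not None:
        # Assumed blending with the noisy data, weighted by lam.
        acc += lam * noisy
        weights += lam
    return acc / weights, counts

# Demo: patches cut from an image average back to the same image.
p = 8
image = np.random.default_rng(0).random((20, 20))
patches = np.stack([image[i:i + p, j:j + p].ravel()
                    for i in range(13) for j in range(13)], axis=1)
restored, counts = average_patches(patches, image.shape, patch_size=p)
print(counts[10, 10], counts[0, 0])  # 64.0 for an interior pixel, 1.0 at the corner
```

In the real pipeline the columns would be the reconstructions D alpha of each patch and not the raw patches, so the averaging is where the overlapping estimates get combined.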
And we update using all the patches here, all the possible patches. We learn the dictionary, then we encode every patch with the learned dictionary using sparse coding. Then, we average all the patches that touch a given pixel. And voila, this is the dictionary that we have learned. It's very interesting how it has changed from the DCT dictionary we started with. And here is the restored image, where we have reduced the noise through two things: first, projecting every patch onto the space defined by this dictionary using sparse coding, and then averaging the patches. And here you have a result. This was the original, this is what's available to us, and this is the denoised image. It's very important that the adaptation of the dictionary is done with the noisy data, and still the results are really, really good. And I want you, once again, to observe the type of dictionary that we have learned from the image. For example, the different types of texture that we have here are learned in the dictionary. It was adapted to the image. Now, how do we deal with color? There are numerous ways of dealing with color, but one of the simplest is, instead of doing eight by eight patches, to do eight by eight by three. We could have done every color independently, as we have done, for example, for inpainting or even for compression. But we can also take eight by eight by three patches. We concatenate all the colors and we do K-SVD sparse modeling in exactly the same fashion. Let's see some results. Again, we always show the original just for reference, because you never have it. What you have is this image, the noisy image, a noisy version of this. And then you run K-SVD on these eight by eight by three, or five by five by three, patches, depending on the patch size that you decide to use, and you get a very nice result of image denoising.
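As a small illustration of the eight by eight by three idea, the Python sketch below concatenates the three color channels of each patch into a single 192-dimensional vector, so the same sparse modeling machinery can be applied unchanged. The function name and the random test image are hypothetical, and the ordering of the values inside each vector is just one possible choice.

```python
import numpy as np

def extract_color_patches(image, patch_size=8):
    """Return all overlapping patch_size x patch_size x C patches as columns."""
    channels = image.shape[2]
    # Shape (H - p + 1, W - p + 1, 1, p, p, C): the window spans all channels.
    windows = np.lib.stride_tricks.sliding_window_view(
        image, (patch_size, patch_size, channels))
    n_patches = windows.shape[0] * windows.shape[1]
    return windows.reshape(n_patches, patch_size * patch_size * channels).T

rgb = np.random.default_rng(0).random((12, 12, 3))
color_patches = extract_color_patches(rgb)
print(color_patches.shape)  # (192, 25): 8 * 8 * 3 values per patch, 5 x 5 corners
```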
And here are the standard measurements, PSNR in dB, just to measure the quality of the result. Here is another example. We start again from the reference. This is only for reference, the original image. We have added noise. Look how bad it is. This is really bad. We have added a lot of noise, and we get a beautiful reconstruction from this sparse modeling technique. A really, really nice reconstruction with a lot of detail and a relatively high quantitative measurement, although what's most important is the visual quality of the reconstruction. So, we went from here to here. And once again, we learn the dictionary. In this case, we initialize the dictionary with an off-the-shelf pre-learned dictionary, and then we adapt it to this noisy image for a few iterations of K-SVD. Now, we can also do inpainting with this. Inpainting is a particular example of noise, where the noise is so strong that it makes some pixels disappear. And there are many ways of defining the types of challenges in inpainting. One of them is to take the image and just drop 80% of the pixels. This is related to compressed sensing, and we're going to talk about that at the end of this week in just one video. We drop 80% of the pixels. You say, you're gone. That's represented here with dark values, zero. So it's like the noise is so strong that there's nothing there. And then you run sparse modeling and K-SVD starting with this image, where this is the missing part. We start from this image, and here is the reconstruction. Once again, a very, very interesting result. We only had 20% of the pixels, and we managed to do a nice reconstruction. Again, we go from here to here. This image is just for reference. Let's see that in color. Again, 80% missing, and here is the reconstruction. Once again, I think a very, very nice result.
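The lecture doesn't spell out how the sparse coding deals with the missing pixels. One common approach, assumed here, is to fit the sparse code using only the observed entries of each patch and then let the full atoms fill in the rest. The Python sketch below does this with a greedy pursuit in the style of orthogonal matching pursuit. The choice of pursuit, the function name `masked_omp`, and the random dictionary are all assumptions made for illustration. With a mask that is True everywhere, it reduces to ordinary greedy sparse coding of a patch.

```python
import numpy as np

def masked_omp(D, y, mask, max_atoms):
    """Sparse-code y from its observed entries only, then predict all entries.

    D: dictionary of shape (n, K). y: patch of length n. mask: boolean array
    of length n, True where the pixel is known. Returns the full-length
    reconstruction built from at most max_atoms atoms.
    """
    D_obs = D[mask]                       # atoms restricted to known pixels
    norms = np.linalg.norm(D_obs, axis=0)
    norms[norms == 0] = 1.0               # avoid dividing by zero
    D_unit = D_obs / norms                # normalized only for atom selection
    y_obs = y[mask]
    residual = y_obs.copy()
    support = []
    coeffs = np.zeros(0)
    for _ in range(max_atoms):
        k = int(np.argmax(np.abs(D_unit.T @ residual)))
        if k in support:
            break
        support.append(k)
        # Least-squares fit on the known pixels using the chosen atoms.
        coeffs, *_ = np.linalg.lstsq(D_obs[:, support], y_obs, rcond=None)
        residual = y_obs - D_obs[:, support] @ coeffs
    return D[:, support] @ coeffs

# Deliberately easy check: a patch made of one atom, with about 80% of its
# pixels dropped (13 of 64 kept).
rng = np.random.default_rng(0)
D = rng.standard_normal((64, 256))
D /= np.linalg.norm(D, axis=0)
y = 3.0 * D[:, 5]
mask = np.zeros(64, dtype=bool)
mask[rng.choice(64, size=13, replace=False)] = True
restored = masked_omp(D, y, mask, max_atoms=1)
print(np.allclose(restored, y))  # True
```

Real patches need several atoms and a learned dictionary, so this toy check only shows the mechanics: the code is estimated from the 20% that survives, and the dictionary supplies the missing 80%.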
Again, for this example, we started with a dictionary that was learned from a database of natural images, and then we adapted it to this image for a few iterations. And then, we do the restoration by reconstructing every patch and then averaging, for every pixel, all the patches that touch it. Again, I think a very impressive result, showing the power of this technique. This is another example. Instead of dropping the pixels, we do the same thing that we did when we were talking about inpainting in general. We write on top of the image, and then we do the reconstruction. Again, very good results with nothing more than sparse modeling, learning a dictionary from this noisy image and being able to reconstruct the image. Now, we can also do video inpainting. We can drop 80% of the pixels. And now, again, we have a number of options. We can do every frame by itself, or we can do patches in three dimensions, X, Y, and T. We discussed that when we were talking about video inpainting in the previous week. We take patches from, let's say, three frames together, or five frames together, so the patches also have some temporal information. After we have decided what type of patches to use, we run sparse modeling with exactly the same technique that we have been describing. And I'm going to run a few of them. The original is, as always, only for reference. It is actually not available. We always go from here to here, doing the dictionary learning. Let me run a few frames. This is one frame, this is another frame. Always look at the quality of the result, the reconstruction. Another one. Another one. So once again, we walk through the different frames, going from this to this. And you can see that the reconstruction is always very, very close to the original, which is not available to us. Just one more frame here. And yet another frame. Now, I want to finish with demosaicing. This will be the last example.
It's kind of a particular case of image inpainting, as we have seen before. Let me explain what the problem is here. Most digital cameras do not collect full red, green, and blue. They do a trick, which is that at one pixel they collect red, at the next they collect green, at the next red, at the next green, and so on. For the next row, they collect green, blue, green, blue, green, blue, and then they repeat that. So, at every pixel, they have only one color: red, or green, or blue. And what you have to do is interpolate. In order to show me the image, you have to create, at every pixel, a red, a green, and a blue. This is a kind of interpolation, or an inpainting, of the colors. We can treat this as nothing else than inpainting of the colors, where, instead of dropping the entire pixel, we drop two of the colors at each pixel. Sometimes we drop the red and the green. Sometimes we drop the green and the blue. Sometimes we drop the red and the blue. So imagine that you have a cube which is n by n by three, where sometimes you have the value available and sometimes you don't. Now, we have seen that sparse modeling is extremely successful even when the entire pixel is gone, even when 80% of the complete pixels are gone. So, let's run it on this. And this is called demosaicing. It has to be done in basically all consumer digital cameras. Very expensive digital cameras actually collect the full red, the full green, and the full blue. But normal consumer cameras do this type of mosaic, and then the production of the full color image is called demosaicing. And here you see examples of color images that have been reconstructed from patterns like this using sparse modeling. It's a particular case of inpainting in the color domain. It's even easier than that, because instead of dropping 80% of the entire pixels, you just drop some of the colors at each pixel, and the results are really, really nice.
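The sampling pattern described above, red and green alternating on one row and green and blue on the next, can be written as a boolean mask over an n by n by three cube. The Python sketch below builds that mask and applies it to a full color image, which produces the kind of input that the demosaicing step starts from. The exact phase of the pattern (which corner holds red) and the names used are assumptions for illustration.

```python
import numpy as np

def bayer_mask(height, width):
    """Boolean mask of shape (height, width, 3), True where a color is measured.

    Even rows alternate red, green, red, green. Odd rows alternate
    green, blue, green, blue. Channel order is red, green, blue.
    """
    mask = np.zeros((height, width, 3), dtype=bool)
    mask[0::2, 0::2, 0] = True  # red on even rows, even columns
    mask[0::2, 1::2, 1] = True  # green on even rows, odd columns
    mask[1::2, 0::2, 1] = True  # green on odd rows, even columns
    mask[1::2, 1::2, 2] = True  # blue on odd rows, odd columns
    return mask

rgb = np.random.default_rng(0).random((4, 4, 3))
mask = bayer_mask(4, 4)
mosaic = np.where(mask, rgb, 0.0)   # unmeasured colors are set to zero
print(mask.sum(axis=2))             # every entry is 1: one color per pixel
```

With this mask, two thirds of the values in the cube are missing, and half of the measured ones are green. Demosaicing is then the inpainting problem of filling in the entries where the mask is False.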
Here we have some zooming, and you can see that the results are very, very sharp. These are really some of the best results produced for demosaicing, and they're obtained with sparse modeling techniques. So, these are just some of the examples of sparse modeling. Sparse modeling is being used a lot in image processing these days, also for image classification and for other image processing challenges beyond the ones I have shown you, which were denoising, inpainting, and demosaicing. And I think that this illustrates the importance of the theory, which has a lot of mathematical components and also very, very cool applications. We have a couple of additional things to learn this week about sparse modeling, and also about compressed sensing and the connections between the two, and I am going to do that in the forthcoming videos. Thank you very much, see you later.