Hello and welcome back. One of the reasons that image and video segmentation is studied so much in image processing is its relevance to multiple applications, from the movie industry to medical imaging, as we are going to see in the last week of this class. The other reason there are so many studies in this area is that image and video segmentation is intrinsically a very difficult problem. What we are going to do in this video lecture is introduce the concept of interactive image segmentation: we are going to let the user help us do the segmentation. So that's the topic of this lecture, interactive image segmentation.

Let me show you a couple of extreme examples of the difficulty of image segmentation. This is a really nice image. If you haven't seen it before, nothing looks strange: it's just a drawing of a forest. Now if you look very carefully, and I invite you to do that, you start seeing faces in this image, faces that are camouflaged. For example, we see a face here; I hope you can see it, the eyes, nose and mouth. Here is another face: again the eyes, the nose, and the mouth. There is another face camouflaged here: eyes, nose, mouth. There is a profile face here: the eyes, the nose and the mouth. There are actually a lot of faces; if you stare at this image for some time, you are going to discover more and more faces that pop out of it. Now, segmenting out these faces is a really challenging image segmentation problem. It becomes easier if I tell the algorithm more or less the location of those faces, so the algorithm can then look for the eyes in those locations. You're welcome to pause the video and observe this image for longer if you wish to discover more faces.

Let's look at the next example, another very interesting drawing. We see a tree here, with houses in the back, and we see a lady. Now if you stare at this image, in particular if you take a bit of distance from the screen, and sometimes if you half-close your eyes to blur the image a bit, you might actually find that there is a face here: the lady becomes the nose, and part of the buildings become the eyes. So once again, how can we tell the image segmentation algorithm to segment this very, very difficult image? In particular, how can we tell the algorithm whether we're interested in the tree, in the face, in the nose, or in the lady? What is foreground? What is background?

To illustrate that deciding what's foreground and what's background is a very, very difficult problem, let's look at this famous image. We see here two profiles, one, two. I hope you can also see that if you look at the white region, we actually see a glass. So what is the foreground, the profiles or the glass? It's very hard for the algorithm in the computer to understand what it is in the image that interests us. So one way of handling that is to let the user tell us what's foreground and what's background. Let us use the user, who is the real client here, the one actually interested in the segmentation, to give us a bit of information about the image. The way to do that, which we're going to explore, is to ask the user to mark very simple scribbles. The user is going to come and just scribble.
Here it is just a line, but it can be any mark in the image, as we're going to see later, and it says: I want to call this background. Then another scribble: I want to call this foreground. Now go and segment it. That's all we are going to ask from the user at this stage. We might ask more later on; once we segment the object out, of course, we can do different things, like putting new images in the background. It's very important that, although we are going to get help from the user, we shouldn't be asking for a lot of help. We shouldn't be asking the user to go around and completely mark the boundary of the object; then there is not much left for the computer to do, and that's very tedious work for the user, though it is still done sometimes. We want very little interaction that helps us a lot.

How can we exploit these scribbles? I'm going to illustrate that with one particular pipeline, but the underlying ideas are common to many, many techniques for image segmentation that you can find in the literature. So the user has given us hints of what's foreground, which we see here, and what's background. We can go and look at all the pixels that the user has marked as foreground and draw their histogram, the distribution of those pixels. We can do the same for the background, so we end up with the distribution of the foreground and the distribution of the background. Very often the background has much more variability than the foreground. Now, if you give me a pixel, I can compute its probability of being in the foreground and its probability of being in the background from these distributions that the user marked. Once again, all I'm using are the pixels that the user marked as foreground and the pixels that the user marked as background.

We can actually normalize this. Here we use c to stand for the vector in color space; it can be RGB (red, green, blue), it can be another color space, it doesn't matter. It's whatever the colors are under the scribbles that the user has marked. Again, we have these distributions, and we can predict the probability of a pixel being foreground or background, which is just the probability of being foreground normalized by the sum over foreground and background: P(foreground | c) = P_F(c) / (P_F(c) + P_B(c)). So, for example, a pixel value that was marked by the user as foreground would have a very high probability. Once again, this is just a particular illustration; the basic idea is that we use the distribution of the foreground and the distribution of the background to tell the computer: pixels that are similar in color to what the user marked as foreground, you should try to call foreground, and pixels that are similar in color to what the user marked as background, you should try to call background.

So now we have some idea of what to call foreground and what to call background. If we just take this formula and draw it, we get this, which is already a very interesting segmentation. Just by doing this exercise and looking at the resulting image, we can almost find the object of interest, which is what the user told us about: the cat. Now there is something interesting to observe here. The higher the value, the higher the probability of being foreground, based on what the user told us.
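To make this concrete, here is a minimal sketch in Python of the color model just described. The function names, the 32-bin quantization, and the small epsilon are my own illustrative choices, not details from the lecture.

```python
import numpy as np

def color_histogram(pixels, bins=32):
    """Normalized 3-D color histogram of the scribbled pixels.
    pixels: (N, 3) array of colors, e.g. RGB values in [0, 256)."""
    hist, _ = np.histogramdd(pixels, bins=bins, range=[(0, 256)] * 3)
    return hist / hist.sum()

def foreground_probability(image, fg_pixels, bg_pixels, bins=32):
    """P(foreground | color) for every pixel of `image`, obtained by
    normalizing the two scribble histograms: P_F(c) / (P_F(c) + P_B(c))."""
    h_f = color_histogram(fg_pixels, bins)
    h_b = color_histogram(bg_pixels, bins)
    # Quantize every image pixel into the same histogram bins.
    idx = np.clip(image.astype(int) * bins // 256, 0, bins - 1)
    p_f = h_f[idx[..., 0], idx[..., 1], idx[..., 2]]
    p_b = h_b[idx[..., 0], idx[..., 1], idx[..., 2]]
    eps = 1e-8  # avoids 0/0 for colors that appear under no scribble
    return p_f / (p_f + p_b + eps)
```

Evaluating this for the whole image produces a probability map like the one shown for the cat.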
So why do you think the cat's eyes come out very dark here, meaning a very high probability of being background and a very low probability of being foreground? That's because they are very dark pixels, and look, here in the background we also have very dark pixels. So just looking at the pixels the user has marked is not enough information for us, because inside the object of interest there may be pixels that look like the background. We need to do a bit more. What can we do? Let us see that next.

What we're going to do is go to every pixel in the image, compare the cost of reaching the foreground with the cost of reaching the background, based on the probabilities of being foreground and background, and then make a decision. How are we going to do that comparison? Using something called geodesic distance. And this is something you basically do every day. When you go to work, or when you go in your house from your room to the kitchen, you're traveling a path, and you try to travel, all the time, the path of minimal effort. If you want to go from your room to the kitchen and get there as fast as possible, you take the shortest path. Now, sometimes that path has an obstacle, for example a wall. You cannot go through the wall; you have to go around it. So if you draw a map of your office or house, there are certain regions that are very easy to go through and certain regions that are much harder to go through. When you want to go from one point to another, you might not be able to go in a straight line; if there is a wall, you have to go around it. The basic idea is that every region in the path has a weight, and that weight we're going to call W. What we're going to do is go to every pixel in the image and compute the minimal path to the foreground scribbles and the minimal path to the background scribbles. Once we have that, we're going to decide if the pixel is foreground or background according to which path is shorter. So I have to tell you what we are going to use as W.

Before that, let me just remark that this can be computed very fast, actually in linear time: visiting every pixel of the image only once, you can compute that distance, the minimal path, to the foreground and to the background. So this is not a very difficult computation. It is based on extensions of techniques called Dijkstra-type techniques, which are very standard in graph theory, and we're going to discuss them a bit more next week when we talk about partial differential equations. For now, all I want you to know is that this computation of the minimal effort of traveling from a pixel to any one of the scribbles, with a weight that I'm going to define in a second, can be done extremely, extremely fast, basically in real time.

So what is W? It is computed thanks to the scribbles and the distribution of colors: we're basically going to use the probability that we just computed. And the way we are going to think about it is the following. If, going from the pixel we are considering to the foreground scribble, the probability of being foreground doesn't change too much, that means the pixel is probably foreground. If I can keep going from my pixel to the foreground scribble and the probability of being foreground doesn't change, that means the gradient of that probability is small along the path; remember, the gradient is just the difference between consecutive pixels, here taken in the direction of the path. A small sketch of this geodesic computation follows.
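Here is a minimal Dijkstra-style sketch of that geodesic distance, run on the 4-connected pixel grid, where a step between neighbors costs the change in foreground probability. The lecture's linear-time variant relies on a more specialized queue; this heap-based version is O(N log N) but simpler, and the function name and connectivity are my own assumptions.

```python
import heapq
import numpy as np

def geodesic_distance(prob, seeds):
    """Minimal-effort distance from every pixel to the seed (scribble) mask,
    where moving between neighbors costs |change in prob|.
    prob: (H, W) probability map; seeds: (H, W) boolean scribble mask."""
    h, w = prob.shape
    dist = np.full((h, w), np.inf)
    heap = []
    for y, x in zip(*np.nonzero(seeds)):
        dist[y, x] = 0.0                      # scribble pixels cost nothing
        heap.append((0.0, int(y), int(x)))
    heapq.heapify(heap)
    while heap:
        d, y, x = heapq.heappop(heap)
        if d > dist[y, x]:
            continue                          # stale queue entry
        for dy, dx in ((-1, 0), (1, 0), (0, -1), (0, 1)):
            ny, nx = y + dy, x + dx
            if 0 <= ny < h and 0 <= nx < w:
                nd = d + abs(prob[ny, nx] - prob[y, x])
                if nd < dist[ny, nx]:
                    dist[ny, nx] = nd
                    heapq.heappush(heap, (nd, ny, nx))
    return dist

# Since P_background = 1 - P_foreground, the per-step cost is the same for
# both distances; only the seeds differ. Each pixel then takes the label of
# whichever scribble is cheaper to reach:
# d_f = geodesic_distance(p_f, fg_mask)
# d_b = geodesic_distance(p_f, bg_mask)
# segmentation = d_f < d_b
```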
So I'm going to travel a curve: from every place in the image, I'm going to go to a scribble. I'm going to look for a path and ask: is my probability of being foreground changing a lot along this path? If it's changing a lot, I pay a big price; if it's not changing a lot, I pay a small price. Then I can take another path and see which path is shorter with this weight. I do the same for every point in the scribble, so for each pixel I'm computing, over all paths to the foreground scribbles, the one along which the change in my probability of being foreground is smallest. Then I do basically the same for the background scribbles. So for each pixel I ask: what is the cost of getting to the foreground, what is the cost of getting to the background, and then I compare the two.

So, in pictures, when we do this computation, which again can be done extremely fast: we have the scribbles, and this is the probability image we computed before. Here is an image where blue means low and red means large; same here, going from blue to red we go from small to large. We are basically saying that every pixel here has a very large distance to the foreground, while pixels here have a very low distance to the foreground, and the inverse over here. I want to repeat that once again: we first computed the probability based on the color distributions the user marked for us. Then, and I'm just going to use a figure to illustrate this, from every pixel in the image we compute curves that start at the pixel and finish at the scribble, and we see what the cost of traveling each curve is. The cost is given by this function: the less the probability of being foreground changes, the lower the cost. We do the same to the background, for every pixel. And then we say: if it's cheaper for you to get to the foreground scribble than to the background scribble, I call you a foreground pixel; if it's the other way around, I call you a background pixel. And all the computations are done by computing this minimal-effort curve, going over the image only once. Once you have done that for every pixel, you draw the boundary between the pixels labeled foreground and background, and you get your very, very nice segmentation. A relatively simple algorithm.

Now, we can do a bit more. This was based on scribbles that were marked somewhere: there was a foreground scribble here, and there were a few background scribbles around. But now we have a first estimate of where our boundary between foreground and background should be, and we can exploit that in a second stage which is fully automatic. The basic idea is very simple. Let's just go back: this was our first estimate. Let's move a bit inside the object and a bit outside the object, in a fully automatic fashion, and call these new scribbles. Not the ones we marked before: this one, which is automatically computed, we call a foreground scribble, and this one we call a background scribble. Then I can run my algorithm once again, but now I estimate the probability of being foreground based on scribbles that are very, very close to the boundary, so they probably carry better information about where that boundary should be.
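A rough sketch of this automatic second pass, reusing the foreground_probability and geodesic_distance helpers sketched earlier; the 5-pixel width of the inner and outer bands is an arbitrary assumption.

```python
import numpy as np
from scipy.ndimage import binary_dilation, binary_erosion

def refine(image, segmentation, band=5):
    """One automatic refinement pass: synthesize new scribbles just inside
    and just outside the first-pass mask, then rerun the same pipeline."""
    # Shrink the mask: its interior becomes the new foreground scribble.
    fg_scribble = binary_erosion(segmentation, iterations=band)
    # A ring just outside the mask becomes the new background scribble.
    grown = binary_dilation(segmentation, iterations=band)
    bg_scribble = binary_dilation(grown, iterations=band) & ~grown
    # Re-estimate the color model from the automatic scribbles and relabel.
    p_f = foreground_probability(image, image[fg_scribble], image[bg_scribble])
    d_f = geodesic_distance(p_f, fg_scribble)
    d_b = geodesic_distance(p_f, bg_scribble)
    return d_f < d_b
```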
Moreover, we can do that locally. If I want to refine this part of the boundary, I can look only at this piece of the foreground scribble and this piece of the background scribble, compute my probability, and refine there; then I go to a different region and use only those pieces to refine there, and so on. So I can travel around my first estimate, get new estimates of what should be foreground and what should be background, and get a really, really accurate segmentation from that. So we run the algorithm twice: first we let the user help us find the distributions; second, we let the initial estimate tell us how to recompute those distributions, and then we get a really, really nice segmentation, as you can see. We become very accurate at the boundary, correct the initial errors, and get really, really good results.

Let me give you a few examples, which we see here. Every row is a different example: we see the scribbles, then the binary segmentation, and then, as we already discussed, the object placed on a different background. It's very important, when you design any image processing algorithm, that it works for a lot of data. You don't want to design an algorithm that works for only one type of data; you want to be as general as possible, unless you have a very particular application in mind, and it's very important to show that the algorithm works for a large variety of images.

Now you may wonder: does the user need to be an expert? Does the user need complete knowledge of the algorithm, and to mark the scribbles exactly where they should be, to give good hints? Of course, if that were needed, this algorithm would not be very useful. We want the user to work with the algorithm in a very relaxed fashion. So here is an example showing that this type of algorithm is normally very robust. We have the same image with very different foreground scribbles, the blue ones, and very different background scribbles, the green ones, and we see that the results are basically the same: the boundary between foreground and background is basically identical in all of the images. So we really want the interaction to be friendly and the algorithm not to ask too much of the user. Actually, it asks very little: all the algorithm asks is for the user to say, please mark some foreground, and please mark some background. I used different pen colors here than in the images, but the message is very clear.

So this is an example of interactive image segmentation, where the user has helped us. We're going to see a couple of additional videos. In one of them we are going to run this inside Adobe Photoshop. In the other we're going to do interactive video segmentation, with an algorithm that is inside one of Adobe's products, and I'm going to give you details about that. So I'll see you in the next videos. Thank you very much.