Hello. Motion is a powerful cue used by humans and many other animals to extract objects of interest from a background of irrelevant detail. Although humans do it effortlessly, it is a challenging task for a computer. It is a topic of great research activity because it is fundamental to many applications, such as detection and tracking of objects like vehicles and pedestrians, robotics, surveillance, and video manipulation and editing. We describe some of the basic ideas and show some results. In this segment, we also describe at a high level some of the most recent and powerful approaches to segmentation, specifically the mean shift and graph cut algorithms. So let us look at the finer details of this material. So far, we have primarily focused on segmentation based on intensity values, that is, segmentation of still images. When we deal with videos, motion is an important feature that can also be used to perform segmentation. The idea of segmentation is that we want to group objects, or pixels, with similar properties, and the dominant property here is motion. So an object that moves in a certain direction should stand out and be segmented based on its motion. Motion is definitely a powerful cue that is used by humans and many other animals to extract objects or regions of interest from a background of irrelevant detail. Given an image sequence, we have to define the number of motion models and the type of motion models, whether it is an affine model and so on. And then, using this motion information, we want to perform segmentation. This is a rich field in image processing and computer vision; techniques are published every day, I would say, on the topic of motion-based segmentation and motion-based tracking, which is tightly coupled with what we are discussing here. So here is an example of one frame of a scene, captured on campus here outside my office, and based on one approach, this is the resulting segmentation.
A lot of people are walking: some are walking away from the camera, some towards the camera, and they are identified by the various colors. Green is towards the camera and red is away from the camera. There is really a large number of applications that utilize motion-based segmentation, some of which are listed here. Object detection and tracking is the first one I want to mention. Robotics has a similar problem: if the interaction is based on a camera, we want the robot to be able to identify objects and track them. Surveillance is a large application area. In image and video compression, as we saw with MPEG-4, we are able to perform motion-based segmentation, and then in MPEG-4 we can encode visual objects separately. There are also scene reconstruction and video manipulation or editing: video matting, where we want to pull objects out of the scene, annotation, motion magnification, and so on. We see two results here of object-based, motion-based segmentation. This is the hall monitor sequence. Three different methods are compared. The one at the bottom right is the one that is least susceptible to noise, because it gives cleaner results. So you see again that the objects that are moving, these people walking around, are pulled out from the video. We see a similar result here. This is a sequence taken from the top of the engineering building where my office is, and you see cars moving as well as individual people. There is another piece of work we did here in the lab where we tracked the motion of the hands of a person performing CPR on this mannequin, which can be used for training purposes, for example. Based on the motion, we tried to extract the frequency of compression as well as the depth of compression. There is an actual system that can demonstrate this in real time.
A basic approach to motion-based segmentation, perhaps the one that first comes to everybody's mind, is simply to look at the difference between two consecutive frames, a reference image and a second image. So if we have these two images, we take the absolute value of their difference. If it is above a threshold, we put a one there; otherwise we put a zero. So it is a threshold-based detection of motion, and that way we obtain a binary image: the moving part is one and the non-moving part is zero. The underlying assumptions are that the images are registered spatially, otherwise we will be detecting the shifts, and that the illumination is constant, otherwise a change in illumination looks like a lot of motion. These are actually the effects I demonstrated when we talked about motion estimation earlier in the course. An improvement on the simple idea I just explained is the accumulative difference image, or ADI. The ADI can be absolute, positive, or negative. For the absolute ADI, we have a reference image here, and then a number of frames. We compare each frame against the reference; if the absolute value of the difference is greater than the threshold, then the accumulated value at that pixel is increased by one. Otherwise, it stays the same. For the positive and negative ADIs it is exactly the same operation, but instead of the absolute value we take the signed value: the counter is incremented if the difference is above a positive threshold, or, respectively, if it is below a negative threshold. The isolated entries in the difference image result from noise and are not significant. Thresholding based on connectivity removes these isolated entries, but may remove meaningful entries as well. And the use of the ADI ignores changes that occur only sporadically. Here is a simple example we performed: a white square that is moving in a certain direction over a number of frames.
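The frame-differencing and absolute-ADI ideas described above can be sketched in a few lines. This is a minimal illustration, assuming grayscale frames stored as NumPy arrays; the function names and accumulator layout are my own choices, not a specific implementation from the lecture.

```python
import numpy as np

def frame_difference(reference, frame, threshold):
    """Binary motion mask: 1 where |frame - reference| exceeds the threshold."""
    diff = np.abs(frame.astype(np.int32) - reference.astype(np.int32))
    return (diff > threshold).astype(np.uint8)

def absolute_adi(reference, frames, threshold):
    """Absolute accumulative difference image: each pixel counts how many
    frames differ from the reference by more than the threshold."""
    adi = np.zeros(reference.shape, dtype=np.int32)
    for frame in frames:
        diff = np.abs(frame.astype(np.int32) - reference.astype(np.int32))
        adi += (diff > threshold).astype(np.int32)
    return adi
```

The positive and negative ADIs follow the same pattern, with the test on the absolute difference replaced by signed tests on `frame - reference`.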
So we show the positive ADI here; it shows the size of the moving object and the location of the moving object with reference to the image frame. This is the idea behind the positive ADI: it can nicely isolate the size, while the absolute and negative ADIs can show us the speed and direction of the motion. So actually, it is a good idea to use all three ADI images, absolute, positive, and negative, together, because each of them conveys a different piece of information. One of the most popular and effective current techniques for segmentation is the mean shift technique. In the first step, the data is transformed into a feature space. It could be based on color, texture, gradient, or a combination of the above. The idea is then to find the maximum density in this feature space. In this particular example, that is where the maximum density occurs, and we would like to locate it. So we start with an initial estimate of where the maximum density is. This is depicted by the circle; of course, in a multidimensional space this is a hypersphere. Then we estimate the center of mass of the points that belong to this region. Let's say the center of mass is there; then the vector that moves the center of the initial hypersphere to the new center of mass is the mean shift vector. So now we move the region so that it is centered at the center of mass, and we perform the same operation. We find the new center of mass, which happens to be there. This is the new mean shift vector. We shift the analysis region to the location of the new center of mass and keep going this way, until at the end we land at this location. This will then provide one of the segments of the segmentation. So this is the intuitive idea behind the mean shift algorithm. Here is the result of applying the mean shift algorithm. We see it applied to this image of these two fine gentlemen, and this is the resulting segmented image.
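The iteration just described, centre of mass, shift, repeat, can be sketched as follows. This is a toy illustration with a flat kernel (a hypersphere window) over 2-D feature points, not a full image-segmentation pipeline; the function name and parameters are assumptions of mine.

```python
import numpy as np

def mean_shift_mode(points, start, radius, max_iter=100, tol=1e-6):
    """Follow the mean shift vector: repeatedly replace the window centre
    with the centre of mass of the points inside a hypersphere of the
    given radius, until the shift becomes negligible (a density mode)."""
    center = np.asarray(start, dtype=float)
    for _ in range(max_iter):
        distances = np.linalg.norm(points - center, axis=1)
        in_window = points[distances <= radius]
        if len(in_window) == 0:
            break  # empty window: nothing to average
        new_center = in_window.mean(axis=0)
        if np.linalg.norm(new_center - center) < tol:
            break  # shift is negligible: we have reached a mode
        center = new_center
    return center
```

In mean shift segmentation this procedure is run from many starting points, and points that converge to the same mode are grouped into one segment.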
We see that it is doing a very satisfactory job when it comes to this region here, but also in segmenting the features and the clothes of the people; then, in the background, the sky is a separate segment and then the sea, and this landmass here is nicely separated. This is yet another example of the application of mean shift. We see quite a variation in hue there, with the colors of the sky, the letters here, and so on. For this segmentation, the feature vector contained the three color components of the image and the two components of the spatial location, so it was a five-dimensional feature space to which mean shift was applied. Another one of the current, modern approaches to segmentation is based on graph cuts. The idea here is that an image can be represented as a graph: the pixels of the image are the vertices of the graph, and if I consider two pixels, there is a corresponding edge that connects them. Their similarity gives rise to a weight, or a cost, on the edge between these two pixels. So here are the pixels, the edge between them, and a similarity computed by some method that gives me the cost. After a graph is built from an image, graph cut is what the name implies: I want to create a cut in this graph so that, as shown in the example here, one cut separates the graph into two pieces, for example foreground and background. The question is which edges we should cut, and intuition tells us that we should cut the edges that have low cost when the cost expresses similarity. So the low-similarity edges should be removed, and therefore dissimilar pixels end up in different segments while similar ones end up in the same segment. The minimum cut, or st minimum cut, problem takes its name from s for source and t for terminal; it can also be called the st-mincut problem. So here we see a toy example.
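For concreteness, one common way to turn pixel similarity into an edge cost, used for instance in the normalized-cuts literature, is a Gaussian of the intensity difference. This is just one possible choice, not necessarily the one used in the lecture, and the function name and the `sigma` tuning parameter are assumptions of mine:

```python
import math

def edge_weight(intensity_a, intensity_b, sigma=10.0):
    """Similarity cost for the edge between two pixels: close intensities
    give a weight near 1, very different intensities a weight near 0."""
    diff = float(intensity_a) - float(intensity_b)
    return math.exp(-(diff ** 2) / (2.0 * sigma ** 2))
```

With weights of this form, cutting the low-weight edges corresponds exactly to separating dissimilar pixels into different segments.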
There are two vertices, v1 and v2, a source, and a sink, with the associated costs on the edges connecting the source, the sink, and the two vertices. The idea of the st-cut is to divide the nodes between source and sink. One possibility would be this one: v1 belongs to the source and v2 belongs to the sink, and then we see the associated cost if we cut here, along the red edges. If we cut here, the associated cost equals 16: we add the costs of these edges, so it is 5 for this one, 2 for this one, and 9 for this one. If we consider the alternative configuration, where v2 belongs to the source and v1 to the sink, then the cost if we cut along these edges is 2 plus 1 plus 4, which equals 7. This is the smaller cost, so for this particular toy example that is the cut that should be chosen. In general, however, this is not the way the minimum cut is actually found. Instead we solve the so-called maximum flow problem. This is the dual problem: it has been proven that the min-cut and max-flow problems are equivalent, so solving one gives the solution to the other. The idea of maximum flow is to push the maximum flow through the network and see what the value of this maximum flow is. For this toy example, we can use an augmenting-path-based algorithm. If I look at this configuration again, we see that we can push flow along these links in red, with capacities of two and five. Clearly, I can push a flow of two from the source to the sink. If I do so, the flow increases by two, and the remaining capacity of this edge is 2 minus 2, that is 0, while the remaining capacity of this edge, after the flow of two is pushed, is 5 minus 2, equal to 3, and that is what these values are here. Then I look for another path that will allow more flow, and this is this path: I can push a flow of nine here and a flow of four through this edge.
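The augmenting-path procedure being walked through here can be sketched as a small Edmonds-Karp implementation, which repeatedly finds a shortest augmenting path by breadth-first search. The capacities below are my reading of the toy example (s→v1 = 2, s→v2 = 9, v1→v2 = 2, v2→v1 = 1, v1→t = 5, v2→t = 4); with them the maximum flow comes out to 7, matching the minimum cut cost.

```python
from collections import deque

def max_flow(capacity, s, t):
    """Edmonds-Karp: find a shortest augmenting path with BFS and push its
    bottleneck capacity, repeating until no augmenting path remains."""
    # Build residual capacities, adding zero-capacity reverse edges.
    residual = {u: dict(edges) for u, edges in capacity.items()}
    for u in capacity:
        for v in capacity[u]:
            residual.setdefault(v, {}).setdefault(u, 0)
    flow = 0
    while True:
        # BFS for an augmenting path in the residual graph.
        parent = {s: None}
        queue = deque([s])
        while queue and t not in parent:
            u = queue.popleft()
            for v, cap in residual[u].items():
                if cap > 0 and v not in parent:
                    parent[v] = u
                    queue.append(v)
        if t not in parent:
            return flow  # no augmenting path left: flow is maximal
        # Find the bottleneck capacity along the path.
        bottleneck = float("inf")
        v = t
        while parent[v] is not None:
            bottleneck = min(bottleneck, residual[parent[v]][v])
            v = parent[v]
        # Push the bottleneck flow and update residual capacities.
        v = t
        while parent[v] is not None:
            u = parent[v]
            residual[u][v] -= bottleneck
            residual[v][u] += bottleneck
            v = u
        flow += bottleneck
```

On the toy graph this pushes flows of 2, 4, and 1, exactly the three augmenting steps narrated above.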
Clearly, I can only push a flow of four, because this edge here provides the constraint. If I do so, the remaining capacity is five here and zero here, and I have increased the overall flow by four. Is there an additional path? Well yes, this path: I can push as much as five from here, then one, and three. The minimum is one, so I can push a flow of one through there, and I increase the total flow by one. Here the remaining capacity is zero, so this is the resulting situation, and clearly no more flow can be pushed through the network. Based on this, this is the resulting cut. So this is the way the problem is solved, through the maximum flow algorithm that I just illustrated here on the toy example. Applying the minimum cut algorithm to this cameraman image, we obtain this segmentation here. It is clearly not extremely satisfactory for this particular implementation. However, graph cut algorithms have other advantages that can be explored and exploited. I simply wanted to demonstrate, at this higher level, the min-cut algorithm, which I believe is rather intuitive, at least in its principle. So we have come to the end of week 11; one more week to go. It is hard to believe that we are reaching the end of the class. This week we looked at the interesting and active topic of image and video segmentation. We covered some of the basic approaches, but there are really so many more techniques in the literature, and techniques which are used in practice on a daily basis. I believe the material covered during this week provided you with enough information and tools to be able to understand the other techniques in the literature, and also to decide which might be the most appropriate one for an application we are interested in. Some of the techniques, such as change detection, are easier to understand than others, such as graph cuts. Sometimes a complicated approach results in improved performance, but this may not always be the case.
Very often, simple techniques and ideas win. Segmentation is, however, a problem with no ground truth, and therefore there is always room for another method. Next week, we will talk about some of the recent developments in image processing which are based on the concept of sparsity. So see you next week, the last week of classes.