One of the problems of object detection, as you've learned about it so far, is that your algorithm may find multiple detections of the same object. Rather than detecting an object just once, it might detect it multiple times. Non-max suppression is a way to make sure that your algorithm detects each object only once. Let's go through an example. Say you want to detect pedestrians, cars, and motorcycles in this image. You might place a grid over it, in this case a 19 by 19 grid.

Now, technically this car has just one midpoint, so it should be assigned to just one grid cell. And the car on the left also has just one midpoint, so technically only one of those grid cells should predict that there is a car. In practice, though, you're running an object classification and localization algorithm on every one of these grid cells. So it's quite possible that this cell might think that the center of a car is in it, and so might this one, and so might this one, and the same goes for the car on the left. If this is a test image you've seen before, not only that box but maybe this box, and this box, and maybe others as well will also think that they've found the car.

Let's step through an example of how non-max suppression works. Because you're running the image classification and localization algorithm on every grid cell, on 361 grid cells, it's possible that many of them will raise their hand and say, "My Pc, my probability of having an object in me, is large," rather than just two of the 19 squared, or 361, grid cells thinking they have detected an object. So when you run your algorithm, you might end up with multiple detections of each object. What non-max suppression does is clean up these detections, so you end up with just one detection per car rather than multiple detections per car.

Concretely, it first looks at the probabilities associated with each of these detections. These are the Pc's, although, as you'll see in this week's programming exercises, the detection probability is actually Pc times C1, or C2, or C3. But for now, let's just say Pc is the probability of a detection. It first takes the largest one, which in this case is 0.9, and says, "That's my most confident detection, so let's highlight it and just say I found a car there."

Having done that, the non-max suppression part then looks at all of the remaining rectangles, and all the ones with a high overlap, with a high IOU, with the one you've just output get suppressed. So those two rectangles with the 0.6 and the 0.7 both overlap a lot with the light blue rectangle, so you are going to suppress them, and darken them to show that they are being suppressed.
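As a concrete illustration (not from the lecture itself), here is a sketch of how that IOU could be computed for axis-aligned boxes; the `(x1, y1, x2, y2)` corner representation is an assumption made for this example:

```python
def iou(box_a, box_b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2) corners."""
    # Corners of the intersection rectangle.
    x1 = max(box_a[0], box_b[0])
    y1 = max(box_a[1], box_b[1])
    x2 = min(box_a[2], box_b[2])
    y2 = min(box_a[3], box_b[3])
    # Clip width and height at zero so disjoint boxes give zero intersection.
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0
```

For example, two unit-area boxes offset so they share a quarter of their area would score an IOU of 1/7, while identical boxes score 1.0.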

Next, you go through the remaining rectangles and find the one with the highest probability, the highest Pc, which in this case is this one with 0.8. So let's commit to that and say, "I've detected a car there." Then the non-max suppression step again gets rid of any other rectangles with a high IOU with it. Now every rectangle has been either highlighted or darkened, and if you get rid of the darkened rectangles, you are left with just the highlighted ones, and these are your two final predictions. So this is non-max suppression. Non-max means that you output your maximal-probability classifications but suppress the close-by ones that are non-maximal; hence the name, non-max suppression.

Let's go through the details of the algorithm. First, on this 19 by 19 grid, you're going to get a 19 by 19 by 8 output volume, although for this example I'm going to simplify things and say that you're only doing car detection. So let me get rid of the C1, C2, C3, and pretend, for this slide, that for each of the 19 by 19 positions, so for each of the 361, which is 19 squared, positions, you get an output prediction of the following: the chance there's an object, and then the bounding box. And if you have only one object class, there's no C1, C2, C3 prediction. The details of what happens when you have multiple objects I'll leave to the programming exercise, which you'll work on towards the end of this week.
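To make the shapes concrete, a small sketch of the per-cell prediction vector described above; the variable names and the particular numbers are illustrative, not from the course code:

```python
# With a single class ("car"), each of the 19 x 19 grid cells predicts
# five numbers: the chance there's an object, then the bounding box.
pc, bx, by, bh, bw = 0.9, 0.5, 0.7, 0.3, 0.4   # illustrative values
y_cell_single_class = [pc, bx, by, bh, bw]      # 5 numbers per cell

# With the three classes back in, each cell also predicts c1, c2, c3,
# which is where the 19 x 19 x 8 output volume comes from.
c1, c2, c3 = 0.0, 1.0, 0.0
y_cell_full = [pc, bx, by, bh, bw, c1, c2, c3]  # 8 numbers per cell
```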

Now, to implement non-max suppression, the first thing you do is discard all the boxes, all the predicted bounding boxes, with Pc less than or equal to some threshold, let's say 0.6. So we're going to say that unless you think there's at least a 0.6 chance there is an object there, let's just get rid of it. This discards all of the low-probability output boxes. The way to think about this is that for each of the 361 positions, you output a bounding box together with a probability of that bounding box being a good one, and you just discard all the bounding boxes that were assigned a low probability.
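That first discarding step might look like the following, assuming each prediction is stored as a `(pc, box)` pair; this is a sketch of the idea, not the programming exercise's exact code:

```python
def discard_low_probability(predictions, threshold=0.6):
    """Keep only predictions whose object probability pc exceeds the threshold."""
    return [(pc, box) for pc, box in predictions if pc > threshold]
```

Applied to predictions with Pc values 0.9, 0.5, and 0.7, for instance, this keeps only the 0.9 and 0.7 boxes.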

Next, while there are any remaining bounding boxes that you've not yet discarded or processed, you repeatedly pick the box with the highest probability, the highest Pc, and output it as a prediction. This is the step from the previous slide of taking one of the bounding boxes and making it lighter in color; you commit to outputting it as a prediction that there is a car there. Then you discard any remaining box, any box that you have not output as a prediction and that was not previously discarded, with a high overlap, a high IOU, with the box you just output in the previous step. This second step in the while loop is where, on the previous slide, you would darken any remaining bounding box that had a high overlap with the bounding box we just highlighted. And you keep doing this while there are still boxes you've not yet processed, until you've taken each of the boxes and either output it as a prediction, or discarded it for having too high an overlap, too high an IOU, with one of the boxes that you output as your predicted position for one of the detected objects.
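Putting the steps together, here is a minimal single-class sketch of the whole procedure as just described, under the same assumptions as before: detections stored as `(pc, box)` pairs, boxes as `(x1, y1, x2, y2)` corners, and illustrative threshold values:

```python
def iou(a, b):
    # Intersection over union of two (x1, y1, x2, y2) boxes.
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union > 0 else 0.0

def non_max_suppression(detections, pc_threshold=0.6, iou_threshold=0.5):
    """detections: list of (pc, box) pairs. Returns the kept predictions."""
    # Step 1: discard every box with a low object probability.
    remaining = [d for d in detections if d[0] > pc_threshold]
    kept = []
    # Step 2: while boxes remain, output the highest-Pc box and suppress
    # everything that overlaps it too much (including the box itself,
    # whose IOU with itself is 1).
    while remaining:
        best = max(remaining, key=lambda d: d[0])
        kept.append(best)
        remaining = [d for d in remaining
                     if iou(d[1], best[1]) <= iou_threshold]
    return kept
```

On the lecture's example, a 0.9 box would suppress nearby 0.6 and 0.7 boxes, then a 0.8 box elsewhere in the image would survive as the second prediction.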
