0:00

Alright. So the plan for this video is to prove the correctness of the divide and

Â conquer closest to pair algorithm that we discussed in the previous video. So just

Â to refresh your memory, how does the outer algorithm work? Well, we're given endpoints

Â in the plane. We begin by sorting them, first by x-coordinate and then by

Â y-coordinate. That takes n log in time. Then we enter the main recursive divide

Â and conquer part of the algorithm. So what do we do? We divide the point set into the

Â left half and the right half, Q and R, then we conquer. We recursively compute

Â the closest pair in the left half of the point set Q. We recursively compute the

Â closest pair in the right half of the point set R. There is a lucky case where

Â the closest pair on the entire point set lies either all on the left or all on the

Â right. In that case, the closest pair is handed to us on a silver platter,

Â by one of the two recursive calls. But there remains the unlucky case where the closest

Â pair is actually split with one point on the left and one point on the right. So to

Â get our N log N running time bound, analogous to Merge Short in our inversion

Â counting, we need to have a linear time implementation of a subroutine which

Â computes the best, the closest pair of points, which is split, one on the left

Â and one on the right. Well, actually, we don't need to do quite that. We need to

Â do something only a little bit weaker. We need a linear time algorithm, which whenever

Â the closest pair in the whole point set is in fact split, then computes that split

Â pair in linear time. So let me now remind you of how that subroutine works.

Â So it has two basic steps. So first, there's a filtering step. So it looks at,

Â first of all, a vertical strip, roughly down the middle of the point set. And it

Â looks at, only at points which fall into that vertical strip. That was a subset of

Â the points that we called S sub Y, 'cause we keep track of them sorted by Y

Â coordinate. And then we do essentially a linear scan through SY. So we go through

Â the points one at a time, and then, for each point, we look at only the almost

Â adjacent points. So for each index I, we look only at J's that are between one and

Â seven positions further to the right, than I. So among all such points, we

Â compare them, we look at their distances. We remember the best such pair of points.

Â And then that's what we return from the count split pair subroutine. So we've

Â already argued, in the previous video, that the overall running time of the

Â algorithm is N log N. And what remains to prove correctness. And we also argued, in

Â the previous video, that correctness boils down to the following correctness claim.

Â In the sense that, if we can prove this claim, then the entire algorithm is

Â correct. So this is what remains. Our residual work is to provide a proof of

Â the correctness claim. What does it say? It says consider any split pair that is

Â one point p from the left side Q, capital Q, and another point little q drawn from

Â the right side of the point set capital R. And fur, further suppose that it's an

Â interesting split pair meaning that the distance between them's at most delta.

Â Here delta is recall the parameter pass to the count split pair subroutine, which is

Â the smallest distance between a pair of points all on the left or all on the

Â right. And this is the only case we're interested in. There's two claims. First

Â of all, for p and q, both members of the split pair survive the filtering step.

Â They make it into the sorted list S sub Y, and second of all, they will be considered by

Â that double for loop, in the sense that the positions of p and q in this array, S

Â sub Y, differ by at most seven. So that's the story so far. Let's move on to the

Â proof. So let's start with part A which is the easy part relatively of the claim. So

Â remember what we start with, our assumptions. We have a point p, let's

Â write it out in terms of the X coordinates, X1 and Y1, which is from the

Â left half of the point set. And we have a point q, which we'll call X2Y2, which

Â comes from the right half of the point set. And furthermore, we're assuming that

Â these points are close to each other. And we're gonna use that hypothesis over and

Â over again. So the Euclidean distance between p and q is no more than this

Â parameter delta. So, first, something very simple, which is that if you have two

Â points that are close in Euclidean distance, then both of their coordinates

Â have to be close to each other, right? If you have two points, and they differ by a

Â lot in one coordinate, then the Euclidean distance is gonna be pretty big as well.

Â So, specifically. By our hypothesis, that p and q have Euclidean distance less than

Â delta, it must be that the difference between their coordinates in absolute

Â value is no more than delta, and as well, the difference between their y-coordinates

Â is at most delta. Okay, and this is easy to see if you'd just return to the

Â definition of Euclidean distance that we reviewed at the beginning of the

Â discussion of closest points. Okay? So if your distance is at most delta, then in

Â each coordinate, you differ by at most delta as well. Now, what does A say?

Â [sound]. So proof of A. So what does part A of the claim assert? It asserts that p

Â and q are both members of SY, are both members of that vertical strip. So another

Â way of saying that is that the X coordinates of p and q, that is, the

Â numbers X1 and X2 both are within delta of Xbar. Remember, Xbar was in some sense

Â the median X coordinate. So the X coordinate of the N over two'th leftmost

Â point. So we're gonna do a proof by picture, so consider, forget about the Y

Â coordinates, that's of irrelevant right now, and just focus on the X coordinates

Â of all of these points. So on the one hand we have X bar. This is the X coordinate of

Â the N over two'th point to the left. And then there are the X coordinates which define

Â the left and the right borders of that vertical strip. Namely Xbar-delta and

Â Xbar+delta. And then somewhere in here are X1 and Y1, the X coordinates of the points

Â we care about, p and q. So a simple observation, so because p comes from the

Â left half of the point set, and Xbar is the rightmost X coordinate of the left

Â half of the point set, the X coordinate is at most Xbar. Right? So all points of Q

Â have X coordinate, at most, Xbar, in particular, p does. Similarly, since Xbar

Â is the rightmost edge of the left half of the point set, everything in the right half

Â of the point set has X coordinate, at least Xbar. So in particular, little q

Â does as well. So what does this mean. So this means x1, wherever it is, has to be

Â at the left of x bar. X2 wherever it is has to be to the right of x bar. What

Â we're trying to prove is that they're wedged in between x bar minus delta and x bar

Â plus delta. And the reason why that's true is because their x coordinates

Â also differ by at most delta. Okay, so what you should imagine is. You can imagine x1

Â and x2 are sort of people tied by a rope at the waist. And this rope has

Â length delta. So wherever x1 and x2 move, they're at most delta apart.

Â Furthermore x1, we just observed, can't move any farther to the right than Xbar.

Â So even if X1 moves as far to the right as it can, all the way to Xbar, X2, since

Â it's at most delta away, tied by the waist, can't extend beyond X bar+ delta. By

Â the same reasoning, X2 can't move any further to the left than Xbar, X1 being

Â tied to the waist to X2, can never drift further to the left than Xbar minus delta.

Â So that's the proof that X1 and X2 both lie within this region, and that defines

Â the vertical strip. So that's part A. If you have any split pair whose distance between

Â them is less than delta, they both have to wind up, in this vertical strip. And

Â therefore wind up in the filtered set, the proof set, S sub Y. So that's part A of

Â the claim. Let's now move to Part B. Recall what part B asserts. It says that

Â the points p and q, this split pair that are distance only delta apart. Not only do

Â they wind up in this sort of filtered set SY, but in fact, they are almost adjacent

Â in SY, in the sense that the indices in the array differ by, at most, seven

Â positions. And this is a part of the claim that is a little bit shocking. Really what

Â this says is that we're getting away with more or less a variant of our one

Â dimensional algorithm. Remember when we wanted to find the closest pair of points

Â on the line, all we had to do was sort them by their single coordinate and then

Â look at consecutive pairs and return the best of those consecutive pairs. Here what

Â we're saying is really, once we do a suitable filtering focus on points in this

Â vertical strip, then we just go through the points according to their Y

Â coordinate. And okay, we don't just look at adjacent pairs. We look at pairs within

Â seven positions, but still we basically do a linear sweep through the points in SY.

Â According to their Y coordinate and that's sufficient to identify the closest split

Â pair. So why on earth will this be true. So our workhorse in this argument will be

Â a picture which I am going to draw on next. So I'm going to draw eight boxes,

Â which have a height and width delta over two. So here, delta is the same parameter

Â that gets passed to the closest split pair subroutine. And it's also the same

Â delta which we're assuming p and q are closer to each other than, right? So

Â that's, remember, that's one of our hypotheses in this claim. The distance

Â between p and q is strictly less than delta. So we're gonna draw eight delta

Â over two boxes. And they're gonna be centered at x bar. So, this same center of

Â the vertical strip that defines S Y. And the bottom is going to be the smaller of the

Â Y-coordinates of the points p and q. So it might be p, it might be q. It

Â doesn't really matter. But the bottom is going to be the smaller of the two. So

Â the picture then looks as follows. So the center of these collections of eight

Â boxes, X bar, the bottom is the minimum of Y1, Y2. We're gonna have two rows and four

Â columns. And needless to say, we're drawing this picture just for the sake of

Â this correctness proof, right? This picture is just a thought experiment in

Â our head. We're just trying to understand why the algorithm works. The algorithm, of

Â course, does not draw these boxes. The subroutine, the, closest split pair

Â subroutine is just that pseudo code we saw in the previous video. This is just to

Â reason about the behavior of that subroutine. Now looking ahead, I'll make

Â this precise in two lemmas that are about to come up, what's going to be true

Â is the following. So, either p or q is on this bottom line, right? So we define the

Â bottom to be the lower Y coordinate of the two. So maybe, for example, q is the one

Â that has the smaller Y coordinate, in which case, is gonna be, somewhere, say,

Â down here. P, you remember, is from the left half of the point sets. So p is maybe

Â gonna be here or something. And we're gonna argue that both p and q have to be

Â in these boxes. Moreover, we're gonna argue that these boxes are sparsely

Â populated. Every one contains either zero or one point of the array S sub Y. So,

Â what we're gonna see is that there's at most eight points in this picture, two of

Â which are p and q, and therefore, if you look at these points sorted by Y

Â coordinate, it has to be that they're within seven of each other, the difference

Â of indices is no more than seven. So, we're gonna make those two statements

Â precise one at a time by the following two lemmas. Let's start with lemma one. Lemma

Â one is the easy one. And it states that all of the points of S sub Y, which show up

Â in between the Y coordinates of the points we care about p and q have to appear in

Â this picture, they have to lie in one of these eight boxes. So we're going to argue

Â this in two steps. First, we're going to argue that all such points have to have Y

Â coordinates within the relevant range of this picture between the minimum of Y1 and

Â Y2 and delta more than that, and secondly that they have to have X coordinates in

Â the range of this picture, namely between X bar minus delta and X bar plus delta. So

Â let's start with Y coordinates. So again, remember this key hypothesis we have,

Â okay. We're dealing with a split pair p-q that are close to each other. The distance

Â between X and Y is strictly less than delta. So the very first thing we did at

Â the beginning of this proof is we said well, if their Euclidean distance is less

Â than delta then they have to differ by at most delta in both of their coordinates,

Â in particular in their Y coordinate. Now remember whichever is lower of p and

Â q, whichever one has a smaller y coordinate is precisely at the bottom of

Â this diagram. For example, if q is the one with the smaller y coordinate, it might be

Â on the black line right here. So that means in particular x has y coordinate no

Â more than the top part of this diagram. No more than delta bigger than q. And of

Â course all points with Y coordinates in between them are equally well wedged into

Â this picture. So that's why all points of SY with a Y coordinate between those of p

Â and q have to be in the range of this picture, between the minimum of the two Y

Â coordinates and delta more than that. Now what about horizontally? What about the X

Â coordinates? Well this just follows from the definition of S sub Y. So remember,

Â S sub Y are the points that fall into this vertical strip. How did we define the

Â vertical strip? Well it had center Xbar, and then we fattened it by delta on both

Â sides. So just by definition, if you're an SY, you've gotta have X coordinates in the

Â range of this picture. X delta plus minus, sorry, xbar plus minus delta. So that

Â completes the proof of the lemma. So this is not. This is just a lemma. So I'll use a

Â lower case qed. Remember this is just a step toward proving the overall

Â correctness claim. But this is a good step. And again, the way you think about

Â this is it says we draw this boxes. We know that either p or q is at the bottom.

Â The other one is going to be on the other side of the black line x bar but will be

Â in some other box so perhaps maybe p is here and the lemma is saying that all the

Â relevant points of SY have to be somewhere in this picture. Now remember in our

Â double for loop we only search seven positions away, so the concern is that

Â this is a sorta super highly populated collection of eight boxes. That's the

Â concern, but that's not going to be the case and that's exactly what lemma two is

Â going to say. Not only do the points between p and q in Y coordinates show up

Â in this diagram, but there have to be very few. In particular, every box has to be

Â sparse, with population either zero or one. So, let's move on to lemma two. So formally

Â the claim is [sound], we have at most one point of the point set in each of these

Â eight boxes. And this is finally where we use, in a real way, the definition of

Â delta. This is where we finally get the payoff from our realization long ago, that

Â when defining the closest split pair subroutine, we only really need to be

Â correct in the unlucky case. In the case we're not handed the right answer by one

Â of our recursive calls. We're finally gonna use that fact in a fundamental way.

Â So we're gonna proceed by contradiction. So we're going to think about what happens

Â if there are two points in a single box and from that we'll be able to derive a

Â contradiction. So, call the points that wind up in the same box A and B. So, to

Â the contrary, suppose A and B lie in the same box. So, maybe this is A here, and

Â this is B here, at antipodal corners of this particular box. So from this

Â supposition, we have two consequences. First of all. I claim that A and B lie on

Â the same side of the point set. They're either both in the left side, Q or they're

Â both in the right side, R. So why is this true? Well it's because every box lies either

Â entirely on the left half of the point set or on the right half of the point

Â set. Recall how we define x bar. X bar is the x coordinate of the right most point

Â amongst the left half of the point set capital Q. So therefore points with x

Â coordinate at most x bar have to lie inside the left half Q. Points with x

Â coordinates at least x bar have to lie inside the right half of the point

Â set capital R. So that would be like in this example. A and b both lie in a box

Â which is to the right of x bar. So they both have to come in the right half

Â of the point set capital R. This is one place we are using that there are no ties

Â in X coordinate, so if there's a point with X, X coordinate or X bar, we can

Â count it as part of the left half. So every box, by virtue of being either to

Â the left of xbar or to the right of xbar, can only contain points from a common half

Â of the point set. So that's the first consequence of assuming that you have two

Â points in the same box. The second consequence is, because the boxes are

Â small, the points gotta be close. So, if A and B co-habitate a box, how far could

Â they be from each other? Well, the farthest they could be is like I've drawn

Â in the picture, with the points A and B. Where they're at opposite corners of a

Â common box. And then you bust out Pythagorean's Theorem, and what do you

Â get? You get that the distance between them is delta over two, the side of the

Â box times root two. And what's relevant for us is this is strictly less than

Â delta. Okay? But, now, here is where we use, finally, the definition of delta.

Â Consequences one and two in tandem, contradict how we define delta. Remember

Â what delta is. It's as close as two pair of, a pair of points can get if they both

Â lie on the left side of the point set, or if they both lie on the right side of the

Â point set. That is how we defined it. As small as a pair of points on a common half

Â can get to each other. But what have we just done? We've exhibited a pair A and B

Â that lie on the same half of the point set, and are strictly closer than delta.

Â So that contradicts the definition of delta. So that completes the proof of lemma

Â two. Let me just make sure we're all clear on why having proved lemma one and lemma two

Â we're done with the proof part B of the claim and therefore the entire claim

Â because we already proved part one, now a long time ago.

Â So let's interpret the 2 lemmas in the context of our picture that we had all

Â throughout. In terms of the eight boxes of side length delta over two by

Â delta over two. So again, whichever is the lower of p and q, and again let's just for

Â the sake of concreteness say it's q, is at the bottom of the picture. The other point

Â is on the other half of the line Xbar, and is in one of the other boxes. So, for

Â example, maybe p is right here. So lemma one says that every relevant point, every

Â point that survives the filtering and makes it into SY, by virtue of being in

Â the vertical strip, has to be in one of those boxes, okay? If it has Y coordinate

Â in between p and q. Lemma two says that you can only have one point in each of these boxes

Â from the point set, so that's gonna be at most eight total. [sound] So combining

Â them. Lemmas one and two imply, that there are almost eight points in this picture

Â and that includes p and q because they also occupy two of eight boxes. So in the

Â worst case, if this is as densely populated as could possibly be, given

Â lemmas one and two, every other box might have a point and perhaps every one of

Â those points has a Y coordinate between p and q. But this is as bad as it gets.

Â Any point of the strip with Y coordinate between p and q occupies a box.

Â So, at most, there are these six wedged in between them. What does this mean? This

Â means if from q you look seven positions ahead in the array, you are

Â guaranteed to find this point p. So a split pair with distance less than delta

Â is guaranteed to be identified by our double for loop. Looking seven positions

Â ahead in the sorted array SY is sufficient to identify, to look at every conceivably

Â interesting split pair. So that completes the assertion B of the correctness

Â claim and we're done. That establishes that this supremely clever

Â divide and conquer algorithm is indeed a correct O(nlog(n)) algorithm that computes the

Â closest pair of a set of n points in the plane.

Â