0:00

In this video I'll explain the mathematical analysis of the randomized

Â linear time selection algorithm that we studied in the previous video.

Â Specifically, I'm going to prove to you the following guarantee for that

Â algorithm. For every single input array of length n the running time of this

Â randomized selection algorithm on average will be linear. Pretty amazing if you

Â think about it because that's barely more what the time it takes just to read the

Â input. And in particular this linear time algorithm is even faster than sorting.

Â So this shows that selection is a fundamentally easier problem than sorting.

Â You don't need to reduce to sorting. You can solve it directly in O(n) time.

Â I want to reiterate the same points I made about quick sort. The

Â guarantee is the same. It is a general purpose subroutine. We make no assumptions

Â about data. This theorem holds no matter what the input array is. The expectation,

Â the average that's in the theorem statement is only over the coin flips made

Â by the algorithm made inside it's code of our own devising. Before we plunge into the

Â analysis, let me just make sure you remember what the algorithm is. So it's

Â like quick sort. We partition around a pivot except we only recurse once, not

Â twice. So we're given an array with some length n. We're looking for the ith order

Â statistic, the ith smallest element. The base case is obvious. You're not in the

Â base case; you choose a pivot p, uniformly at random from the input array just like

Â we did in quick sort. We partition around the pivot just like we did in pic, in

Â quick sort. That splits the array into a first part of those elements less than the

Â pivot and the second part of those elements which are bigger than the pivot.

Â Now, we have a couple of cases. The case which is very unlikely so we don't really

Â worry about, if we're lucky enough to guess the pivot as the ith order

Â statistic what we're looking for. That's when the new position j. Of the pivot

Â element happens to equal I. What we're looking for. Then, of course, we just

Â return it. That was exactly what we wanted. In the general case, the pivot is

Â going to be in the position J, which is either bigger than what we're looking for

Â I, that's when the pivot is too big or J. It's position will be less than the ith order statistic

Â we're looking for. That's when the pivot is too small. So if the pivot's too big,

Â if J is bigger than i that when we're looking for is on the left hand

Â side amongst the elements less than the pivot. So that's where we recurse. We've

Â thrown out both the pivot and everything to the right of it. That leaves us with an

Â array of J minus I elements and we're still looking for the ith smallest among

Â these J minus1 smallest elements. And then the final case, this is what we went

Â through in the quiz and last video, is if we choose a pivot who's smaller than what

Â we're looking for, that's when J is less than I, then it means we're safe to throw

Â out the pivot and everything less than it. We're safe recursing on the second part of

Â those elements bigger than the pivot. Having thrown out the J's smallest

Â elements, we're recursing on an element of length of N-J and we're looking for the i-j

Â smallest element in those that remain, having already thrown out the J

Â smallest from the input array. So that's randomized selection. Let's discuss why

Â it's linear time on average. The first thought that you might have, and this

Â would be a good thought, would be that we should proceed exactly the same way that

Â we did in Quick Sort. You recall that when we analyzed Quick Sort, we set up these

Â indicator random variables, x, i ,j determining whether or not a given, pair

Â of elements got compared at any point in the algorithm. And then we just realized

Â the sum of the comparisons is just the sum, overall, of these x,i, js. We applied

Â linearity of expectation and it boiled down to just figuring out the probability

Â that a given pair of elements gets compared. You can analyze this randomized

Â selection algorithm in exactly the same way. And it does give you a linear time

Â bound on average. But it's a little messy. It winds up being not quite as clean as in the

Â quick sort analysis. Moreover, because of the special structure of the selection

Â problem, we can proceed in an even more slick way here than the way we did with

Â quick sort. So, again we'll have some constituent random variables. We'll again

Â apply linearity of expectation but the definition of those random variables is

Â going to be a little bit different than it was in quick sort. So, first a preliminary

Â observation. Which is that the workhorse for this randomized selection procedure is

Â exactly the same as it was in quick sort. Namely it's the partition subroutine.

Â Essentially all of the work that gets done outside of the recursive call just

Â partitions the input array around some pivot element as we discussed in detail in a

Â separate video that takes linear time. So usually when we say something's linear

Â time we just use big O notation. I'm gonna go ahead and explicitly use a constant c

Â here for the operations outside the recursive call. That'll make it clear that

Â I'm not hiding anything up my sleeves when we do the rest of the analysis. Now what I

Â wanna do on this slide is introduce some vocabulary, some notation which will allow

Â us to cleanly track the progress of this recursive selection algorithm. And by

Â progress I mean. The length of the array on which is currently operating.

Â Remember we're hoping for a big win over quick sort, cuz here we only do one

Â recursive call, not two. We don't have to recurse on both sides of the pivot just on

Â one of them. So it stands to reason, that we can think about the argument making

Â more and more progress as a single recursive calls operating on arrays of

Â smaller and smaller length. So the notion that will be important for this proof is

Â that of a phase. This quantifies how much progress we've made so far, with higher

Â numbered phases corresponding to more and more progress. We'll say that the

Â r select algorithm at some midpoint of its execution is in the middle of phase J.

Â If the array size that the current recursive call is working on has length between

Â 3/4th raised to the J times N and the smaller number 3/4th J+1 times N. For example

Â think about the case where J equals zero. That says phase zero recursive calls,

Â operate on arrays with size of N and 75 percent of N. So, certainly, the outermost

Â recursive call is going to be in phase zero. Because the input array has size N.

Â And then, depending on the choice of the pivot, you may or may not get out of phase

Â zero in the next recursive call. If you choose a good pivot, and you wind up

Â recursing on something, that has, at most, 75 percent of the original elements, you

Â will no longer be in phase zero. If you recurse on something that has more than 75

Â percent of what you started with, of the. Input array, then you're still gonna be in

Â phase zero even in the second recursive call. So overall the phase number J,

Â quantifies the number of times we've made 75 percent progress, relative to the

Â original input array. And the other piece of notation that's going to be important

Â is what I'm going to call Xj. So for a phase J, Xj simply counts the number of

Â recursive calls in which a randomized selection algorithm is in phase J. So this

Â is gonna be some integer. It could be as small as zero, if you think about it, for

Â some of the phases. Or it could be larger. So why am I doing this? Why am I making

Â these definitions of phases and of these XJs? What's the point? We're gonna

Â remember the point was we wanna be able to cleanly talk about the progress that the

Â randomized selection algorithm makes through its recursion, and what I wanna

Â now show you is that in terms of these XJs, counting the number of iterations in

Â each phase, we can derive a relatively simple upper bound on the number of

Â operations that our algorithm requires. Specifically the running time of our

Â algorithm, can be bounded above by the running time in a given phase, and then

Â summing those quantities over all of the possible phases. So we're gonna start with

Â a big sum, over all the phases J. We want to look at the number of recursive calls

Â that we have to endure in phase J, so that's XJ by definition. And then we look

Â at the work that we do outside of the recursive calls in each recursive call

Â during phase J. Now, in a given recursive call, outside of its recursive call, we do

Â C times M operations where M is the length of the input array and during phase J we

Â have an upper bound on the link of the input array. By definition it's at most

Â three quarters raised to the J times N. So that is, we multiply the running time

Â times this constant C this, we inherit from the partition subroutine and then we

Â can, for the input length, we can put an upper bound of three quarters raised to

Â the J times N. So just to review where all of these terms come from, there's three

Â quarters J times N is an upper bound on the array size. During phase J, this by

Â the definition of the phase. Then, if we multiply that times c, that's the amount

Â of work that we do on each phase J sub-problem. How much work do we do in

Â phase J overall or we just take the work per sub problem that's what's circled in

Â yellow and we multiply it times the number of such sub problems we have. And, of

Â course, we don't wanna forget any of our sub problems so we just make sure we sum

Â all of our phases, j, to insure that at every point we count the work done in each

Â of the sub problems. Okay? So, that's the upshot of this slide. We can upper bound

Â the running time of our randomized algorithm very simply in terms of phases

Â and the XJ's, the number of sub problems that we have to endure during phase j. So,

Â this upper bound on our running time is important enough to give it notation, we'll

Â call this star, this will be the starting point of our final derivation when we

Â complete the proof of this theorem. Now don't forget, we're analyzing a randomized

Â algorithm so therefore the left hand side of this inequality the running time of r select,

Â that's a random variable. So that's a different number depending on the

Â outcome of the random coin flips of the algorithm. Depending on the random pick it

Â has chosen, you will get different random running times. Similarly the right hand

Â side of this inequality. Is also a random variable. That's because the X J's are

Â random variables. The number of sub problems in phase j depends on which

Â pivots get chosen. So. To analyze, what we care about is the expectations of these

Â quantities, their average values. So we're gonna start modestly and as usual, this

Â will extend our modest accomplishments to much more impressive ones using linearity

Â of expectation, but our first modest goal is just to, to understand the

Â average value. Of an XJ, the expected value of XJ. We're gonna do that in two

Â steps. On the next slide, I'm going to argue that to analyze the expectation of

Â XJ, it's sufficient to understand the expectation of a very simple coin flipping

Â experiment. Then, we'll analyze that coin flipping experiment. Then we'll have the

Â dominos all set up in a row. And on the final slide, we'll knock'em down and

Â finish the proof. So let's try to understand the average number of recursive

Â calls we expect to see in a given phase. So, again, just so we don't forget. Xj is

Â defined as the number of recursive calls during phase J. Where a recursive call is

Â in phase J, if and only if the current sub array length lies between three-fourths

Â raised to the J+1 times N. And then, the larger number of three-fourths raised to

Â the J times N. So again, for example, phase zero is just the recursive calls

Â under which the array length is between 75 percent of the original element and 100

Â percent of the original elements. So what I wanna do next is point out that a very

Â simple sufficient condition guarantees that we'll proceed from a given phase onto

Â the next phase. So it's a condition guaranteeing termination of the current

Â phase. And it's an event that we've discussed in previous videos. Mainly that

Â the pivot that we choose gives a reasonably balanced split. 25-75 or

Â better. So recall how partitioning works, we choose a pivot P. It winds up wherever

Â it winds up. And the stuff to the left of it's less than P. The stuff to the right

Â of it is bigger than P. So 25 to 75 split or better, what I mean is that each of

Â these, each, the first part and the second part has, at most, 75 percent of the

Â elements in the input array. Both have twen-, both have at least 25%, and, at

Â most, 65%. And the key point is, that if we wind up choosing a pivot that gives us

Â a split that's at least as good the current phase must end. Why must the

Â current phase end? Well, to get a 25, 75 split or better than no matter which case

Â we wind up in, in the algorithm we're guaranteed to recurse on a sub problem that

Â has at most 75 percent of what we started with. That guarantees that whatever phase

Â we're in now, we're going to be in an even bigger phase when we recursed. Now, I want

Â you to remember something that we talked about before, which is that you've got a

Â decent chance when you pick a random pivot of getting something that gives you a 25,

Â 75 split or better. In fact, the probability is 50 percent. Right? If you

Â have an array that has the integers from one to 100 inclusive, anything from 76 to

Â s, 26 to 75 will do the trick. That'll insure that at least the first 25 elements

Â are excluded from the rightmost call and at least rightmost 25 elements are

Â excluded from the left recursive call. So this is why we can reduce our analysis of

Â the number of recursive calls during a given phase, to a simple experiment

Â involving flipping coins. Specifically, the expected number of recursive calls.

Â Now we are gonna see in a given phase J, is no more than the expected number of

Â coin flips in the following experiment. Okay, so you've got a fair coin, 50

Â percent heads, 50 percent tails. You commit to flipping it until you see the

Â head and the question is, how many coin flips does it take up to and including the

Â first head that you see? So the minimum it's gonna be one coin flip if you hit a

Â head the first time it's one. If you get a tails and then a head, then it's two. If

Â it's tails, tails, head it's three and so on, and you always stop when you hit that

Â first head. So what's the correspondence? Well, think of heads as being you're in

Â Phase J, and if you get a good pivot, it gives you a 25/75 split. Call that heads.

Â And it guarantees that you exit this Phase J. Just like it guarantees that you get to

Â terminate the coin flipping experience, experiment. Now, if you get a pivot which

Â doesn't give you a 25/75 split, you may or may not pass to a higher Phase J, but in

Â the worst case, you don't. You stick to phase J is you get a bad split, and that's

Â like getting a tails in the coin flipping experiment, and you have to try again.

Â This correspondence give us a very elementary way to think about the progress

Â that, that our randomized selection algorithm is making. So, there's one

Â recursive call in every step in our algorithm, and each time we either choose

Â a good pivot or a bad pivot, both could happen, 50-50 probability. A good pivot

Â means we get a 75-25 split or better. A bad pivot means, by definition, we get a

Â split worse than 25-75. So what have we accomplished? We've reduced the task of

Â upper bounding the expected number of recursive calls in a phase J to

Â understanding the expected number of times you have to flip a fair coin before you

Â get one hit. So on the next slide we'll give you the classical and precise answer

Â to this coin flipping experiment. So, let me use capital N to denote the random

Â variable, which we were just talking about, the number of coin flips you need

Â to do before you see the first heads. And, it's not very important, but you should

Â know that these random variables have their own name. This would be a geometric

Â random variable with parameter one-half. So you can use a few different methods to

Â compute the expected value of a geometric random variable such as this, and brute

Â force using the definition of expectation works fine as long as you know how to

Â manipulate infinite sums. But for the sake of variety, let me give you a very sneaky

Â proof of what it's expectation is. So the sneaky approach is to write to the

Â expected value of this random variable in terms of itself and then solve for the

Â unknown, solve for the expectation. So let's think about it. So how many coins

Â flips do you need? Well for sure you're gonna need one. That's the best case

Â scenario. And now two things can happen, either you get heads and that has 50

Â percent probability you stop or you get tails that happens with 50 percent

Â probability and now you start all over again. Again you just put points until you

Â get first heads. On average how many times does that take. Well by the definition of

Â capital N you expect. The expectation of N coin flips, in the case where you get

Â tails, and you have to start all over. So this one represents the first coin flip,

Â the one-half is the probability that you can't stop, that you have to start all

Â over probability of tails, and then because it's a memory less process,

Â because when you start anew on the second coin flip having gotten the tails, it's as

Â if you're back at time one all over again. So now we have a trivial equation, in

Â terms of the unknown expected value of N and the unique solution, the unique value,

Â that the expected value of capital N could have, in light of this equation, is two.

Â So, on average if you flip a fair coin and stop when you get heads, you're going to

Â see two coin flips on average. To make sure you haven't sort of lost the forest

Â for the trees, let me remind you why we were talking about this coin flipping

Â analysis in the first place. So recall in the previous slide we showed that XJ, and

Â remember XJ is the number of recursive calls you'd expect to see in a given phase

Â J, and we argued that the number of recursive calls you're gonna see is bounded

Â above. By the expected number of coin flips until the heads. So this exact

Â calculation of two for the coin flips gives us an upper bound of two for the

Â number of recursive calls on average in any given phase J. So now that we've got

Â all our ducks lined up in a row, let's wrap up the proof on this final slide. So,

Â inherited from part one of the proof, we have an upper bound. On the expected

Â running time. Of the R select algorithm. This is what we were calling star on the

Â first brief slide In star, it looked a little messy, but we had the sum over the

Â phases J. But we had two things that were independent of j: the constant c and the

Â original input length n, so let me just yank the c and the n out front. And then

Â we have this residual sum over the phases J. Of three quarters raised to the J

Â remember that comes from our upper bound on the sub problem size during phase J and

Â then of course we have to keep track of how many phase J sub problems we have

Â solved that by definition is XJ. Now star was written as a rand in accordance terms

Â to the random variables. Now we're gonna go ahead and take the expectations and

Â again I have said this over and over but don't forget where's the expectation come

Â from. This is over the random pivot choices that our code makes. So the

Â expected running time of the algorithm is most the expectation of this start

Â quantity. So like I said earlier, pretty much every time we're gonna do any

Â analysis of [inaudible] process, we're gonna wind up using linearity of

Â expectation at some point. Here is where we do it. Linear expectation says the

Â expectation of a sum is just the sum of the expectations. So we yank the c and the

Â n outside of the expectation. We yank this sum over phases. Outside of the

Â expectation. We yank this three-fourths raised to the J outside of the expectation

Â and then we just have the expected value of XJ, the average number of recursive

Â calls we expect to see in HJ. Now on the previous two slides, we figured out an

Â upper bound on how many recursive calls we expect to see in each phase. So first by

Â the coin flip analysis, by the reduction of the coin flip analysis, this is the

Â most expected number of coin flips N, which on the previous slide, we argued was

Â exactly two. So bringing that two out in front of the sum, that no longer depends

Â on J. So we get a most 2CN. Times the sum over phases J, of three quarters raised to

Â the J. Now this kind of sum we have seen previously in the course. It came up when

Â we were analyzing the master method and we summed up our running time upper bounds

Â over the levels of our recursin tree. And if we're not in case one if we're in

Â case two or three we had geometric sums that were nontrivial. They require a

Â certain formula to calculate, so let me remind you of that formula here, when the

Â three quarters are being powered up to the J. So this has value at most, one over one

Â minus, the number that's getting powered, so in this case it's three quarters, one

Â minus three quarters is a quarter check reciprocal, you got four. And the upshot

Â is that the expected number of operations that this randomized selection algorithm

Â uses to find the [inaudible] ordered statistic in a given input array, is eight

Â times C times N. Where C is the, hidden constant in the linear running time of

Â partition. And so that completes the proof. The input array was arbitrary. We

Â showed the expected running time over the random choices of the algorithm is linear

Â in N. That is, only a constant factor larger than what is required to read the

Â input. Pretty amazing.

Â