0:00

So this is the first video of three in which we'll mathematically analyze

the running time of the randomized implementation of QuickSort.

So in particular we're going to prove that the average running time of QuickSort is big O of n log n.

Now this is the first randomized algorithm that we've seen in the course, and so its analysis will be the first time that we need any kind of probability theory.

So let me just explain up front what I'm going to expect you to know in the following analysis.

Basically, I need you to know the first few ingredients of discrete probability theory.

So I need you to know about sample spaces, that is, how to model all of the different things that could happen, all of the ways that random choices could resolve themselves.

I need you to know about random variables, functions on sample spaces which take on real values.

I need you to know about expectations, that is, average values of random variables. And a very simple but very key property we're going to need in the analysis of QuickSort is linearity of expectation.

So if you haven't seen this before, or if you're too rusty, you should definitely review this stuff before you watch this video.

For the necessary review, you can look at the probability review part one video that's up on the course's website.

If you'd prefer to read something, like I said at the beginning of the course, I recommend the free online lecture notes by Eric Lehman and Tom Leighton, Mathematics for Computer Science. That covers everything we'll need to know, plus much, much more.

There's also a Wikibook on Discrete Probability, which is a perfectly fine, and obviously free, source in which you can learn the necessary material.

Okay? So after you've got that sort of fresh in your mind, then you're ready to watch the rest of this video.

And in particular, we're ready to prove the following theorem, stated in the previous video.

So consider the QuickSort algorithm with a randomized implementation, that is, in every single recursive call you pick a pivot uniformly at random.

We stated the following assertion: for every single input, so for a worst-case input array of length n, the average running time of QuickSort with random pivots is O(n log n).

And again, to be clear where the randomness is, the randomness is not in the data. We make no assumptions about the data, as per our guiding principles.

No matter what the input array is, averaging only over the randomness in our own code, the randomness internal to our algorithm, we get a running time of O(n log n).

We saw in the past that the best-case behavior of QuickSort is n log n, and its worst-case behavior is n squared.

So this theorem is asserting that no matter what the input array is, the typical behavior of QuickSort is far closer to the best-case behavior than it is to the worst-case behavior.

Okay. So that's what we're going to prove in the next few videos.

So let's go ahead and get started.
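Before diving into the analysis, it may help to have the algorithm itself in front of us. Here's a minimal sketch of randomized QuickSort in Python; this is my own illustrative code, not the course's reference implementation, and the helper names are mine:

```python
import random

def quicksort(a):
    """Sort the list a in place, choosing pivots uniformly at random."""
    def sort(lo, hi):  # sorts the subarray a[lo:hi]
        if hi - lo <= 1:
            return
        p = random.randrange(lo, hi)        # pivot index, uniform over the subarray
        a[lo], a[p] = a[p], a[lo]           # move the pivot to the front
        pivot, i = a[lo], lo + 1
        for j in range(lo + 1, hi):         # partition: one comparison per element
            if a[j] < pivot:
                a[i], a[j] = a[j], a[i]
                i += 1
        a[lo], a[i - 1] = a[i - 1], a[lo]   # put the pivot in its final position
        sort(lo, i - 1)                     # recurse on the left side...
        sort(i, hi)                         # ...and the right; pivot is in neither
    sort(0, len(a))
    return a
```

The only randomness is the call to `random.randrange`; everything else is deterministic given the pivot choices, which is exactly the setup the analysis assumes.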

Â 3:37

For a given pivot sequence, remember that random variables are real-valued functions defined on the sample space. So for a given point in the sample space, or pivot sequence, sigma, we're going to define C of sigma as the number of comparisons that QuickSort makes.

By comparison, I don't mean something like a comparison with an array index in a for loop. That's not what I mean by comparison. I mean a comparison between two different entries of the input array, say comparing the third entry in the array against the seventh entry in the array, to see whether the third entry or the seventh entry is smaller.

Â 5:40

And if you want a proof of this, it's not that interesting, so I'm not going to talk about it here. But in the notes posted on the website there is a sketch of why this is true, of how you can formally argue that there isn't much work beyond just the comparisons. But I hope most of you find that to be pretty intuitive.

So given this, given that the running time of QuickSort boils down to just the number of comparisons, to prove that the running time is O(n log n), all we have to do, quote unquote, is prove that the average number of comparisons that QuickSort makes is O(n log n).

And that's what we're going to do. That's what the rest of this lecture is all about.

So that's what we've got to prove: that the expectation of this random variable C, which counts up the number of comparisons QuickSort makes on an arbitrary input array of length n, is bounded by big O of n log n.

So the high-order bit of this lecture is a decomposition principle. We've identified this random variable, C, the number of comparisons, and it's exactly what we care about. It governs the average running time of QuickSort. The problem is, it's quite complicated. It's very hard to understand what this capital C is; it fluctuates between n log n and n squared, and it's hard to know how to get a handle on it.

So how are we going to go about proving this assertion, that the expected number of comparisons that QuickSort makes is, on average, just O(n log n)?

At this point we actually have a fair amount of experience with divide-and-conquer algorithms. You've seen a number of examples. And whenever we had to do a running-time analysis of such an algorithm, we'd write out a recurrence, and we'd apply the master method, or in the worst case we'd draw out a recursion tree to figure out the solution to our recurrence. So you'd be very right to expect something similar to happen here.

But as we probe deeper and think about QuickSort, we quickly realize that the master method just doesn't apply, or at least not in the form that we're used to. The problem is twofold.

So first of all, the size of the two subproblems is random, right? As we discussed in the last video, the quality of the pivot is what determines how balanced the split into the two subproblems is. It could be as bad as a subproblem of size 0 and one of size n minus 1, or it could be as good as a perfectly balanced split into two subproblems of equal size, but we don't know. It's going to depend on the random choice of the pivot.

Moreover, the master method, at least as we discussed it, required all subproblems to have the same size, and unless you're extremely lucky, that's not going to happen in the QuickSort algorithm.

Â 8:04

It is possible to develop a theory of recurrence relations for randomized algorithms and apply that to QuickSort in particular. But I'm not going to go that route, for two reasons.

The first is that it's really quite messy. It gets pretty technical to talk about solutions to recurrences for randomized algorithms, or to think about random recursion trees; both of those get pretty complicated.

The second reason is, I really want to introduce you to what I call a decomposition principle, by which you take a random variable that's complicated, but that you care about a lot, and you decompose it into simple random variables, which you don't really care about in their own right, but which are easy to analyze. And then you stitch those two things together using linearity of expectation.

So that's going to be the workhorse for our analysis of the QuickSort algorithm, and it's going to come up again a couple of times in the rest of the course, for example, when we study hashing.

So to explain how this decomposition principle applies to QuickSort in particular, I'm going to need to introduce to you the building blocks, simple random variables, which will make up the complicated random variable that we care about, the number of comparisons.

Here's some notation. Recall that we fixed in the background an arbitrary array of length n, and that's denoted by capital A.

Â 9:28

So let me tell you what zi is not. What zi is not, in general, is the element in the ith position of the input, unsorted array. What zi is, is the element which is going to wind up in the ith position of the array once we sort it. Okay, so if you fast-forward to the end of the sorting algorithm and look at position i, you're going to find zi.

So let me give you an example. Suppose we had just a simple array here, unsorted, with the numbers 6, 8, 10 and 2. Then z1, well, that's the first smallest, the one smallest, or just the minimum. So z1 would be the 2, z2 would be the 6, z3 would be the 8, and z4 would be the 10, for this particular input array.

Okay, so zi is just the ith smallest number. Wherever it may lie in the original unsorted array, that's what zi refers to.
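In code, this definition is a one-liner: zi is just whatever sits in position i of the sorted array. A quick sketch, 1-indexed to match the lecture's notation (the function name `z` is mine):

```python
def z(a, i):
    """The lecture's z_i: the i-th smallest element of the array a
    (1-indexed), i.e., whatever lands in position i once a is sorted."""
    return sorted(a)[i - 1]

# The lecture's example: unsorted array [6, 8, 10, 2]
example = [6, 8, 10, 2]
print([z(example, i) for i in range(1, 5)])  # [2, 6, 8, 10]
```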

Â 10:26

So we've already defined the sample space. That's just all possible choices of pivots that QuickSort might make. And I've already described one random variable, the number of comparisons that QuickSort makes on a particular choice of pivots.

Now I'm going to introduce a family of much simpler random variables, which count merely the comparisons involving a given pair of elements in the input array, not all elements, just a given pair.

So consider a given choice of pivots, a given sigma, and given choices of i and j, both of which are between 1 and n. And so that we only count things once, I'm going to insist that i is less than j always.

And now here's the definition: my xij, and this is a random variable, so it's a function of the pivots chosen, is going to be the number of times that zi and zj are compared in the execution of QuickSort.

Â 11:22

Okay, so this is going to be an important definition in our analysis. It's important that you understand it.

So for something like the third smallest element and the seventh smallest element, that's when i equals 3 and j equals 7, x37 is asking how many times those two elements get compared as QuickSort proceeds.

And this is a random variable in the sense that, if the pivot choices are all predetermined, if we think of those as being chosen in advance, then there's just some fixed, deterministic number of times that zi and zj get compared.

So it's important that you understand these random variables xij, so the next quiz is going to ask a basic question about the range of values that a given xij can take on.

So for this quiz, we're considering, as usual, some fixed input array. And now, furthermore, fix two specific elements of the input array. For example, the third smallest element, wherever it may lie, and the seventh smallest element, wherever it may lie. Think about just this pair of elements.

What is the range of values that the corresponding random variable xij can take on? That is, what are the different numbers of times that a given pair of elements might conceivably get compared in the execution of the QuickSort algorithm?

All right, so the correct answer to this quiz is the second option. This is not a trivial quiz; it's a little tricky to see.

So the assertion is that a given pair of elements might not be compared at all, they might be compared once, and they're not going to get compared more than once.

So here what I'm going to discuss is why it's not possible for a given pair of elements to be compared twice during the execution of QuickSort. It'll be clear later on, if it's not already clear now, that both 0 and 1 are legitimate possibilities: a pair of elements might never get compared, and they might get compared once. And again, we'll go into more detail on that in the next video. But why is it impossible for them to be compared twice?

Well, think about two elements, say the third smallest element and the seventh smallest element, and let's recall how the partition subroutine works. Observe that in QuickSort, the only place in the code where comparisons between pairs of input array elements happen is the partition subroutine, so that's where we have to drill down.

So what are the comparisons that get made in the partition subroutine? Well, go back and look at that code. The pivot element is compared to each other element in the input array exactly once. The pivot just hangs out in the first entry of the array. We have this for loop, this index j, which marches over the rest of the array, and for each value of j, the jth element of the input array gets compared to the pivot.

So summarizing: in an invocation of partition, every single comparison involves the pivot element. So two elements get compared if and only if one is the pivot.
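You can see this claim directly in code. Below is a sketch of the in-place partition subroutine, written to match the lecture's description (the details and the `record` parameter are my own additions), instrumented to record every comparison it makes. Every recorded pair contains the pivot:

```python
def partition(a, record=None):
    """Partition the list a around a[0]; return the pivot's final index.
    Every comparison made is between the pivot and some other element."""
    pivot, i = a[0], 1
    for j in range(1, len(a)):
        if record is not None:
            record.append((pivot, a[j]))    # the only comparison in the loop
        if a[j] < pivot:
            a[i], a[j] = a[j], a[i]
            i += 1
    a[0], a[i - 1] = a[i - 1], a[0]         # swap pivot into its final place
    return i - 1

comps = []
a = [5, 2, 9, 1, 7]
p = partition(a, comps)
assert a[p] == 5 and all(pair[0] == 5 for pair in comps)
```

The final assertion checks exactly the summary above: the pivot ends up in its rightful position, and the pivot appears in every single comparison.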

All right, so let's go back to the question. Why can't a given pair of elements of the input array get compared two or more times? Well, think about the first time they ever get compared in QuickSort.

Â 14:12

It must be the case that, at that moment, we're in a recursive call where one of those two elements is the pivot element. So say it's the third smallest and the seventh smallest element: the first time those two are compared to each other, either the third smallest or the seventh smallest is currently the pivot, because all comparisons involve a pivot element.

Now, what's going to happen in the recursion? Well, the pivot is excluded from both recursive calls. So, for example, if the seventh smallest element is currently the pivot, it's not going to be passed on to the recursive call which contains the third smallest element.

Therefore, if they're compared once, one of the elements is the pivot, and they'll never be compared again, because the pivot will not even show up in any future recursive calls.

So let me just remind you of some terminology. A random variable which can only take on the values 0 or 1 is often called an indicator random variable, because it's just indicating whether or not a certain thing happens.

So, in that terminology, each xij is indicating whether or not the ith smallest element in the array and the jth smallest element in the array ever get compared. It can't happen more than once; it may or may not happen; and xij is 1 precisely when it happens. So that's the event that it's indicating.
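We can sanity-check the 0-or-1 claim empirically. The following sketch, an out-of-place variant of QuickSort of my own (assuming distinct elements), tallies how many times each unordered pair of elements is compared:

```python
import random
from collections import Counter

def quicksort_paircounts(xs):
    """Randomized QuickSort returning (sorted list, per-pair comparison counts).
    Assumes the elements of xs are distinct."""
    counts = Counter()
    def sort(sub):
        if len(sub) <= 1:
            return sub
        pivot = random.choice(sub)
        left, right = [], []
        for x in sub:
            if x == pivot:
                continue
            counts[(min(x, pivot), max(x, pivot))] += 1   # x vs pivot: one comparison
            (left if x < pivot else right).append(x)
        return sort(left) + [pivot] + sort(right)
    return sort(list(xs)), counts

out, counts = quicksort_paircounts(list(range(40)))
assert out == list(range(40))
assert max(counts.values()) == 1   # no pair is ever compared twice
```

However the random pivots fall, every count in `counts` is exactly 1, and pairs that were never compared simply don't appear: each xij really is an indicator.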

Having defined the building blocks I need, these indicator random variables, these xij's, I can now introduce you to the decomposition principle as applied to QuickSort.

So there's a random variable that we really care about, which is denoted capital C, the number of comparisons that QuickSort makes. That's really hard to get a handle on in and of itself, but we can express C as a sum of indicator random variables, of these xij's. And those we don't care about in their own right, but they're going to be much easier to understand.

Â 16:01

So C, recall, counts all of the comparisons between pairs of input elements that QuickSort makes, whereas an xij only counts the number of comparisons (which is going to be 0 or 1) that involve the ith smallest and the jth smallest elements in particular.

Now, since every comparison involves precisely one pair of elements, some i and some j with i less than j, we can write C as the sum of the xij's. So don't get intimidated by this fancy double sum. All this is doing is iterating over all of the ordered pairs, so all of the pairs i, j where i and j are both between 1 and n and where i is strictly less than j. This double sum is just a convenient way to do that iteration.

And of course, no matter what the pivots chosen are, we have this equality, okay? The comparisons are somehow split up amongst the various pairs of elements, the various i's and j's.

Why is it useful to express a complicated random variable as a sum of simple random variables? Well, because an equation like this is now right in the wheelhouse of linearity of expectation, so let's just go ahead and apply that.

Remember, and this is super, super important: linearity of expectation says that the expectation of a sum equals the sum of the expectations. And moreover, this is true whether or not the random variables are independent, okay? I'm not going to prove it here, but you might want to think about the fact that the xij's are not, in fact, independent. So we're using the fact that linearity of expectation works even for non-independent random variables.

Again, why is this interesting? Well, the left-hand side is complicated, right? It's some crazy number of comparisons made by some algorithm on some arbitrarily long array, and it fluctuates between two pretty far-apart numbers, n log n and n squared. On the other hand, the right-hand side does not seem as intimidating. A given xij is just 0 or 1, depending on whether or not these two guys get compared.

Â 18:32

And of course, this 0 part we can very satisfyingly delete, cancel. And so the expected value of a given xij is just the probability that xij = 1. And remember, it's an indicator random variable: it's 1 precisely when the ith smallest and the jth smallest elements get compared.

So putting it all together, we find that what we care about, the average value of the number of comparisons made by QuickSort on this input array, is this double sum, which iterates over all ordered pairs, of the probability that zi and zj get compared.
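Written out in symbols, the chain of equalities on this slide is (restating the definitions above, with C the comparison count and Xij the pair indicators):

```latex
\mathbb{E}[C]
  \;=\; \mathbb{E}\!\left[\sum_{i=1}^{n-1}\sum_{j=i+1}^{n} X_{ij}\right]
  \;=\; \sum_{i=1}^{n-1}\sum_{j=i+1}^{n} \mathbb{E}[X_{ij}]
  \;=\; \sum_{i=1}^{n-1}\sum_{j=i+1}^{n} \Pr[\,z_i \text{ and } z_j \text{ get compared}\,]
```

The middle step is linearity of expectation, which needs no independence, and the last step is the indicator computation: E[Xij] = 0 * Pr[Xij = 0] + 1 * Pr[Xij = 1] = Pr[Xij = 1].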

Â 19:33

So what's going to happen next is that in the second video of the analysis, we're going to drill down on this probability, the probability that a given pair of elements gets compared, and we're going to nail it. We're going to give an exact expression, as a function of i and j, for exactly what this probability is.

Then in the third video, we're going to take that exact expression, plug it into the sum, and evaluate the sum. And it turns out the sum will evaluate to O(n log n).
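As a sanity check on where this is headed, you can already estimate E[C] by simulation and compare it against n log n. This is my own rough sketch, with deliberately loose constant factors rather than the exact bound the later videos derive:

```python
import math
import random

def comparisons(xs):
    """Number of comparisons randomized QuickSort makes on xs.
    Out-of-place sketch; assumes distinct elements."""
    if len(xs) <= 1:
        return 0
    pivot = random.choice(xs)
    rest = [x for x in xs if x != pivot]           # pivot compared to each of these once
    left = [x for x in rest if x < pivot]
    right = [x for x in rest if x > pivot]
    return len(rest) + comparisons(left) + comparisons(right)

def avg_comparisons(n, trials=200):
    """Monte Carlo estimate of E[C] on an n-element array."""
    xs = list(range(n))
    return sum(comparisons(xs) for _ in range(trials)) / trials

n = 128
est = avg_comparisons(n)
# The estimate lands far below the n^2 worst case and within a small
# constant factor of n * log2(n), which is 896 for n = 128.
assert n < est < 4 * n * math.log2(n)
```

For n = 128, n squared is 16384, so an estimate in the vicinity of n log n is exactly the "typical behavior is close to best-case" claim the theorem formalizes.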

Â 19:59

So that's the plan. That's how you apply decomposition: express what you care about in terms of 0-1, or indicator, random variables, and apply linearity of expectation. In the next video, we'll understand these simple random variables, and then we'll wrap up in the third video.

Before we move on to the next part of the analysis, I do just want to emphasize that this decomposition principle is relevant not only for QuickSort, but for the analysis of lots of randomized algorithms. And we will see more applications, at least one more application, later in the course.

So just to really hammer the point home, let me spell out the key steps of the general decomposition principle.

Â 20:56

Now you're in the wheelhouse of linearity of expectation. You just apply it, and you find that what you care about, the average value of the random variable Y, is just the sum of the probabilities of various events, namely that a given random variable xl is equal to 1.

And so the upshot is: to understand the seemingly very complicated left-hand side, all you have to do is understand something which, in many cases, is much simpler, namely the probability of these various events.

In the next video, I'll show you exactly how that's done in the case of QuickSort, where we care about the xij's, the probability that two elements get compared. So let's move on and get an exact expression for that probability.
