Next, we'll do a complete example of the analysis of algorithms: a classic analysis of perhaps the most widely used sorting algorithm, Quicksort. This is the code for Quicksort right out of Section 2.3 of our Algorithms book, and I encourage you to download that code from our booksite; I'll talk more about just how to do that later on. It's a recursive method, the sort method down at the bottom, that is based on a process called partitioning, which is the call in the third line from the bottom. Partitioning takes the array to be sorted and divides it around a position j, with the property that every element with an index less than j is less than a[j], and every element with an index bigger than j is bigger than a[j] (or maybe equal). So it partitions the array into those two parts, and once an array has that property, you can get it sorted simply by making two recursive calls: sort the left part, sort the right part. The partitioning process is handled by running two indices, i and j, with i going from low to high and j going from high to low. We scan from the left, incrementing i as long as we have elements that are less than the partitioning element, which we take to be a[lo] to start. Then we scan from the right, decrementing j as long as we have bigger ones. If we've found a big one on the left and a small one on the right, we exchange them, and we keep going until the pointers cross, at which point we put the partitioning element into its final place. If you're not familiar with that, read about the details in Algorithms, 4th edition, or on the booksite; that's just a quick review. Now, we have to make some preliminary decisions when we're going to study this kind of code. The first is the cost model. The cost we care about is the running time, but really it's better to come up with an abstraction: we're not talking about microseconds or seconds, we want something abstract. For a sorting algorithm, what the algorithm does most often is compare two items, so let's just count compares. Counting compares takes us from time to a discrete counting process, and that means our analysis is valid for any kind of machine at all. We get the frequency of execution of the compares, and the time for a single compare, how quickly you can do one, gets rolled up into a constant. So our hypothesis is going to be: if the number of compares is C, then the running time is ~aC on any computer. The constant a is going to be different for different computers, different systems, or different languages, but it's a reasonable hypothesis, and it bears out in many, many practical situations. That's a big decision at the beginning: to get away from time and into something abstract associated with the algorithm, without losing the ability to make scientific studies later on. For the input model, we're going to assume the input to be randomly ordered, and as I mentioned, in the case of sorting algorithms that's pretty easy to arrange: we just randomly order it. Also, just to make the math simpler in this first cut, we're going to assume the keys are all distinct. That's not always so easy to arrange, and I'll have more to say about that a little later. So a key question is whether these assumptions are realistic, and we'll get into that.
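For reference, here is a minimal sketch of the kind of code being described, in the spirit of the Quick.java from Section 2.3 of Algorithms, 4th edition (available on the booksite). It uses int keys instead of Comparable items to keep it short, so treat it as an illustration rather than the book's actual implementation.

```java
import java.util.Random;

// Minimal Quicksort sketch: recursive sort built on two-pointer partitioning,
// in the spirit of Section 2.3 of Algorithms, 4th edition (not the real Quick.java).
public class QuickSketch {

    public static void sort(int[] a) {
        sort(a, 0, a.length - 1);
    }

    // Recursively sort a[lo..hi]: partition, then sort the two pieces.
    private static void sort(int[] a, int lo, int hi) {
        if (hi <= lo) return;
        int j = partition(a, lo, hi);
        sort(a, lo, j - 1);    // everything left of j is less than a[j]
        sort(a, j + 1, hi);    // everything right of j is not less than a[j]
    }

    // Partition on v = a[lo]: scan i from the left past smaller elements,
    // scan j from the right past larger elements, exchange, and repeat until
    // the pointers cross; then put v into its final position j.
    private static int partition(int[] a, int lo, int hi) {
        int i = lo, j = hi + 1;
        int v = a[lo];
        while (true) {
            while (a[++i] < v) if (i == hi) break;
            while (v < a[--j]) if (j == lo) break;
            if (i >= j) break;
            exch(a, i, j);
        }
        exch(a, lo, j);
        return j;
    }

    private static void exch(int[] a, int i, int j) {
        int t = a[i]; a[i] = a[j]; a[j] = t;
    }

    public static void main(String[] args) {
        // quick smoke test on a small random array
        Random rnd = new Random();
        int[] a = new int[20];
        for (int k = 0; k < a.length; k++) a[k] = rnd.nextInt(1000);
        sort(a);
        for (int x : a) System.out.print(x + " ");
        System.out.println();
    }
}
```

The partition method is exactly the two-pointer scan described above: i stops on a key not less than the partitioning element, j stops on a key not greater, the two are exchanged, and when the pointers cross, the partitioning element is exchanged into position j.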
But we made these assumptions and these models with the knowledge that we're going to be able to test them scientifically, and we can go ahead and do that; that's part of the process I'll describe. So the bottom line from this slide is that we're going to count compares, and we're going to assume that the keys are randomly ordered and distinct. In that context, there are some relevant questions about this Quicksort code. We assume an array of size N, with entries distinct and randomly ordered. First question: how many compares are involved in the partitioning process? Well, that doesn't take much analysis to see: every item in the array is touched in order to get the partitioning done, and actually there's one extra because the pointers cross, so it's N + 1. That's one relevant fact. Now, if the keys are randomly ordered, what's the probability that the partitioning element is the kth smallest, for any particular value of k? If they're randomly ordered, it's the same for every value of k: it's 1/N, no matter what k is. That's another relevant fact. Next, what are the sizes of the subarrays if the partitioning element is the kth smallest? Everything to the left is less, so there are k-1 of those, and everything to the right is bigger, so there are N-k of those. Add those together and you get N-1, and the partitioning element itself makes N. So that's a relevant question we need the answer to. And then there's another detail that takes a little bit of thought, and it was a problem in older implementations: if the array was randomly ordered before partitioning, are the subarrays randomly ordered afterwards? This implementation doesn't do anything that would disturb that, so the answer is yes. With these questions answered, we can transfer the problem of counting the number of compares needed by Quicksort to a purely mathematical formulation. It's a recursive program with a random input model, and that leads to a formula called a recurrence relation; we'll look at recurrence relations in detail later on. We've got N entries, distinct and randomly ordered. If we let C_N be the expected number of compares used by Quicksort, then we can write down an equation for C_N, a mathematical model that reflects the recursive structure of the program itself. You need N+1 compares to do the partitioning, so that's the first term. Then, for every possible value of k, that value is the rank of the partitioning element with probability 1/N, and it leaves two subarrays to be sorted afterwards. Those are randomly ordered, so the expected cost of doing the left subarray, which is of size k-1, is just C_{k-1}, and the expected cost of doing the right subarray is C_{N-k}. So that's a formula that describes the number of compares used by Quicksort. There's one more detail: if there's only one element, we don't do anything, so we have to make sure that we define the small values of N appropriately. But what this has done for us is taken our program and our abstract problem and reduced it to a mathematical problem, and we have to be able to solve mathematical problems like this. That's what we're going to go into next. I'm going to do it for this specific recurrence, and then in a couple of lectures we'll talk more generally about how to solve problems like this.
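Written out, the recurrence just described takes the following form. This is a sketch of the lecture's formula; the convention chosen here for the smallest subarrays (C_0 = 0, with the recurrence applied for all N at least 1) is one reasonable choice and only affects lower-order terms.

```latex
% Expected compares C_N for quicksort on N distinct, randomly ordered keys:
% N + 1 compares for partitioning, each rank k equally likely (probability 1/N),
% leaving randomly ordered subarrays of sizes k-1 and N-k.
C_N \;=\; N + 1 \;+\; \frac{1}{N} \sum_{1 \le k \le N} \bigl( C_{k-1} + C_{N-k} \bigr),
\qquad N \ge 1, \quad C_0 = 0 .
```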
So that's a mathematical problem, a recurrence for C_N, and how are we going to approach something like this? In this case there are a couple of tricks that get the job done. I'm not going to try to explain the origin of those tricks; what I want to show is how simple the calculation is, and we'll get this solved in just a couple of slides. The first thing to notice is that the two sums over the subarrays are really the same. If you write them out in a different notation, both of them are C_0 + C_1 + ... + C_{N-1}; that is, each subarray could be any one of those sizes with equal probability. Since the two sums are the same, we can reduce to a single sum with a factor of 2/N in front (the 1/N is not inside the sum). So that's a simpler form right there, whether you see it purely mechanically, that the two sums are the same, or you argue more broadly that there's no difference between the two subarrays: they're symmetric. The next complication is the N in the denominator, so we multiply both sides of the equation by N; there's no problem there. The next trick is to get rid of the sum, and the way we do that is to write down the same formula for N-1 and then subtract. If we do that, on the left we get N*C_N - (N-1)*C_{N-1}. On the right, the first terms, N(N+1) - (N-1)N, leave just 2N; that's a little algebra. And the reason we did it is that when you subtract the formula for N-1 from the formula for N, all that's left of the sums is the single term 2*C_{N-1}. Now we have a recurrence relation that has no sum in it, and that's clearly going to be much simpler to deal with. Then, just collecting terms, we have the very simple recurrence at the bottom of the slide, N*C_N = (N+1)*C_{N-1} + 2N, describing the number of compares taken by Quicksort. It's not a solution yet, but it's way simpler than what we started with, and certainly way simpler than scratching our heads, looking at the program, trying to understand how many compares it takes. Actually, this is a pretty good milestone in the analysis of this algorithm, because we now have a pretty efficient algorithm for computing the result. If we tried to compute the result from the original recurrence, it would take quadratic time. That's the code on the slide, and I'll talk more about using code to learn about recurrences. To compute C_N that way, you have to do a double for loop, for every N and for every k, which is quadratic. If we wanted to estimate the value of C_N for, say, N equal to a million, we don't want to spend a million squared steps, even on fast computers today, let alone for a billion or a trillion. People sort files that big, so we want to be able to compute it, and the original formulation is not good enough to do it. But with the manipulations we just made, we have a linear-time algorithm for computing it: just start at zero and use the formula to compute successive values. We might not yet know much about what the answer is, but we know enough to calculate the number of compares for any N just by running this simple program. And it's interesting to think about it this way: I talked about suppressing detail, and really what we want is a fast way to compute the running time of a program, and this gives us one.
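Here is a sketch of that computation in Java, under the assumption stated above (C_0 = 0, recurrence applied for all N at least 1; the handling of the smallest subarrays is a convention and only affects lower-order terms). The direct method is quadratic; the telescoped form N*C_N = (N+1)*C_{N-1} + 2N is linear, and the two should agree.

```java
// Sketch: exact expected compare counts for quicksort from the recurrence.
// Convention assumed here: C_0 = 0 and the recurrence applies for N >= 1.
public class QuicksortRecurrence {

    // Quadratic time, straight from C_N = N + 1 + (2/N) * (C_0 + ... + C_{N-1}).
    static double[] direct(int maxN) {
        double[] c = new double[maxN + 1];
        for (int n = 1; n <= maxN; n++) {
            double sum = 0.0;
            for (int k = 0; k < n; k++) sum += c[k];   // inner loop makes this quadratic
            c[n] = n + 1 + 2.0 * sum / n;
        }
        return c;
    }

    // Linear time, from the telescoped form N*C_N = (N+1)*C_{N-1} + 2N.
    static double[] telescoped(int maxN) {
        double[] c = new double[maxN + 1];
        c[1] = 2;                                      // value given by the recurrence at N = 1
        for (int n = 2; n <= maxN; n++)
            c[n] = ((n + 1.0) * c[n - 1] + 2.0 * n) / n;
        return c;
    }

    public static void main(String[] args) {
        // the two computations should agree
        System.out.println(direct(1000)[1000]);
        System.out.println(telescoped(1000)[1000]);
    }
}
```

Since each value in the telescoped version depends only on the previous one, you don't even need to keep the whole array for very large N, which is what makes the linear-time computation practical.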
There are a lot of algorithms we'd like to be able to get this far on. But actually, we're going to get a much, much faster way: we want to compute the answer just by evaluating a few elementary functions. So let's look at how to do that next. Now we're going to actually solve it, expressing this quantity in terms of familiar elementary functions. How are we going to do that? The first step is a magic step. It's kind of tricky, but there's solid motivation for it that we'll talk about later on: we take both sides of the equation and divide by N times N+1. On the left, C_N divided by N(N+1) is just C_N over N+1. On the right, the N+1's cancel, and we get C_{N-1} over N; and 2N over N(N+1) is just 2 over N+1. That looks like a slightly more complicated form, but it's actually much simpler, because the first term on the right is the same as the term on the left, just with N reduced by one. And that means we can apply the same formula to that term: if we plug in C_{N-1}/N = C_{N-2}/(N-1) + 2/N, we have the next formula, and we can continue doing that until we get down to our starting point, C_1. At that point there are no unknowns on the right-hand side, and that's a solution. Here's a simple version of the solution, with a small constant term left out: C_N is tilde 2N times the sum, for k from 1 to N, of 1/k, minus 2N. That's just carrying the telescoping all the way out, multiplying back by N+1, and collecting the sum; now it's a form with no unknowns on the right. This looks more complicated than what we started with, because it's got a sum back in there, so what do we do about that? Well, simple sums we can often approximate with integrals. Again, this may be familiar to many of you, but we'll certainly go into the math behind approximating sums with integrals. The sum of 1/k is close to the integral of 1/x dx, which is the natural log, plus gamma, Euler's constant, which is about 0.57721. And that gives us a formula for C_N in terms of familiar mathematical functions: it's 2N natural log N, minus a constant times N, and C_N is tilde that value. Now, these are approximations, and we will be talking about how to be precise about making them, but other than that, this is fairly straightforward mathematical manipulation that gets us to a solution for the number of compares used by Quicksort. So now, even when we're just doing math for the analysis of algorithms, we can always check our math with a computer, and that's always worthwhile. I'm claiming that this formula is pretty close to C_N. Well, I can just write a program to compute it: that's 2*N*Math.log(N) in Java, which gives 2N times the natural log of N, minus the constant, and I can look up Euler's constant. That's one value. And I can also, as I mentioned, compute the actual values of C_N, and then I can just print out a table. And sure enough, that's a check that we're within 30 for a million. That's pretty close, and we could actually get closer if we wanted. That's the first thing that's wonderful about the analysis of algorithms, even for beginners: you can always check your work. It's very specific what this quantity is, and if you think you have something that's close to it, you can check, and then you can know, and then you can have confidence to move on. So that's the first thing. With that, we've checked the math.
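Here is a sketch of that check, again under the convention C_0 = 0 with the recurrence applied for N at least 1. The approximation used is the one quoted above, 2N(ln N + gamma) - 2N; the table sizes are illustrative choices.

```java
// Sketch: check the closed-form approximation against exact values from the
// telescoped recurrence. Uses the convention C_0 = 0, recurrence for N >= 1.
public class CheckApproximation {
    static final double GAMMA = 0.5772156649;    // Euler's constant

    public static void main(String[] args) {
        int maxN = 1_000_000;
        double c = 2;                            // C_1 from the recurrence
        int next = 10;                           // print at powers of ten
        System.out.println("       N        exact       approx   difference");
        for (int n = 2; n <= maxN; n++) {
            c = ((n + 1.0) * c + 2.0 * n) / n;   // N*C_N = (N+1)*C_{N-1} + 2N
            if (n == next) {
                double approx = 2.0 * n * (Math.log(n) + GAMMA) - 2.0 * n;
                System.out.printf("%8d %12.1f %12.1f %12.1f%n", n, c, approx, c - approx);
                next *= 10;
            }
        }
    }
}
```

With these conventions, the difference column stays tiny relative to C_N itself (on the order of a few tens for N equal to a million), which is consistent with the "within 30 for a million" check described in the lecture.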
But there's one other thing that we have to check, and that's the model. The claim was that for randomly ordered, distinct keys, this should be the number of compares. So what we'll do is run the code. This is 1,000 trials for each value of N, for N from 1 up to, I think, 1,000. There's a gray dot for each trial, and you can hardly see the gray dots, with a red dot right in the middle, which is the average for each N. So the experiment gives us actual results, the observed numbers of compares, and what we want to compare that against is the function we derived. We can just plot that function, and boom, it's right on. Run the algorithm, and here we ran the algorithm maybe a million times, and that experiment shows that the formula we got for the number of compares is right on. So that's checking the model, and it's very successful in the case of Quicksort. Just as an observation: maybe we're not interested only in the average cost, maybe we're interested in the distribution of costs. How close is the actual cost of my one sorting problem going to be to this average? You can see from the narrowness of the band of gray dots that it comes pretty close. It's very typical in the analysis of algorithms that the standard deviation is a slower-growing function than the average, so the costs concentrate around the average as N increases. But there are plenty of interesting and challenging mathematical questions when we start looking at problems like this. What is the distribution of costs? Most people would guess that maybe it's a normal distribution, a bell curve, but it's not normal. Quicksort was discovered in 1960, and people looked for a long time to try to understand whether it's normal; it was only about ten years ago that this was settled, and it's actually not normal. What's shown is the exact distribution from the recurrence, which we can compute: not just the average, but for every k, the probability that the sort takes exactly k compares. The plot takes those curves and centers them on the mean, to get an idea of what kind of curve we come close to, and it's not a bell curve; it's got a tail off to the right. And there's an empirical validation, running again a million sorts, just to show that the actual behavior looks like that. And that's interesting mathematics: what is this curve? It's a new function, not a normal distribution, and definitely worthy of mathematical study. After all, this is probably one of the world's most heavily used algorithms; we might want to understand this property of it, and maybe that'll help us understand something else in the future. So the bottom line from all of this work, and this is just a summary of the 50 years of research on Quicksort, is that a lot is known about the performance of Quicksort, and there are plenty of interesting research problems in the literally thousands of papers that have been written about Quicksort in the 50 years since it was discovered. That's not the end of the story; I have two more comments. The first one is about prediction, and this is something that we teach at the very beginning of the algorithms book: just using a hypothesis like this makes it simple to predict performance, even of extremely complicated algorithms in extremely complicated scenarios.
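Here is a sketch of that kind of experiment: an instrumented version of the partitioning sort that counts compares, averaged over many random trials and printed next to the formula. The trial counts, the sizes, and the use of random doubles as keys (distinct with overwhelming probability) are illustrative choices, not the lecture's exact setup; the averages should track the ~2N ln N formula closely.

```java
import java.util.Random;

// Sketch of the model check: for each N, sort many random arrays, count
// compares, and print the average next to 2N ln N - 2(1 - gamma)N.
public class QuicksortExperiment {
    private static long compares;     // incremented on every key comparison

    private static boolean less(double x, double y) {
        compares++;
        return x < y;
    }

    private static void sort(double[] a, int lo, int hi) {
        if (hi <= lo) return;
        int j = partition(a, lo, hi);
        sort(a, lo, j - 1);
        sort(a, j + 1, hi);
    }

    private static int partition(double[] a, int lo, int hi) {
        int i = lo, j = hi + 1;
        double v = a[lo];
        while (true) {
            while (less(a[++i], v)) if (i == hi) break;
            while (less(v, a[--j])) if (j == lo) break;
            if (i >= j) break;
            double t = a[i]; a[i] = a[j]; a[j] = t;
        }
        double t = a[lo]; a[lo] = a[j]; a[j] = t;
        return j;
    }

    public static void main(String[] args) {
        Random rnd = new Random();
        int trials = 1000;
        for (int n = 100; n <= 1000; n += 100) {
            long total = 0;
            for (int t = 0; t < trials; t++) {
                double[] a = new double[n];
                for (int k = 0; k < n; k++) a[k] = rnd.nextDouble();  // distinct w.h.p.
                compares = 0;
                sort(a, 0, n - 1);
                total += compares;
            }
            double avg = (double) total / trials;
            double model = 2 * n * Math.log(n) - 2 * (1 - 0.57721566) * n;
            System.out.printf("%6d %12.1f %12.1f%n", n, avg, model);
        }
    }
}
```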
So if that's our hypothesis, what we're going to do is run the experiment for some big input size N, and maybe run it a bunch of times to get a good average on the running time. If we wanted to, that's enough information to solve for a: whatever the time is for whatever N is, a is the only unknown, and we can go ahead and compute it. But actually, we usually don't even need to do that; we can just predict the time for some constant times N by doing the math. Say we want to predict the time for 10N, knowing the running time for N. If we take a(10N)log(10N) over aN log N, the a's cancel, so we don't even have to know the constant. The ratio works out to 10 times (1 + 1/log10 N), and that correction gets smaller as N gets bigger. So increase the problem size by a factor of about 10, and the running time increases by about a factor of 10. That's a very accurate prediction, and it's useful, and it's something we now teach to first-year students; in fact, we teach this simple idea to nearly 70% of all Princeton students: if you run a program for a big input, and you have a decent model for its performance, you can predict the running time. So for example, run it for 100,000 items, 100 times (and again, we can run as many experiments as we want nowadays); it takes 4 seconds. That means I'm going to predict that it's going to take just a little more than 40 seconds for 1 million. So that's the experiment, and then I run it for 1 million: it takes a little bit longer, and I observe a running time of 41 seconds. That gives me a lot of confidence to predict how long it would take for 1 billion, which is maybe the problem that I really need to solve: it's going to take 11 hours to solve for 1 billion. And really, any programmer can have this kind of understanding of the performance of programs; it's all about getting that first model. If all you're interested in is predicting running times, it's very powerful. Now, I'm not saying that we don't need an accurate mathematical model; it's still important to have one, and you might want to think about the reason for that. I'll show an example later on, and you'll have an exercise. The reason is that there might be parameters involved in the running time of the program, and we might have the freedom to set the values of those parameters; you'll see an example with even one parameter in the exercise. We can't afford to run experiments for every possible value of the parameter, but if we have a mathematical model, then we can use it to find the optimum value of the parameter. That's a very strong reason to keep working toward good mathematical models. Now, as I mentioned, the other thing we really should do is validate the model in applications. Not many researchers get down into that, but more do nowadays, because the running time of algorithms is a very important thing to understand for people who are trying to build new Internet companies or trying to analyze huge amounts of scientific data. So validating the model in real applications is something that goes on continuously in practice, by people who have knowledge of the scientific background on the performance of programs that I've been talking about. And for Quicksort in particular, it's been going on for a really long time.
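As a small illustration, here is a sketch of that ratio calculation in Java. The observed time and sizes in main are made-up example numbers, not the lecture's measurements; the point is just that the constant a cancels out of the ratio.

```java
// Sketch: predict running time under the hypothesis T(N) ~ a * N * ln N.
// The constant a cancels in the ratio, so one measurement is enough.
public class PredictRunningTime {

    // predicted time at size n2, given an observed time at size n1
    static double predict(double observedSeconds, double n1, double n2) {
        return observedSeconds * (n2 * Math.log(n2)) / (n1 * Math.log(n1));
    }

    public static void main(String[] args) {
        double n = 1e6;
        double t = 2.0;   // hypothetical: 2 seconds observed for N = 1 million

        // the ratio for 10N is 10 * (1 + 1/log10(N)), a bit more than a factor of 10
        System.out.println("predicted for 10N:   " + predict(t, n, 10 * n) + " seconds");
        System.out.println("predicted for 1000N: " + predict(t, n, 1000 * n) / 3600 + " hours");
    }
}
```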
I'll just give you three examples, from different decades, that I've been involved in. Right when I got out of school, I was involved in a project where we were sorting on the Cray-1, which was the world's fastest supercomputer at the time. This was for cryptography, and there was an unlimited amount of data. One of the ways cryptographers try to understand the data is to take it in as big chunks as they can get and sort it, as many times as possible. You've got data that's supposed to be random, and if it's not random, you can ring a bell and read the code. It's supposed to be random, and we're sorting it as fast as we can, so it was very important to be able to shave every microsecond off that sorting time. The machines of those days, and particularly the Cray-1, wouldn't even work unless every instruction took precisely the amount of time it said in the manual, and we could work with machine-language versions of Quicksort and predict the running time to within a microsecond, which is predicting the running time of a million runs to within a second, or a billion runs to within a few hours. It's a fine example of the successful application of the analysis of algorithms. Another example came along in the 1990s, when Quicksort was used as the Unix system sort. That code is still around, and people were very interested in having a general-purpose library sort that everyone could use. What happened was that a user came along with an application that violated one of our assumptions: he had files with only a couple of distinct values, and his application performed poorly. So people went in and looked more carefully, and over a few years' time John Bentley, one of the researchers at Bell Laboratories, and I, and some others, came up with ideas for addressing the situation. Those ideas are described in the fourth edition of Algorithms, where we modify the algorithm, with a better understanding of it, to handle this situation. Going back to do the analysis, refine the model, and check it scientifically required some serious research, and it was only recently that we came to a resolution of how those algorithms perform in practice. But that work gave us library sorts that are much more robust, whose efficiency does not depend on the idea that the keys must be distinct, and so much broader use of the libraries, enabled by trying to get the analysis of algorithms right and to build a model that reflects what's seen in applications. And that's ongoing. Just recently I was contacted by somebody working at a networking start-up. What they needed to do was sort a billion records of about a thousand characters each, and they had to do it fast, or their competition would beat them and they'd go out of business. And he found a refinement to our three-way partitioning methods for strings that was able to improve performance even further. So again, over 40 or 50 years, the analysis of algorithms has really helped us understand the performance of this important problem. Now, I'm not going to be talking that much about the systems side of things in this course, but I wanted to go through this example in detail because the math is very relevant to what we do in this course, and I want to make sure that people are motivated to look into the kind of math that we're going to be talking about.
So that's a full example of the analysis of an algorithm, Quicksort. And Knuth succinctly expressed the idea that I want to leave you with for this example: people who analyze algorithms have double happiness. First, they experience the sheer beauty of elegant mathematical patterns that surround elegant computational procedures; just in terms of the pure math, what programs bring to the table is another level of interesting mathematics. But then, if you're in the analysis of algorithms, you get a practical payoff, because your theories make it possible to get other jobs done more quickly and more economically. So it's not just the math; it's that the math is helping us press forward in all kinds of important and identifiable ways. So that's a full introduction to the analysis of algorithms through the study of Quicksort.