Next we're going to take a moment to describe in detail the scientific approach that we use in modern analysis of algorithms. So again, just a note about notation. In the theory of algorithms, when they're looking at upper bounds on worst case performance, they use these notations big O, big omega, and big theta to try to capture the order of growth, and that's what they use for classifying algorithms. If g(N) = O(f(N)), it means that the ratio of g(N) to f(N) is bounded from above as N goes to infinity. And if it's omega, it means it's bounded from below. And if it's theta, it means that it's bounded from both above and below. So this one says there's a constant such that g(N) is less than that constant times f(N). Now, this one says there's two constants that it's in between. So that allows classification according to functional growth, as I mentioned, merge sort is N log N, and quick sort is N squared. So, that's the notation that you often see throughout the literature describing the performance of algorithms. But as I mentioned, big O-notation is dangerous, and I'll have more to say about that in just a minute. So it's not scientific to use the big O-notation to try to compare algorithms. If you say the running time is big O(N to the c), that's not of any use for predicting performance. It's an upper bound on the worst case. The actual performance may be much better than the worst case. And it could be that even the actual bound is less than whatâ€™s given by the big O-notation. It's fine for a first cut of classifying algorithms, but not useful for comparing. What we use instead is whatâ€™s called the tilde-notation, and so what we'll typically say is the running time of an algorithm is ~, a constant, times, say, some function of N, where N is the input size. That does provide an effective path for predicting performance, and I'll show some examples of that later on. So we don't use the common big O, big theta, omega notation very much, except in a specific technical sense that I'll talk about later on, when we talk about asymptotic approximations. So [COUGH] big O-notation is useful for a lot of reasons, and it dates back a few centuries, and we do use it in math. But it's a common error to thing that the big O-notation is useful for predicting performance. And I just want to make sure to nip that problem in the bud right away. So this is what often happens to me when I give talks around the world on this topic. Typical exchange in, say, the Q&A for my talk, depending on how formal it is, I'll say okay, big O notation is dangerous, you can't use it to predict performance. And somebody will ask or shout out, but an algorithm that is big O(N log N) surely beats when it's big O(n squared). And then I trot out and say the quick sort, merge sort example and say well, not by the definition, big O is just an upper bound on the worst case. And then they'll say, well, so use the theta notation, which says that it's in-between. And I say well, that maybe gets rid of the upper bound, but you're still typically bounding the worst case. Is your imput a worst case? And even with that logic and we've got a compelling example like quick sort versus merge sort, usually what happens is the questioner whispers to one of his colleagues well, I'd still use the N log N algorithm, wouldn't you? [LAUGH] And actually, such people usually don't program much and shouldn't be recommending what practitioners do. But surely, we can do better than this. That's part of what analytic combinatorics is all about. There's another idea that's out there as well, and that's the concept of a galactic algorithm. And I found this on a friend's blog, Dick Lipton, who said let's define an algorithm that will never be used as being galactic. And why will it never be used? Because no one would ever notice any effect about this algorithm in this galaxy. Because any savings is going to happen for an input size so large that it couldn't happen in this galaxy. So, an example of a galactic algorithm, and this one is actually maybe planetary or something. It's actually close to the real world, is Chazelle's linear time triangulation algorithm. So the problem is to find a way to triangulate a polygon in linear time. It was a surprising result and a theoretical tour de force to prove that it was possible to solve this problem in linear time. But the method is much too complicated for anyone to implement. And if anyone did implement it, the cost of the implementation would definitely exceed any savings, until N is so large that it would take another galaxy to deal with it. And [COUGH] this is an interesting situation. I think one of the problems is that so many algorithms that are out there that are being published in the literature in this category. After Lipton introduced the concept, one of the contributors to his blog estimated that something like 75 to 95% of the papers in the theoretical computer science conferences are galactic. And I think the problem is that practitioners aren't necessarily aware that they're galactic, and they maybe try to use them. When a relatively simple analysis would say, there's no point in a practitioner taking a look at this, the paper should have asterisks on them or something. So I think it's okay for basic research to drive the agenda, and there's nothing wrong with trying to find the algorithm with the best upper bound and worst case performance. But we have to do something about the common error, where people think that a galactic algorithm is actually useful in practice, there's a lot of denial out there. Here's another thing that often happens to me, this was an actual exchange with a very prominent theoretical computer scientist a few years ago. And he said in a talk, well, the algorithm A that actually is a pretty straightforward algorithm that's in widespread use is a bad algorithm. Google and other Internet providers should be interested in my new algorithm, algorithm B. [COUGH] And so in the question and answer, I said, well, what's the matter with algorithm A? And he responded, it's not optimal. Its running time has an extra log log N factor. And that's always a tip off for me, I say, well, but your algorithm is very complicated. It takes ten pages to describe. By the way, we all know that log log N is less than 6 in this universe. That's 2 to the 64th, and so if N is 2 to the 64th, it's less than 6. And so not only that, it's just an upper bound. Not only that, your algorithm is so complicated, it's certain to run 10 to 100 times faster in any conceivable real world situation. Why should Google care about algorithm B, as you said? And then the response was, well, I like algorithm B, I don't care about Google. And again, that's fine to do research for the intellectual challenge, just don't say that as some practitioner, I should be interested in it. Surely, we can do better than that. So what I want to talk about specifically is the scientific approach, say the modern rendition of what Knuth taught us. That is used for many, many algorithms in the fourth edition of my Algorithms book, which is co-authored with Kevin Wayne. So this is described in a lot of detail with examples in Section 1.4 of the book. Again, as Knuth said, we start with a complete implementation that we can test. And then therefore, we'll be able to make hypotheses about the performance of that implementation and test them. And then what we're going to do is maybe try to avoid some of the detail in Knuth. And we're going to analyze the algorithm by trying to find an abstract operation that is in the inner loop. That is, it gets executed more times than any other operations. And then we'll also need to develop some realistic model for the input to the program. That's still a sticking point, and then we'll just analyze the frequency of execution of that one operation for input size N. And our hypothesis will be that the actual running time is proportional to a constant times that frequency. That's it. So the hypothesis might be wrong, but we can test it. And we have the tilde operation. And I'll show you specifically how we can go ahead and test it. And actually, the unknown constant is an annoying thing to carry around, but not actually too bad because it does allow us to make specific mathematical calculations. So, what we'll do then is, once we have the hypothesis, we're going to validate the hypothesis by, first of all, we want to generate some large inputs according to the model. So, for example, we'll look at sorting algorithms, where the model is that the things are randomly ordered and distinct. And so it's easy to write a program to generate large, randomly ordered, distinct files. And then, we can just run the program for a large input to calculate a. And actually, we can even skip that step. I'll show you that in a minute. But we definitely can get a that way, an estimate of a. And then that gives us a model that we can use to predict performance for even larger inputs and check our hypothesis. That's the scientific approach to the analysis of algorithms. So really then, later what we need to do also is find an application that actually uses real world data and test in application context to validate the model as well. That's actually the hardest part that's often overlooked nowadays. And then as in any application of a scientific method, we'll refine and repeat and revise the model and the algorithm as necessary, or as we learn much about the problem. As I mentioned earlier, one of the great things that happens with this process is that often, we learn things about the algorithm that enable us to develop improved versions by doing this. We have many, many examples of this for sorting and searching algorithms and graph algorithms and string processing and data compression. And many, many other applications, where we successfully apply this approach to develop good implementations and understand their performance. Now what this course is going to be all about is this analysis of the frequency execution. That's where the math is. And so, but I want to give this full context, so people can understand why we're going through this trouble. So as I mentioned, we don't use the big O-notation or omega or theta that much. Instead, that's sort of the theory of algorithms. Instead, we're going to use the tilde notation. And all the tilde notation means is that two functions g(N) is tilde f(N) if their ratio approaches 1, as N approaches infinity. And usually, we have f(N) is a constant times a standard function of N. So that's the notation, so we're just using tilde and that's good. That simplifies things a bit, and then we're just left with these basic components. So, one of the basic components of algorithm analysis involves programming. So, people who do analysis of algorithms need to be comfortable with implementing algorithms, running them, and be able to at least count operations. So you need a good implementation. Now maybe somebody did the implementation, but still, you want to have experience with it on your own computer to test various hypotheses that you might have. Then we've got the mathematical challenge, and that's, again, going to be most of our emphasis in this course, where we develop a model, analyze the algorithm in the model. And what we're going to talk about in this course is that really, we need to do the math. And that's the kind of math that I'm going to be talking about very soon, in just a minute. And then there's the scientific process of running the algorithm to solve a real problem and checking for agreement with the model. And so you can't do that, you can't really compare two algorithms until you've done all the rest of this. So it's very important to have that context to understand why we're so interested in getting this math done right without excessive detail. Now there's definitely potential drawbacks still in [COUGH] the theory of algorithms, it still goes places where we can't go. Number one is that model might not have a realistic model, and that's actually a challenge in every scientific discipline. Now we do have a big advantage in computer science because we can randomize to make the model actually totally valid, and I'll show an example in just a minute. So if we don't have a realistic model, maybe we can get a realistic model by randomizing. And that's a very powerful concept that if you don't get if you're doing science that involves going to the moon or killing monkeys or something. Second thing is the math might be too difficult. That's also a big challenge in any scientific discipline, statistical physics, and many other areas, new developments in mathematics were involved in order to do the science. And that's certainly going to be it here, and really, that's what analytic combinatorics is. A calculus for analysis of algorithms is the true motivation for this course. Thirdly, it might be that the experiments are too difficult to really validate the model. And again, that's not much of a point compared to any other scientific discipline. No problem for us to run an algorithm a million times, and we do it a lot. Whereas if you had to feed a million mice, you might have some restrictions or go to a million planets or whatever else. But it is a road block for a lot of people trying to study algorithms because actually, they're trying to study galactic algorithms that might be way too difficult to implement. And if you can't implement it, why try to analyze it? And it's fine for the intellectual challenge, but again, there are people out there that are thinking that the algorithms that you're developing are useful in practice. And really, they should be validated scientifically before the poor working programmer is faced with them. So that's an outline of our approach to the analysis of algorithms.