Well, we've got a working program, and the first thing we can do to study its performance is to run it in some kind of controlled way. In order to do that, we're going to need some kind of representative inputs. One thing we could do is collect the actual input data that we want to run the program on; another, which we'll use for now, is to write a program that generates representative inputs. Either option is fine, but with option two we have a little more flexibility in this example.

So, here's a program that's going to generate input for a ThreeSum program (a sketch appears below). All we do is take a value M, which is the range of our integers, from the command line, then take N from the command line, and then simply print N integers uniformly distributed at random between minus M and M. So if we run it with a value of M and a value of N, we get N integers in the range minus M to M. This provides good flexibility. For example, if we take M equal to a million and N equal to 10, we get 10 integers between minus a million and a million. With that kind of input, even if we have a lot of integers, we have not much chance of getting a ThreeSum, which might be representative of some problem that we're solving. On the other hand, if we make the range small, say between minus 10 and 10, then even with just ten integers we have a very good chance of getting a ThreeSum. So there'd be lots of ThreeSums, and maybe that models some other input situation. With this input generator, we can do a lot of testing of our program, not just for performance but also for debugging. That's always the first step: think about what the inputs are going to be like, and write a program that you can use to generate inputs to test your program.

So, now that we've got a running program, the first thing we can do is just run it, in some experiments. We'll start with a moderate input size, large enough that we can observe the running time; measure and record the running time; then double the input size, repeat, and see what happens. After a while it'll get very slow, and when we're done we can tabulate and plot the results.

The way we measure the running time is to add calls to Java's System.currentTimeMillis() method, which gives us the current time in milliseconds; dividing by a thousand gives the current time in seconds. Before we call our function, we save the current time in a variable start. After we call our function, we save the current time at that point in another variable, now. Then now minus start is the amount of time it took for our function to compute the number of triples that sum to zero in our array.

So, for example, if we run it for 1,000 numbers between minus a million and a million, it finds 59 triples in no measurable amount of time in seconds. But if we double the input size to 2,000, it finds 522 triples, and that takes four seconds. We double it again to 4,000: it finds about 4,000 triples and takes 31 seconds. Double it again to 8,000, and now it's taking a substantial number of seconds, about four minutes, and it finds 31,000 triples. So now we're ready to see what a plot of these results looks like. We just plot time versus problem size, and we get this curve, which is the empirical analysis: the result of experimenting with our program.
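Here's a minimal sketch, in Java, of what such an input generator might look like. The class name and the use of java.util.Random are my assumptions; the actual code in the lecture may differ:

```java
import java.util.Random;

// Sketch of an input generator for ThreeSum: prints N integers
// uniformly distributed at random between -M and M.
public class Generator {
    public static void main(String[] args) {
        int M = Integer.parseInt(args[0]);  // range: integers in [-M, M]
        int N = Integer.parseInt(args[1]);  // how many integers to print
        Random random = new Random();
        for (int i = 0; i < N; i++) {
            // nextInt(2*M + 1) is uniform on [0, 2M]; shifting by -M
            // gives a uniform value on [-M, M]
            System.out.println(random.nextInt(2 * M + 1) - M);
        }
    }
}
```

And here's a sketch of the timing harness just described. ThreeSum.count and the StdIn input-reading library are assumptions, standing in for whatever your triple-counting method and input-reading code are called:

```java
// Sketch of timing a run: save the time before and after the call.
public class TimeThreeSum {
    public static void main(String[] args) {
        int[] a = StdIn.readAllInts();             // read the input array (assumed library)
        long start = System.currentTimeMillis();   // time before the call
        int count = ThreeSum.count(a);             // count triples that sum to zero (assumed method)
        long now = System.currentTimeMillis();     // time after the call
        double elapsed = (now - start) / 1000.0;   // milliseconds to seconds
        System.out.println(count + " triples, " + elapsed + " seconds");
    }
}
```

A typical run would pipe one program into the other, something like: java Generator 1000000 2000 | java TimeThreeSum.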
Now, I just want to point out that there's no excuse for not doing experiments with computer programs, because the cost of doing experiments is almost free compared to other sciences. If you're doing a chemistry experiment, or dissecting an animal, or setting off an atom bomb, those are generally very expensive experiments. But a couple of lectures ago, we did lots of experiments just to study the self-avoiding walk problem. Whereas those scientists may only get to do one experiment, we got to do a million, even in an early lecture. There's no excuse for not doing experiments to understand costs in computer science. So, I just want to reinforce that point: we really have the luxury of doing lots of experimental work to try to understand what's going on, at not much cost.

So, now let's analyze our data. Again, we're in better shape than a lot of scientists, because we can generate really a lot of data if we want to. What we're going to do is use a log-log scale; we'll see how that works in just a second. This is actually common in trying to understand experimental data. We have our N and the amount of time that it took. We just compute the binary log (log base 2) of both of those, and then plot those numbers in the rightmost two columns. When we do that, we get a straight line. This gives us a way to fit a curve to these points. If the points fall on a straight line in the log-log plot, it means there's a curve of the form a N to the b, for some constants a and b, that fits the data. This is really much simpler than it looks. In this case, the straight line has slope three, which means the exponent is three: the running time is proportional to a constant times N cubed.

Then what we can do is use the data to solve for a. Here's the math on this one. The fact that the line has slope three means that the log of the time is equal to log a plus three times log N; the y-intercept is that constant, log a, and that's just the equation for a straight line of slope three. If we raise two to the power of both sides, we get T(N) = a times N cubed. So now, say for N = 8,000, we plug in T(N) = 248 = a times 8,000 cubed, and solving for a gives a = 4.84 times 10 to the minus tenth. That means we think, from doing this math, that the time to solve the problem for an input of size N is going to be about 4.84 times 10 to the minus tenth times N cubed. We can plug our values of N back into that curve and see that, sure enough, we get a really exact fit of the data.

So, that's a lot of information. Our experiments give us an idea of what the running time might be, even for values of N that we didn't run experiments for. That's the scientific method: we form a hypothesis about what the running time is going to be, no matter what N is. For example, we can use that to predict: if we double one more time, we can plug in N = 16,000, and we're saying we think it's going to take 1,982 seconds to solve the problem for N = 16,000. That's quite a bit of time, half an hour or so, and we can actually do this math while we're waiting for the thing to run for 16,000. Sure enough, if we run it and go out to lunch and come back, it takes 1,985 seconds. That's nice verification that our hypothesis might be correct. Now, bolstered by that, we can ask: okay, how long is it going to take for N equals a million? The answer is, well, it's going to take 484 million seconds.
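Spelling out that fit with the numbers above (slope 3 on the log-log plot, and T(8000) = 248 seconds):

```latex
\log_2 T(N) = \log_2 a + 3 \log_2 N
  \;\Longrightarrow\; T(N) = a\,N^3,
\qquad
a = \frac{T(8000)}{8000^3} = \frac{248}{5.12 \times 10^{11}}
  \approx 4.84 \times 10^{-10}.
```

The prediction for the next doubling follows directly: 4.84 × 10⁻¹⁰ × (16,000)³ ≈ 1,982 seconds.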
You can just type that into your search window nowadays, and it'll tell you that's about 15 years. So, by running some experiments, we were able to predict, and now we have a good idea of how long our program is going to take for this value of N equals a million. So the question then is: can you use this program to solve a problem for a million points? The answer is: not unless you're willing to wait 15 years.

There's just one other hypothesis that I want to mention. Nobody actually ran this particular experiment back in the 70's, but if you were to do so on an old computer that's much slower, you could do smaller values of N, and you'd get a different formula, because the computer is way slower. This is what it might have taken on a VAX-11 in the 1970's: five times 10 to the minus six times N cubed seconds. Whereas in the experiment we just talked about, the computer is 10,000 times faster, and the constant is about 10 to the minus tenth. But the curve looks the same. So there's a hypothesis in there: we're going to think that the running times on different computers generally differ only by a constant factor. This analysis doesn't depend so much on the computer; it's only the constant that depends on the computer. We'll come back to that idea a little later, when we look at mathematical analysis to try to validate these kinds of results.
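Just to spell out the arithmetic behind those two claims:

```latex
T(10^6) \approx 4.84 \times 10^{-10} \times (10^6)^3
       = 4.84 \times 10^{8} \text{ seconds},
\qquad
\frac{4.84 \times 10^{8} \text{ s}}{3.15 \times 10^{7} \text{ s/year}}
       \approx 15 \text{ years},
```

and the speed ratio between the two machines is just the ratio of the two constants: (5 × 10⁻⁶) / (4.84 × 10⁻¹⁰) ≈ 10⁴, which is where "10,000 times faster" comes from.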