In this lecture, I'm going to talk about how we can use random testing to automate the test generation process. And thanks again to Greg Gay for providing some of the slides. Random testing really is like tossing a box of pins up in the air: they land all over the floor, and wherever they land, those are the inputs to your program. We're just going to choose values randomly, out of thin air if you will, and check what the program does. That seems like a strange way to go about testing a program systematically, but it turns out to work pretty well. The power of random testing is driven by the increase in computing power we've seen over the last 20 years. Suppose you had a one-in-a-million input combination that leads to failure. Given how fast CPUs run, how long do you think it would take a computer to find that combination if we just chose values at random? Not very long at all, maybe a handful of seconds. And the interesting thing is that if users are well behaved, the probability of failure in practice is likely much lower than this. But if you choose values at random, you're actually more likely to discover failures for unexpected situations that your program isn't conditioned to handle, because random values don't match what users would choose as inputs. If tests run quickly, random testing will find problems in seconds, and you've already seen examples of this when we looked at QuickCheck. Moreover, random testing will find those demonic sequences that non-malicious users would likely never choose. When you test your program systematically, the problem you tend to have as a tester is that you think like a good user: you tend to write test inputs that correspond to common cases in the program.
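To make the "handful of seconds" claim concrete, here is a small sketch, not from the lecture itself: I invent a single failing value out of a million possibilities and time how many blind random tries it takes to stumble onto it. The constant and the range are my own illustration.

```python
import random

# Hypothetical setup: exactly one value in a million triggers a failure.
FAILING_INPUT = 123_456          # made-up 1-in-a-million bad input

random.seed(0)                   # fixed seed so the run is repeatable
attempts = 0
while random.randrange(1_000_000) != FAILING_INPUT:
    attempts += 1

print(f"hit the failing input after {attempts + 1} random tries")
```

On average this needs about a million tries, which a modern machine churns through in roughly a second: exactly the "not very long at all" the lecture describes.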
But when we're looking for things like security bugs, what we really want to do is act like bad users: malicious hackers and other people trying to break the system. Random testing plays it fair: it acts like a good user part of the time and a bad user part of the time, but it always tends to find unusual sequences of inputs that cause problems. So, some benefits of random testing. It's extremely fast: the computer can generate random inputs essentially for free, so we can run millions of them through the program in a relatively short amount of time. It's also incremental: as you find problems, you can choose to keep or throw away test cases to increase your coverage, and very quickly you're likely to get a large amount of coverage that requires no planning. You just turn the million monkeys loose. It's easy to implement, it's easy to understand, and all the inputs the system can choose are considered equal, so there's no potential for you to bias the test case selection in a particular way. What about the weaknesses of random testing? If it always worked, it would be all we ever did. What if the odds of reaching a program statement aren't one in a million but one in a billion, 10 to the minus 9? Now we're looking at a test generation time of a few days, if we're lucky. And it turns out that when we look at realistic programs, it's very easy to get odds far smaller than 10 to the minus 9. Let me give you an example with a tiny little program. All this program does is loop ten times; on each iteration it reads a 32-bit integer as input and checks whether that input is equal to 1. If it is equal to 1, it increments a counter, and if that counter reaches 10, meaning every input we read was equal to 1, that exposes a bug. Now, what do you think the odds of reaching this bug are? I guarantee you're probably thinking of a value that's bigger than it should be.
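The lecture doesn't show the program's source, so here is my own reconstruction of it in Python; the function name and the way input arrives (as a list rather than reads from stdin) are my choices.

```python
def run(inputs):
    # Reconstruction of the lecture's tiny program (naming is mine):
    # read ten 32-bit integers; the bug fires only if all ten are 1.
    counter = 0
    for i in range(10):
        value = inputs[i]        # stands in for reading a 32-bit int
        if value == 1:
            counter += 1
    if counter == 10:            # every input we read was equal to 1
        raise AssertionError("bug reached")

# Only the single sequence of ten 1s, out of (2**32)**10 possible
# sequences of 32-bit values, triggers the bug: odds of roughly 10**-96.
```

Any sequence other than ten consecutive 1s runs cleanly, which is why uniform random 32-bit inputs essentially never reach the fault.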
The odds of hitting this bug randomly are actually 10 to the minus 96, whose reciprocal is larger than the number of atoms in the universe. The time it takes to find this bug might be longer than, say, the heat death of the universe. It is really, really unlikely that we're going to hit this bug by typing in random inputs, even for this seven-line program. But what if we know that the only values that actually occur are 0 and 1? Well, then the odds of hitting this bug are about one in a thousand (1 in 2 to the 10th, or 1024), and we'll find it almost immediately. So knowing something about the domain of inputs makes a big difference in how effective random testing is. Often, though, we don't know the real distribution of inputs, and in that case we're stuck, although we do have a few possibilities for making random testing more effective that we'll talk about in a few slides. If we do know the domain, we can use something called input generators, which let us make random testing more effective by focusing the search. So what can we know about the domain? We could know the distribution: say the inputs are normally distributed around 0, so it's very, very unlikely we'll ever see an outlying value like negative 10,000 or positive 10,000. We might know the range: if the values can only be between 1 and 10, then rather than searching through the roughly four billion possible values of a 32-bit integer, we're only searching through 10 values. Maybe we know something about mode: a value can only be true when the program is in one mode, but can be true or false in another, so we can tailor the test generation to react to the program's state. Or we might know something about the likely region containing faults. Suppose we know that, for a particular variable, the faults seem to occur between negative 10 and negative 20.
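Here's a sketch of that "only 0 and 1" observation in action, with the tiny program restated inline. The generator names are mine; this is the same idea the lecture's input generators implement.

```python
import random

def run(inputs):
    # same tiny program: the bug fires only if all ten inputs are 1
    if all(v == 1 for v in inputs):
        raise AssertionError("bug reached")

def gen_naive():
    # uniform over all 32-bit values: odds of the bug are ~10**-96
    return [random.randrange(2**32) for _ in range(10)]

def gen_domain_aware():
    # domain knowledge: real inputs are only ever 0 or 1, so the
    # odds of the bug become (1/2)**10, about one in a thousand
    return [random.choice((0, 1)) for _ in range(10)]

random.seed(1)
trials = 0
found = False
while not found:
    trials += 1
    try:
        run(gen_domain_aware())
    except AssertionError:
        found = True

print(f"domain-aware generator hit the bug after {trials} trials")
```

With the domain-aware generator the expected number of trials is around a thousand, so the bug shows up in a fraction of a second; the naive generator would effectively never find it.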
Well, in that case, we can tailor the test generation to look at those values specifically. Given these input constraints, we have far fewer values to choose from, and if we know that's where the faults lie, or that those are the only values the program is supposed to accept, we can make random testing much more effective. Input generators are the mechanism that lets us tailor the inputs for a particular problem, and we'll study them in more depth when we look at JUnitQuickcheck and the testing assignment you have. But what if we don't know the domain? Now we're back where we started: we're searching through an enormous space and we really don't know where to go. For these kinds of problems, something called adaptive random testing was developed. In adaptive random testing, or really in testing in general, we have a set of input elements, which I'll call Ie, that run through a program and generate output elements, or test results, which I'll call Oe, and we can think of these as forming a space. We have a giant space of possible inputs, and we can generate some in one corner of that space and some in another. Some of the inputs we generate are going to pass and some are going to fail, and we have a hypothesis that faults are sparse across the space of all inputs, but dense in the parts of the space where they appear. Maybe, as in the previous example, the program is faulty from negative 20 to negative 10 but behaves correctly on all other values in the domain. So we have a very small dense region within a very sparse space. Since if an input causes a failure, a similar input is likely to cause the same failure, what we want to do is make sure that the inputs we choose aren't too close to one another.
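Before moving on, here is a hedged sketch of what such tailored generators might look like for the three kinds of domain knowledge just described; all names and parameters are mine, not JUnitQuickcheck's API.

```python
import random

def gen_normal():
    # inputs known to be normally distributed around 0: outliers
    # like -10,000 are effectively never produced
    return int(round(random.gauss(0, 3)))

def gen_bounded():
    # inputs known to lie between 1 and 10: a search space of
    # ten values instead of four billion 32-bit integers
    return random.randint(1, 10)

def gen_fault_region():
    # focus on the region where faults were observed: [-20, -10]
    return random.randint(-20, -10)

random.seed(4)
samples = [gen_fault_region() for _ in range(1000)]
print(min(samples), max(samples))
```

Each generator is just a narrower random draw; the point is that every generated value already lies in the region we care about.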
So, failing inputs form these small contiguous regions in the input space, and it's similar for most test goals: if we take a particular input and choose a value right next to it, it's likely to take the same path, for example. So it's not just faults; the same holds for coverage and the other kinds of metrics we've studied or will study. What we want to do is find the regions that meet our goals. If we think of the space as being partitioned, we want to explore each of the partitions as quickly as possible. So what adaptive random testing says is: rather than choosing tests completely at random, we're going to favor diverse inputs. If a test doesn't meet a goal, the goal being, say, finding a fault, the next input we choose should be very different from the inputs we've already used. We're going to hop around the space and make sure we fill it up somewhat uniformly. The idea is that rather than having a fixed partition of the input space, as we continue to choose more values we create, de facto, a partitioning that gets tighter and tighter. We get a mesh, if you will, of values, and if we let the approach run longer, those values tend to become closer together, giving us an iterative partitioning of the space. And we want to reduce the size of the test suite while preserving fault finding. If the hypothesis holds that faults are grouped fairly tightly within a sparse space, then by choosing diverse points we're likely to get better fault finding with fewer tests. That is what the adaptive random testing (ART) algorithm does. It keeps a fixed-size candidate set: each round we generate a fixed number of candidate tests and choose among them, so that we end up with a diverse suite.
So we have two sets of test cases: an executed set, the already-executed tests that didn't meet our goal, and a candidate set, random tests that have been generated but not yet chosen. What we do is generate a test and execute it, then generate N candidates, just randomly, as in the regular random approach, choose the candidate furthest from the executed tests, execute it, and add it to the executed set. The interesting thing here is the performance trade-off. The great thing about pure random testing is that we can run through umpteen million tests very quickly, as fast as the computer can generate random numbers and run them through the program. Here we're doing a little bit of reasoning up front: take the existing tests, generate a bunch of candidates, and choose the one we think is best, where best usually means farthest away. But the generation step is very, very fast; even if we have to generate 10, 20, or 30 candidates, that's just calls to a random number generator, while running them through the program takes significantly longer. So we can afford to be a little choosy in how we generate our random tests and still come out ahead. In theory, if the dense-fault hypothesis is true, ART should find a solution faster than pure random testing, and in practice this holds: it tends to find a solution of similar quality to pure random testing in about half the time. There have been a bunch of studies on how many candidates should be generated before choosing one; the original authors found that generating around 10 offered the best trade-off between effectiveness and performance, but there's been a variety of research in this area. There's a whole literature if you're interested, and some of it will be in the notes.
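The loop just described can be sketched for one-dimensional integer inputs, using absolute difference as the distance measure; the function names, range, and candidate count of 10 are my choices, and the sketch omits actually running the program under test.

```python
import random

def min_distance(candidate, executed):
    # distance from a candidate to the nearest already-executed test
    return min(abs(candidate - e) for e in executed)

def next_art_test(executed, n_candidates=10, lo=0, hi=1_000_000):
    # generate N random candidates, then keep the one that is
    # farthest from everything we have already executed
    candidates = [random.randrange(lo, hi) for _ in range(n_candidates)]
    return max(candidates, key=lambda c: min_distance(c, executed))

random.seed(7)
executed = [random.randrange(0, 1_000_000)]   # the first test is purely random
for _ in range(19):
    # in a real ART loop we'd execute each test here and stop on failure
    executed.append(next_art_test(executed))

print(f"generated {len(executed)} well-spread tests")
```

Each round costs only N extra random draws and a few comparisons, which is the cheap up-front reasoning the lecture describes: generation stays nearly free while the chosen tests spread across the space.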
So, just to recap: what we're trying to do with random testing seems a bit weird. We really are trying to take a million monkeys and have them write Shakespeare. The idea is that on a modern computer system you can run through millions of tests very quickly, and if problems are distributed across the input space, you're likely to find them. Many of the bugs we have are only moderately rare, one in a million, say, and computers are often fast enough to run through millions of tests quickly. One thing we do to improve our odds is use adaptive techniques, so that we don't end up choosing too many points that are close to one another, which can happen even with randomization. But when we get to extremely rare paths, random testing can't help us, so in the next couple of lectures we'll look at different techniques that are better suited to finding those extremely rare paths.