Hi. So, how do you know your tests are any good? If they find bugs, they're good, right? Well, sort of, because when they stop finding bugs, there's still no way to know if you're done. So there have to be other, quantifiable ways to ask, "are my tests good?" And that's what we're going to be talking about.

What we need to determine is whether the tests can show that the code is correct to some level of dependability. That's pretty hard. In fact, it might be impossible, because there's no way to test comprehensively. Comprehensive testing is a theoretical testing process: you test every single possible input. If you have a finite set of inputs, that is possible, but most programs don't. Most programs have an effectively infinite input space. If you have a double, you have the entire space of doubles, all the decimal values it can represent, far too many to ever test them all.

So when we look at test adequacy, we're trying to decide the minimum level necessary to meet some level of correctness, some level of dependability that we want for our software, while minimizing the number of tests we need to run. Instead of measuring adequacy directly, we measure how well we've covered some aspect: structure, inputs, requirements. It's a proxy, a way to decide whether the testing is thorough enough based on our own past history.

Each adequacy criterion defines a set of obligations we're trying to meet. A test suite satisfies an adequacy criterion if all tests pass and every test obligation is met. As an example, take the statement coverage adequacy criterion: it is satisfied when the tests for our program P have executed every line of code in P at least once, and all the tests pass. So, what do we know when we satisfy the criterion? It's not nearly enough to say, "I've met the criterion, therefore my code is good." It's never going to be that level of proxy, that level of equivalence.
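To make that concrete, here's a minimal sketch (the function and test are hypothetical, not from the lecture) of a suite that fully satisfies the statement coverage criterion yet misses a real bug:

```python
def discounted_total(quantity, unit_price):
    # Hypothetical example: bulk orders over 10 units get 10% off.
    if quantity > 10:
        discount = 0.10
    return quantity * unit_price * (1 - discount)

def test_bulk_order():
    # This single passing test executes every statement above,
    # so the statement coverage criterion is satisfied...
    assert discounted_total(20, 2.0) == 36.0

test_bulk_order()

# ...yet any quantity <= 10 still crashes: `discount` is never
# assigned on that path, so discounted_total(5, 2.0) raises an error.
```

Meeting the criterion tells us every line ran at least once; it says nothing about the paths and inputs we never tried.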
But, possibly, we can define our adequacy criteria based on our past history and decide: since we identified this many bugs at this level of code coverage before, meeting that level provides us with at least some assurance that the software is going to be good.

Sometimes no test suite can satisfy a criterion, and then there are a couple of options. One, we can exclude the unsatisfiable obligations. For example, we can modify statement coverage to require execution only of the statements that can actually be executed; then we have to understand why the others can't be executed, and decide whether it's okay to leave them out. Otherwise, we can simply measure the extent to which we've approached the adequacy criterion. Maybe it's good enough to say we satisfy 85% of statement coverage.

Test obligations come from a number of places. We can focus on four main perspectives here: functional; fault-based, which is what people tend to think of when they think of bugs, trying to force problems to happen; model-based, where the focus is primarily on a logical model of the program; and structural, structural being things like statement coverage again, mapping obligations to some level of coverage in the code itself.

The problem with adequacy, of course, is that it can be a placebo. We can measure it, and it can be useful, but it can also be dangerous. We don't want to immediately assume that meeting the coverage criterion is good enough, or that tests which achieve code coverage are good. There are a lot of easy ways to trick code coverage, like adding simple tests that really don't help much but do cause an additional statement to be run. That's not really what we're looking for. What adequacy measures can do is not only judge whether tests are good enough, but also help us find new inputs.
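As a sketch of how easy coverage is to trick (hypothetical code, not from the lecture), compare two suites that both achieve 100% statement coverage of the same function; only the second one can actually fail if the behavior is wrong:

```python
def classify_age(age):
    # Hypothetical function under test.
    if age < 0:
        return "invalid"
    if age < 18:
        return "minor"
    return "adult"

def test_gamed_coverage():
    # Executes every statement but asserts nothing:
    # 100% statement coverage, zero ability to catch a bug.
    classify_age(-1)
    classify_age(10)
    classify_age(30)

def test_meaningful_coverage():
    # Same coverage, but with real obligations on the outputs.
    assert classify_age(-1) == "invalid"
    assert classify_age(10) == "minor"
    assert classify_age(30) == "adult"

test_gamed_coverage()
test_meaningful_coverage()
```

A coverage report can't tell these two suites apart, which is exactly why coverage alone is a placebo.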
So, they can help you define and find additional inputs that are necessary, because you didn't test a certain part of the code. The worry with any criterion, of course, is that people just try to make it over the bar rather than doing good testing. That's more of a business process, a culture of doing good testing, than just meeting the coverage criterion.

We can somewhat distinguish between criteria. We can study the effectiveness behind them when we have a history for the code. If we run project A and project B with, let's say, statement coverage versus MC/DC, then we look, after the software is deployed, after running all the tests and finding the bugs, and see how many bugs were left once it released. We can really only do that in postmortem. Like so many of our elements, we need history; we need to measure these things while we're doing the work, otherwise there's nothing we can do. We can also identify places where one criterion is stronger by pure logic, by proving that under this set of circumstances, this criterion is going to work better than that one does.

A criterion needs to be effective at finding bugs, and we want it to be robust to simple changes in the program. But one of the most important properties is whether it's reasonable. It's very easy, again, to think of comprehensive testing: if we test everything, we will obviously find all the problems. That's not good enough, though. We also have to ask how much time we have to run the tests and how many tests we can run, because if we can't afford to run the tests, the coverage doesn't matter. And, unfortunately, in a lot of cases, it comes down to the money criterion, the time criterion: how much money and time do we have to spend on this?

This is a very big area of software engineering research. This primer is not sufficient to gain full understanding, just like the test selection lecture.
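To see why one criterion can be logically stronger than another, here's a small sketch (the condition and test inputs are hypothetical, not from the lecture). Statement coverage of this function is satisfied by two inputs; MC/DC additionally obligates us to show that each condition independently flips the decision, which forces a different, larger set of tests:

```python
def should_alert(pressure_high, temp_high):
    # Hypothetical guard: alert if either sensor reads high.
    if pressure_high or temp_high:
        return True
    return False

# Statement coverage: both return statements run with just these two inputs.
statement_tests = [(True, True), (False, False)]

# MC/DC: each condition must independently affect the outcome, so
# (True, False) and (False, True) are both obligated, and the lazy
# (True, True) pair no longer earns any credit.
mcdc_tests = [(False, False), (True, False), (False, True)]

assert all(should_alert(p, t) == (p or t)
           for p, t in statement_tests + mcdc_tests)
```

Because every MC/DC-adequate suite for this function is also statement-adequate (but not vice versa), MC/DC subsumes statement coverage here by pure logic, no deployment history required.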
There are several other resources that exist in this space to develop further understanding, and other courses that speak to this specific area. We want to define "thoroughness": how well our test suite exercises our program or system under test. We're always looking at covering some kind of information, and the easiest one is code coverage. But there are other kinds, like requirements coverage, model coverage, and others.
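To show what "covering some kind of information" looks like at the code level, here's a toy statement-coverage monitor (a sketch built on Python's `sys.settrace`; real tools such as coverage.py are far more capable and this is only for illustration):

```python
import sys

def lines_executed(func, *args):
    """Record which body lines of func run during one call.
    A toy statement-coverage monitor, not a production tool."""
    code = func.__code__
    hit = set()

    def tracer(frame, event, arg):
        # Only record "line" events inside the function we care about.
        if event == "line" and frame.f_code is code:
            hit.add(frame.f_lineno - code.co_firstlineno)
        return tracer

    sys.settrace(tracer)
    try:
        func(*args)
    finally:
        sys.settrace(None)
    return hit

def grade(score):
    # Hypothetical function under test.
    if score >= 90:
        return "A"
    if score >= 80:
        return "B"
    return "C"

# One input exercises only part of the function; the union over all
# the suite's inputs is the suite's statement coverage of grade().
one_test = lines_executed(grade, 95)
suite = one_test | lines_executed(grade, 85) | lines_executed(grade, 50)
```

The ratio of lines hit to lines present is the familiar coverage percentage; swapping what we record (requirements exercised, model states visited) gives the other coverage kinds mentioned above.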