This lecture is about testing. As mentioned in the last lecture, when we perform testing, we try to find defects and mistakes within a software system. These are the learning objectives of this lecture. After this lecture, you will understand the purpose of testing, and you'll know the three basic types of test cases and how they can be generated. You'll also understand the principles of component-based and system-based testing, and you'll learn different strategies for performing the different types of tests.
We will start with a testing overview. Some of you may think that testing is mainly done after implementation. That's not true: if you think about test-driven development, we have to work out all the possible test cases and the test plan before we actually start implementation. So it's possible to perform testing before implementation or during implementation, not just after it.
There are many reasons why programmers do not want to perform testing. The first reason is: I want to get the project done fast, and testing is going to slow me down. The second reason is: I started programming when I was two, so don't doubt my perfect code. The third reason is: testing is for incompetent programmers who cannot hack; it's not for me. Others say: we are not super students, and we know that our code actually works. Or: I trust the software from my teammate, so we don't need to perform testing.
Notice that if you don't test your system thoroughly, a tiny mistake may destroy the entire system. This is an example that we saw at the very beginning of the semester: the rocket blew up 37 seconds after launch. The reason is that the software tried to convert a 64-bit floating-point number into a 16-bit signed integer. The conversion caused an exception, the program crashed, and the rocket exploded. The total cost was over 1 billion US dollars.
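To make the conversion problem just described concrete, here is a minimal sketch in Python. It is not the actual flight software; the helper name `to_int16` and the sample values are hypothetical, chosen only for illustration. It shows how converting a 64-bit floating-point value to a 16-bit signed integer fails as soon as the value falls outside the range from -32,768 to 32,767.

```python
INT16_MIN = -2**15      # smallest 16-bit signed integer (-32768)
INT16_MAX = 2**15 - 1   # largest 16-bit signed integer (32767)


def to_int16(value: float) -> int:
    """Convert a float to a 16-bit signed integer, raising on overflow.

    Hypothetical helper for illustration only.
    """
    result = int(value)  # drops the fractional part
    if not INT16_MIN <= result <= INT16_MAX:
        raise OverflowError(f"{value} does not fit in a 16-bit signed integer")
    return result


if __name__ == "__main__":
    print(to_int16(1234.5))        # small value: the conversion succeeds
    try:
        to_int16(40000.0)          # too large for 16 bits
    except OverflowError as err:
        # If nothing catches this exception, the program crashes.
        print("conversion failed:", err)
```

A single test that feeds an out-of-range value into such a conversion would have been enough to expose this kind of failure before launch.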
The Ariane 5 rocket shows a particularly costly error. However, every little error adds up. For example, in the Vancouver Stock Exchange system in 1982, the new stock exchange system initialized the index value at 1,000, and the index was then updated after each transaction. Twenty-two months later, the index had fallen to 520. The cause was that the updated value was truncated rather than rounded, so the system did not produce the correct result. The rounded calculation would have given a value of 1,098 point something. Insufficient software testing costs 22 to 60 billion US dollars per year in the US. If your software is worth writing, it's also worth testing, because you want to make sure that it is correct.
In testing, we try to find mistakes and errors within the software system. We try to find differences between the specified and the observed system behavior, that is, between the expected and the actual system behavior. There are two major things that we have to do in testing. One is what we call validation. In validation, we try to make sure that we have built the right product according to all the requirements stated in our documentation. Acceptance tests deal mainly with validation. We will cover acceptance tests in one of the upcoming lectures. We also have to perform verification. That means we have to make sure that we have built the product right. In verification, we check the quality of the implementation and make sure that all the functions work correctly and have no defects. Most testing is targeted at verification, which means trying to find mistakes and errors within the software system. Our goal in testing is to design a set of tests that will systematically find defects in the software system.
Notice that testing cannot show the absence of software errors, because there are too many possible conditions, inputs, and situations that may arise within a software system. It's not possible to test a system exhaustively. That's why I said here that it is impossible to completely test a non-trivial system: there are too many possible things that you have to consider. Notice also that testing is a destructive activity, because we try to make the software fail. It's often difficult for software engineers to test their own software effectively, because people are usually blind to their own mistakes.
Now we talk about how to plan tests, that is, how we can come up with a set of test cases to test a software system. Some of you may ask: why don't I simply test the system exhaustively, using all the possible inputs, and see whether the system works on all of them? Let's take a look at this example. We have a procedure that takes the inputs x, y and z; they are integers, and each should be within the range from 1 to 10,000. Considering all the possible combinations of these inputs, how many combinations do we have? 10,000^3, which is one trillion. With that many combinations, it's going to take a long time before you can finish the testing, so it's impractical. That's why, instead of testing all the possible inputs, we try to find a small set of test cases that can uncover as many defects as possible in the shortest possible time. This is because we usually perform testing after implementation, and we usually have limited time after implementation because we have to meet a deadline. So it's a good idea to come up with a small set of test cases that we can use to uncover as many defects as possible in the shortest possible time.
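To get a feel for why exhaustive testing is impractical here, the following Python sketch works through the arithmetic for the x, y, z example. The one millisecond per test is an assumed figure for illustration, not something from the lecture, and the function names are hypothetical.

```python
VALUES_PER_INPUT = 10_000   # each of x, y and z ranges over 1..10,000
NUM_INPUTS = 3              # the procedure takes x, y and z
SECONDS_PER_TEST = 0.001    # assumption: one test takes one millisecond


def exhaustive_test_count(values_per_input: int, num_inputs: int) -> int:
    """Number of input combinations an exhaustive test would have to cover."""
    return values_per_input ** num_inputs


def years_to_run(num_tests: int, seconds_per_test: float) -> float:
    """Total running time in years, using 365-day years."""
    seconds_per_year = 60 * 60 * 24 * 365
    return num_tests * seconds_per_test / seconds_per_year


if __name__ == "__main__":
    total = exhaustive_test_count(VALUES_PER_INPUT, NUM_INPUTS)
    print(f"{total:,} combinations")
    print(f"about {years_to_run(total, SECONDS_PER_TEST):.1f} years to run them all")
```

Under that assumption, the one trillion combinations take more than thirty years to run, and that is for a procedure with only three bounded integer inputs.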
That's why our goal is to come up with tests that have the highest likelihood of finding defects with the minimum amount of time and effort. This is our target. Why? Because we have limited time for testing after implementation and before the deadline, and tests cost money and time to develop, perform and evaluate. That's why we want a small set of tests that has the highest likelihood of finding defects.
In our test plan, we have to specify a testing strategy. That means what tests to perform, how to perform them, and what test and code coverage is required, that is, the percentage that should execute with a specific result. In other words, how many tests we have to pass before we can say that our system is stable and can be released to our users. We also need a schedule for testing, that is, when to run the tests, and an estimate of the resources required, for example, whether the tests are going to be run by humans or by a system.
Again, the key problem here is to figure out a set of inputs that is small enough to finish quickly but large enough to validate and verify the system. That's why, instead of testing all the possible values, we partition the possible values into different sets so that the values in each set have similar behavior, and then we pick one value from each set to perform testing. But how do we define similar behavior? There are different approaches that we can use. For example, we can look at execution equivalence, or we can try to find similar sub-domains. Notice that discovering these sets of inputs requires perfect knowledge. That means we need to see the source code in order to figure out the set of inputs that we're going to use for testing.
We also need to use some heuristics to approximate the sets cheaply and quickly. If you partition all the possible inputs according to execution equivalence, there are two sets that we can use. If the number is negative, we execute this line. If the number is non-negative, we execute another line. So we can partition all the inputs into two sets: the negative numbers are one set, and the numbers greater than or equal to zero are the other set. Then we have to pick one value from each set. Let's say we pick minus 3 from one set and 3 from the other. Then we have the set of values minus 3 and 3, which looks like a good set of values to use for testing.
Now, let's consider this buggy code. In this buggy code, instead of x less than 0, we have x less than minus 2. If we use the set of values that we derived before, can we find the error? No. If you feed minus 3 into this function, you get the correct result, and if you feed 3 into this function, you also get the correct result. So you cannot find the error using the values minus 3 and 3. That's why using execution equivalence may not always be effective. Sometimes we have to use another heuristic, for example, sub-domains, to find a set of values for our testing.
We have two sub-domains in this example. One sub-domain is the set of values that give you the correct result: anything less than minus 2 or anything greater than or equal to 0. The values minus 2 and minus 1 give you an incorrect result, and they form the other sub-domain. So we partition the set of values into sub-domains according to whether you get correct or incorrect results.
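The example above can be sketched in Python as follows. The code on the slide is not reproduced in this transcript, so the two functions below are an assumed reconstruction of an absolute value function and its buggy variant, and the names `abs_correct`, `abs_buggy` and `failing_inputs` are hypothetical.

```python
def abs_correct(x: int) -> int:
    """Intended behavior: negate negative numbers, return the rest unchanged."""
    if x < 0:
        return -x
    return x


def abs_buggy(x: int) -> int:
    """Same function with the bug from the example: -2 instead of 0."""
    if x < -2:          # bug: the condition should be x < 0
        return -x
    return x


def failing_inputs(inputs):
    """Return the inputs on which the buggy version gives a wrong result."""
    return [x for x in inputs if abs_buggy(x) != abs_correct(x)]


if __name__ == "__main__":
    # One value per execution-equivalence set of the correct code.
    print(failing_inputs([-3, 3]))        # → []
    # Scanning a small range shows the incorrect sub-domain.
    print(failing_inputs(range(-5, 6)))   # → [-2, -1]
```

The test set minus 3 and 3 exercises both branches yet reports no failure, while only minus 2 and minus 1 expose the wrong condition.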
That's why we say here that a program has the same behavior for two inputs if it gives a correct result on both or gives an incorrect result on both. A sub-domain is a subset of the set of possible inputs. A sub-domain can be used to find an error, E, if each element in the sub-domain has the same behavior. In our previous example, any value in the incorrect sub-domain, that is, minus 1 or minus 2, can be used to find the mistake in this line of code. If the program has an error E, then it can be found by a test using this sub-domain.
For this buggy absolute value function, what are the possible sub-domains that we can use? There are two. One is the set of values that we can use to uncover this error. The other is the set of values that always give you the correct answer. Considering these input sets, which one is the best for finding this error within the program? Again, we need something from the correct sub-domain and something from the incorrect sub-domain. I would say the set minus 3 and minus 1 is the best set, because with minus 3 you get the correct result, while minus 1 can be used to uncover the error. Why is this other set not the best one? Because the larger set contains one input from the correct sub-domain, which is minus 3, and two values, minus 2 and minus 1, that uncover the error, so it's not minimal. That's why we prefer the first one: it is the minimal set. Always keep in mind that we want to find a small set of inputs that has a high chance of uncovering errors. That's why we prefer the smaller set to the larger one.
To partition the inputs and outputs into different sub-domains, we need all this information: program-dependent information, which means we need the source code, and we also need program-independent information.
For example, the documentation, the algorithms being used, the input and output data structures, et cetera. A good heuristic should give you a few sub-domains that have a high chance of finding errors. Notice that different heuristics target different classes or kinds of errors. In reality, we usually use a combination of multiple heuristics.
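As a rough sketch of the sub-domain idea, the Python code below labels each input of the buggy absolute value function as belonging to the correct or the incorrect sub-domain, and then checks which candidate test set takes exactly one value from each. The function is an assumed reconstruction of the example, the helper names are hypothetical, and the built-in `abs` stands in for the perfect knowledge that the lecture says we would need.

```python
def abs_buggy(x: int) -> int:
    """Assumed reconstruction of the buggy absolute value function."""
    if x < -2:          # bug: the condition should be x < 0
        return -x
    return x


def subdomain(x: int) -> str:
    """Label an input by whether the buggy function handles it correctly."""
    return "correct" if abs_buggy(x) == abs(x) else "incorrect"


def one_from_each_subdomain(inputs) -> bool:
    """True if the set has exactly one input per sub-domain (minimal cover)."""
    labels = sorted(subdomain(x) for x in inputs)
    return labels == ["correct", "incorrect"]


if __name__ == "__main__":
    # Hypothetical candidate test sets, in the spirit of the lecture's question.
    candidates = [(-3, 3), (-3, -1), (-3, -2, -1)]
    for inputs in candidates:
        labels = [subdomain(x) for x in inputs]
        print(inputs, labels, one_from_each_subdomain(inputs))
```

Of these candidates, only minus 3 and minus 1 passes the check: minus 3 and 3 never touches the incorrect sub-domain, and the three-value set covers it twice, so it is not minimal.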