There are several different types of user tests that you might conduct, depending on what questions you're trying to answer. You might be trying to answer the question: is one design that we've come up with better than another? Do people perform better using one design versus the other? Do they do things faster? Do they make fewer errors? You might be interested in knowing whether the product we've designed is better than a competitor's product. In some cases, you might be interested in knowing whether a design you've come up with meets a specific benchmark. Are people able to complete tasks in a certain period of time? Are you able to hit a certain task completion rate, say, 95 percent of all users completing a certain task? Are you able to get the number of errors below a certain threshold? Or you might just be interested in knowing what the biggest problems with your design are so you can fix them next, and this is a question you might ask while you're still in the process of designing the product.

We can categorize user tests into two broad categories: summative and formative. Summative tests are where you're trying to sum up a design and make claims about its quality. Examples would include: is design A better than design B? Is our product better than our competitors'? Does our design meet a specific benchmark? Formative tests, on the other hand, are used as part of the design process, where we're answering questions like: what are the biggest problems with our design that we should fix next? We'll go into each of these types of tests in a little more detail.

Summative tests come in a couple of different varieties. The first is the comparative test. This is where we're comparing a design that we've come up with against another design, whether that's comparing against the old version of our system, comparing two alternative designs we've come up with as part of the design process, or comparing against a competitor's product. So this answers questions like: is design A better than design B? Is our product better than our competitors'? Is the new version better than the old one? The common goal across all of these questions is to show that one design is better than another on some measurable outcome. We might want to make a claim like: 30 percent more users completed tasks using design A than with design B. Or: users completed tasks 40 percent faster using A than with B. Or: errors were reduced by 25 percent in the new version compared with the old one.

The general idea with a comparative test is that one group of users uses design A, another group uses design B, you measure performance in both conditions, and you make a comparison. To do this the right way requires a controlled experiment. You might remember from high school science what a controlled experiment looks like. You have to have a hypothesis; here, the hypothesis would probably be that design A is better than design B on some measure, whether that's task completion, performance time, et cetera. You need to have a control, which in this case is going to be B, with A being the experimental condition. Running an experiment like this requires careful design. Conditions A and B should vary minimally: they should differ only in the specific detail that you're trying to test.
At a minimum, the data, the tasks, and the users should be the same, or at least the users should be the same type of users, and only the design should differ. Nothing else should differ: the environment in which the test is held shouldn't be different, and so on. You need to use statistical methods to show that the measurements are different; these would be tests like a t-test, a chi-squared test, or an ANOVA. Tests like this require a relatively large number of users to show a significant difference between condition A and condition B, and designing these experiments responsibly requires mastery of statistical methods.

Another type of summative test that you'll sometimes see is the benchmark test. The question answered by a benchmark test is: does our design meet some specific performance requirement? The goal is to demonstrate a defined level of performance, and you might want to make claims like: users can accomplish task X in less than 30 seconds. Or: 95 percent of users succeeded in accomplishing task Y. Or: users make errors less than 10 percent of the time. This type of test is often used when there are hard task constraints, such as a particular task that has to be performed in a certain amount of time, or when there are defined targets for legal or other reasons, such as a requirement that the system's error rate stay below one percent. These tests are most often seen in performance-critical domains like healthcare and the military.

The basic design of a benchmark test is that you have one group of users perform tasks using the design, you measure their performance (again, task completion, timing, and so forth), and then you demonstrate that the performance meets some criterion. While this isn't a controlled experiment, because you're only testing one condition, it does require a carefully constructed study. You don't have a hypothesis and you don't have a control condition, but you do need careful design to reduce confounding variables, all of the other things that could possibly affect the outcome you're trying to measure. You again need to use statistical methods, in this case to calculate a confidence interval, and you need a relatively large number of users to establish that interval.

To summarize the two types of summative tests we just looked at, comparative and benchmark: you want to use a summative test when you need to show that a design is better, or good enough. You can think of summative tests as summarizing a design in terms of some characteristic that you want to demonstrate and make a claim about. Summative tests are statistically driven, which means they require statistical training to do properly, training that is beyond the scope of this course. It also means that the number of participants you need is driven by the statistical methods you're using. Typically, summative tests require more participants (10 to 20 is typical, and sometimes a lot more, depending on various statistical factors) compared to formative tests, which typically use five to seven users; we'll talk about the reasons for that a little later in the course. For a variety of reasons, including the need for a statistical background and the larger number of users required, summative tests are fairly rare in UX research.
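To make the statistics behind a comparative test a little more concrete, here is a minimal sketch in Python of the kind of calculation involved: a two-sample t-test on made-up task-completion times for two hypothetical groups of participants. It is purely illustrative; a real study would choose the test, the sample size, and the analysis plan up front, ideally with someone who has statistical training.

```python
# Illustrative sketch only: comparing task-completion times (in seconds)
# for two hypothetical groups of participants, one group per design.
# All numbers are made up.
from scipy import stats

design_a_times = [41.2, 38.5, 44.0, 36.8, 40.1, 43.3, 39.7, 42.6, 37.9, 45.2]
design_b_times = [52.4, 49.8, 55.1, 47.6, 53.0, 50.9, 56.3, 48.7, 51.5, 54.2]

# Independent-samples t-test: is the difference in mean completion time
# larger than we would expect from chance alone?
result = stats.ttest_ind(design_a_times, design_b_times)

mean_a = sum(design_a_times) / len(design_a_times)
mean_b = sum(design_b_times) / len(design_b_times)
print(f"Design A mean: {mean_a:.1f} s, Design B mean: {mean_b:.1f} s")
print(f"t = {result.statistic:.2f}, p = {result.pvalue:.4f}")
# A small p-value (conventionally below 0.05) would support the claim
# that users complete the task faster with design A than with design B.
```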
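Similarly, here is a minimal sketch of the confidence-interval calculation behind a benchmark test, using a simple normal (Wald) approximation for a task-completion rate. The participant counts are invented, and in practice you would likely use a more robust interval (such as a Wilson interval) and choose the sample size based on the precision you need.

```python
# Illustrative sketch only: an approximate 95% confidence interval for a
# task-completion rate. Participant counts are hypothetical.
import math

completed = 18   # hypothetical participants who completed the task
n = 20           # hypothetical participants tested

p_hat = completed / n                           # observed completion rate
z = 1.96                                        # z value for 95% confidence
margin = z * math.sqrt(p_hat * (1 - p_hat) / n)

low, high = max(0.0, p_hat - margin), min(1.0, p_hat + margin)
print(f"Observed completion rate: {p_hat:.0%}")
print(f"Approximate 95% confidence interval: {low:.0%} to {high:.0%}")
# To claim the design meets a benchmark (say, an 85% completion rate),
# you would want the whole interval to sit above that threshold.
```

If the entire interval lies above the benchmark you've defined, you have evidence that the design meets the requirement; if the interval straddles the benchmark, you can't make that claim and would need more participants or a better design.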
One exception, which we'll look at in other courses in the UX Research and Design MicroMasters, is online A/B testing, also called website optimization; that's covered in the UX@Scale course. But summative tests like comparative and benchmark tests, while done occasionally, are not nearly as common as formative tests, which we'll look at next.

Formative tests are performed when the goal is to identify problems that need to be fixed. In contrast to summative tests, where we want an outcome like "40 percent of our users were able to accomplish tasks in under 30 seconds," an outcome of a formative test might be: users struggled to complete task X because the button labeled 'Save to profile' was confusing, or users took too long to complete task Y because the relevant information was at the bottom of a long page and took a long time to find. Formative tests are by far the most common form of user test that you will come across in UX research and design. They are generally performed during the design process to find bugs to fix in the next iteration of the design. The general form of a formative test is that you have representative users perform a set of tasks that you define; you watch what they do and hear what they say, you see where they struggle, and you identify the parts of the design that cause problems.

To compare summative and formative tests: both have users perform tasks, and those users need to be representative of the people who will ultimately be using the system. They differ in that summative tests are used to prove a point, to drive home that our system can perform to a certain level, while the goal of a formative test is to find problems. Summative tests use quantitative or statistical methods, whereas formative tests use qualitative or interpretive methods, where it's up to the analyst or the person administering the test to interpret the results and report them. Summative tests require relatively more users; formative tests can use fewer. And summative tests, as I said, are fairly rare, while formative tests are very common in UX research. For these reasons, summative tests won't be discussed much more in the rest of this course. Formative tests, on the other hand, will be its focus: we'll be looking at how to do a good formative test that informs the next cycle of design.

To sum up, it's important to know that there's more than one kind of user test. Sometimes an audience you're communicating with about the results of a test will be expecting something different, and you need to explain what the goals of your test were and why you chose the method you did. So it's important to know that summative tests exist, even if you won't normally be performing them, and to understand how they're different and why you would choose one type of test over another. Formative testing is what you'll be learning in the rest of this course.