So, we know that a user test involves getting representative users to perform a set of tasks with your system so that you can see how they interact with it. But what are you really looking for when you conduct a user test? Of course, we want to know whether people were able to complete the tasks we assigned to them. Did it take them an exceptionally long time, perhaps longer than we expected? Did people make errors? If they made errors, did they know that they made them? Did they recover from those errors? How did people feel about the experience? But even more important than all of these questions, what we want to know when we conduct a user test is why. Why were people unable to complete the tasks, if they were unable to? Why did they take a long time? Why did they make errors? Why did they not notice them, or why were they unable to recover? Why did they feel the way they did about the experience?

We also want to capture critical incidents, which are things that happen during the user test that might explain some of the outcomes we measure, and verbal accounts, statements that users make that indicate thought processes and attitudes and might help explain some of the things we observe. These are qualitative data that we capture alongside the quantitative data throughout the user test.

Does quantitative or qualitative data matter more? It depends on the type of test you're conducting. In summative tests you're most interested in the quantitative data, and you might not even capture the qualitative data. In formative tests, which again are what we're focusing on in this course, you're actually more interested in the qualitative data. You'll capture the quantitative data and use it to ground the important stuff, but what you're really interested in is the data that tells you why things are happening, not just the observation that they're happening.

Critical incidents are the key type of data you want to capture when conducting a user test. A critical incident is anything interesting that happens while users perform tasks that might help explain why they were or were not successful at completing those tasks. Examples include clicking the wrong button, ignoring the instructions shown on the screen, entering the wrong information into a field, following the wrong path, perhaps going down the wrong path several steps before realizing the issue, misinterpreting a label, expressing confusion or frustration at a particular point, asking for help, staring at the screen for a long time before making a decision, or, of course, giving up on a task completely. There are lots of other incidents that might be of interest and might help explain what happened during a test.

Verbal accounts are simply the things users say while performing the tasks, and they can give insight into users' mental states as they go through the test. They might say things like "I'm looking for such and such," which gives you a sense of what they expect to find on a page, or, given some feedback from the system, "I was expecting to see something different from what I actually saw." They might say something like "I wonder what this does," which indicates that they're trying to do some problem solving and make sense of a particular part of the system.
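To make this concrete, here is a minimal sketch, not from the lecture itself, of how you might log critical incidents and verbal accounts as they happen during a session. The field names (participant_id, task_id, timestamp) and the dataclass structure are illustrative assumptions; a spreadsheet or paper notes work just as well.

```python
from dataclasses import dataclass
from typing import Literal

# Hypothetical record for one observation captured during a test session.
@dataclass
class Observation:
    participant_id: str   # who was being observed
    task_id: str          # which task they were performing
    timestamp: str        # when it happened, e.g. "00:12:43" into the session
    kind: Literal["critical_incident", "verbal_account"]
    note: str             # what happened, or what was said (verbatim if possible)

# Example entries of the kinds discussed above.
log = [
    Observation("P3", "task-3", "00:04:10", "critical_incident",
                "Clicked 'Enter payment info' while looking for shipping fields"),
    Observation("P3", "task-3", "00:05:02", "verbal_account",
                '"I don\'t see anywhere to enter shipping information"'),
]
```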
Or they might say, "Well that doesn't seem quite right" or "I think that was right" which would indicate that they're not really sure about the right way to do things. They might ask a question either to themselves just thinking out loud or they might ask it to you asking how they do something which would clearly indicate that they're unsure about how to progress at a particular point. These types of qualitative data are essential for helping us go from just the data of the test to actionable intelligence or information that we can act on in improving the design. So, the data from a user test might be something like five out of seven users failed task three. But what we really want to know is why did they fail task three? So, better information would be, three users could not figure out how to enter shipping information in a hypothetical task and two users entered shipping information but then canceled because it seemed wrong. This might actually point to two different problems with the user interface. An even better insight would be that the shipping information dialog does not appear until after clicking enter payment info and this doesn't match users mental models. After entering shipping information, the final checkout screen does not show users selection and the lack of feedback leads them to abandon the task. So, for example, we might have observed the failure to enter shipping information as being something that led to a task failure. The cancellation of the task would be a critical incident which would indicate a point of task failure. Another critical incident might have been a long-time spent inspecting checkout pages which would indicate that users were struggling to make sense of what was happening and what they needed to do next and the verbal accounts would play an important role as well. Users saying things like "I don't see anywhere to enter shipping information" for example or "I would expect to find feedback about what it is that I entered previously" or "I don't see my selection here. " These would all indicate the places where the system wasn't matching users mental models of what was supposed to be happening. In addition to data collected during the process of the test itself, it's often important to collect data about your users that will help you understand certain aspects of what it is that you observe. So you may want to collect technical expertise, how familiar are your users with computing in general or with a particular platform that you are testing on whether it's web or mobile or desktop? You may want to find out what their domain expertise is. If this is a shopping site, how frequently do they shop? How sophisticated are they at shopping? You often want to capture the frequency of relevant behaviors, the types of things that they would be doing on your system. So, for example, how frequently do they do the kind of shopping that you're testing? In some cases, you may be interested in general demographics such as age, gender, education level, country of origin but you want to make sure that when you're asking these questions you have a reason for collecting it, you're not just kind of fishing for possible connections. Most often, expertise and relevant behaviors are the most predictive of differences in user test performance and these other things are proxies that might help you get at expertise and behaviors. 
The reason we want to collect this kind of user data is not just because we're interested in it, but because we want to explain differences in outcomes, performance, and attitudes. We might, for example, observe that three out of four users with low technical proficiency failed a certain task while all three users with high technical proficiency succeeded, as in the sketch below. This might tell us that our system is good at serving the needs of people who are fairly sophisticated technically, but not so good at serving people with lower technical proficiency. Or we might observe that, although all users succeeded in most tasks, users with higher domain expertise gave lower perceived usefulness ratings, meaning that people who have a high degree of expertise or are very familiar with the domain may not find the system as useful as people who are more novice.

So, to sum up what we're looking for when conducting a user test: we want to capture quantitative data to ground what happened during the test so that we can explain it in a way that leads to improvements to the system. We want to capture performance measures: how well does the system do what it needs to do? And we want to capture subjective measures: what are users' perceptions of the system in terms of its usability, usefulness, and other aspects of desirability? But when conducting a formative test, which is what we're going to be doing most of the time, it's most important to collect qualitative data to explain why we observed what we did, because knowing why tells you what you need to do to make it better. We also often want to capture data about our users to explain differences in the quantitative and qualitative data. In upcoming lectures, we'll talk in more detail about how to capture these different types of data.
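As a closing illustration, and my own sketch rather than part of the lecture, here is how the cross-tabulation of task outcomes by technical proficiency mentioned above might be computed. The data values are invented to mirror the example numbers (one of four low-proficiency users succeeded; all three high-proficiency users succeeded).

```python
from collections import defaultdict

# Invented results mirroring the example: (participant_id, proficiency_group, task_succeeded)
results = [
    ("P1", "low",  False), ("P2", "low",  False), ("P3", "low",  False), ("P4", "low",  True),
    ("P5", "high", True),  ("P6", "high", True),  ("P7", "high", True),
]

# Tally successes and attempts per proficiency group.
counts = defaultdict(lambda: [0, 0])  # group -> [successes, attempts]
for _, group, succeeded in results:
    counts[group][1] += 1
    if succeeded:
        counts[group][0] += 1

for group, (successes, attempts) in counts.items():
    print(f"{group} proficiency: {successes}/{attempts} succeeded")
# low proficiency: 1/4 succeeded
# high proficiency: 3/3 succeeded
```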