Here we are in our studio, analyzing our study of human performance in a smartphone contacts manager, where people initially find contacts with search and with scrolling; we'll consider voice later too. This is our first within-subjects analysis, and it will use a paired-samples t-test. Recall that the t-tests we did before were independent-samples t-tests, so you can think of this as a within-subjects t-test. We load the search-scroll data file and view it, as is our usual custom. We can see that we have a column for subject; a column for technique, with levels search and scroll; 20 subjects; and an order column, with orders 1 and 2 based on which technique was shown first and second. So we can test for an order effect of technique and make sure our counterbalancing worked. Incidentally, this study was fully counterbalanced, because order has only two levels. We have measures of time in seconds to find a contact, errors, and effort as a rating on a Likert scale from 1 to 7, so we'll be able to cover how each of those is analyzed. We turn subject into a nominal factor, and order also, and then we can summarize our table. As is our usual practice, we view descriptive statistics for the response of interest by the factor of technique. We can see quartiles and medians there, and means and standard deviations there. Already we can see that search seems to take 96 seconds on average to find a contact, and scroll 137. It would seem there's a difference, but of course we can't just compare means. We look at the standard deviations around those means and see 35 and 23. Whether the means are sufficiently different or not is what we'll answer with our statistical test. So let's explore the time response with histograms. As we've noted, longer task times, measured in minutes or days for long tasks, can be log-normal.
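The setup steps narrated above might look like the following in R. The table name srchscrl and its column names are assumptions based on the narration, and simulated data stands in for the real file, so the numbers will differ from those quoted above.

```r
# Simulated stand-in for the search/scroll data (names are assumptions)
set.seed(123)
srchscrl <- data.frame(
  Subject   = rep(1:20, times=2),
  Technique = factor(rep(c("Search","Scroll"), each=20)),
  Order     = c(rep(1:2, each=10), rep(2:1, each=10)),  # fully counterbalanced
  Time      = c(rnorm(20, mean=96, sd=35), rnorm(20, mean=137, sd=23))
)

# Subject and Order are identifiers, not quantities: recode as nominal factors
srchscrl$Subject <- factor(srchscrl$Subject)
srchscrl$Order   <- factor(srchscrl$Order)
summary(srchscrl)

# descriptive statistics for Time by Technique, using plyr
library(plyr)
ddply(srchscrl, ~ Technique, summarise,
      Time.mean=mean(Time), Time.sd=sd(Time))

# explore the Time response with histograms and a box plot
hist(srchscrl[srchscrl$Technique == "Search",]$Time)
hist(srchscrl[srchscrl$Technique == "Scroll",]$Time)
plot(Time ~ Technique, data=srchscrl)
```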
That looks relatively normal. That looks a little off, but similar too. The box plots help us see that there is probably quite a significant difference between the means of the two techniques. Let's run the Shapiro-Wilk test for normality on the search data. That does not seem to deviate from a normal distribution. Let's do the same for scroll: we get a p-value of 0.09, so it's close, but it also seems reasonably normal. Of course, we want to fit a model and test the normality assumption on the residuals as well, and here a little note about our syntax is warranted. We're fitting an analysis of variance model, as we've done before, with technique as our one within-subjects factor, but we have to tell R that there are repeated measures on the same subject. We do that with a function called Error that we add as a term to the model. We put subject here and technique here, and we're saying that subject is what correlates across the rows in the table that were within subjects with respect to technique. I've put a general comment here: if you have an Error function with S and then A interacting with B and C, it means S was exposed to every level of A, B, and C, and S is the column encoding the subject IDs. So you can see how R is interpreting that. Okay, then we get residuals for the subject term, and we can check those; they look pretty normal on the QQ plot. We also get residuals for the subject-by-technique term, because technique is a within-subjects factor. There's a little departure in the corner, but for the most part those look reasonable too, and Shapiro-Wilk tests tell us that both sets of residuals seem normal. We can also check homoscedasticity with Levene's test, and because we don't have a significant result, we should be able to compare these conditions.
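In code, the model fit and assumption checks just described might be sketched like this; the data are simulated stand-ins, and leveneTest comes from the car package.

```r
# simulated stand-in for the real table (names are assumptions)
set.seed(123)
srchscrl <- data.frame(
  Subject   = factor(rep(1:20, times=2)),
  Technique = factor(rep(c("Search","Scroll"), each=20)),
  Time      = c(rnorm(20, 96, 35), rnorm(20, 137, 23))
)

# Error(Subject/Technique) tells R that Technique is measured repeatedly
# within each Subject
m <- aov(Time ~ Technique + Error(Subject/Technique), data=srchscrl)
summary(m)

# residuals come out per error stratum; check each for normality
shapiro.test(residuals(m$Subject))
qqnorm(residuals(m$Subject)); qqline(residuals(m$Subject))

shapiro.test(residuals(m$"Subject:Technique"))
qqnorm(residuals(m$"Subject:Technique")); qqline(residuals(m$"Subject:Technique"))

# Levene's test for homoscedasticity (car package)
library(car)
leveneTest(Time ~ Technique, data=srchscrl, center=median)
```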
Now let's see whether there's an order effect, to make sure our counterbalancing worked. For that, we're going to need a wide-format table, and you'll remember that a wide-format table includes all the measures for a subject in a given row. We can do that with the reshape library. Our counterbalancing for two techniques was full counterbalancing, because with two techniques there are just two sequences, so that's easy to cover. I have a note here explaining a wide-format table, but you know what that is now, so let's make a new table called srchscrl.wide.order, so we know it's for the order effect. We use the dcast function to recast the table, using subject to group each row and giving us the order variable encoded in new columns. We execute that and view the table to verify that it looks right. We have subject, and then order 1 and order 2, which looks correct; each subject experienced both orders, so that makes sense. Then we can do a t-test comparing order 1 to order 2, and notice that for the first time we say paired=TRUE. That's what makes it a paired-samples t-test, unlike before. We run that t-test and see that the p-value is .19, which is good: it suggests we don't have an order effect, where order itself is causing a difference in performance. You know how to report a t-test, so you could report that if someone wanted to write it down. Then we can do the paired-samples t-test on technique itself, which is what we actually care about. We have to make a wide table for that as well, so we'll make one called srchscrl.wide.tech, for technique. We make our new table and view it just like before, and we can see the column names are now Subject, Scroll, and Search, which is the comparison we care about.
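A sketch of the wide-format recast and the order-effect t-test; dcast here comes from the reshape2 package, and since the data are simulated, the p-value will differ from the .19 in the narration.

```r
# simulated stand-in for the real table (names are assumptions)
set.seed(123)
srchscrl <- data.frame(
  Subject   = factor(rep(1:20, times=2)),
  Technique = factor(rep(c("Search","Scroll"), each=20)),
  Order     = factor(c(rep(1:2, each=10), rep(2:1, each=10))),
  Time      = c(rnorm(20, 96, 35), rnorm(20, 137, 23))
)

# recast to wide format: one row per Subject, one Time column per Order
library(reshape2)
srchscrl.wide.order <- dcast(srchscrl, Subject ~ Order, value.var="Time")
head(srchscrl.wide.order)  # columns: Subject, "1", "2"

# paired-samples t-test for an order effect; paired=TRUE makes it paired
t.test(srchscrl.wide.order$"1", srchscrl.wide.order$"2", paired=TRUE)
```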
Then, on the next line, we do a paired t-test between search and scroll in that new little wide table, and here we see a p-value of .0017, which is less than .05. We also have our t statistic and our degrees of freedom. So we have a significant difference, and we can visually confirm that for convenience by looking again at the box plot of scroll versus search. It seems that the time scrolling took, in seconds, was a fair bit longer than the time searching took to find contacts on a smartphone. Maybe we would expect that, because scrolling can be a bit more tedious as far as the human action goes. As we've done before, we can consider a non-parametric equivalent of this test. A non-parametric equivalent of a paired t-test is the Wilcoxon signed-rank test. We're not going to do that on time; we could, but we've gotten our result for time and the assumptions looked met, so we're comfortable with that. Let's do it on errors. Errors are a count response, and often errors don't satisfy the normality assumptions of ANOVAs. Errors can be distributed in all sorts of ways; they are often Poisson distributed, and they're clipped at zero. You don't usually get nice Gaussian curves when you're measuring errors. So let's explore the errors data, which is in a different column in the original table, and look at some histograms. This is the errors for search, just to get a feel for their spread, and the errors for scroll, which is clearly not normally distributed. We can do a box plot of errors as well. Scrolling was slower than search, but it also looks like scrolling produced fewer errors than search. That might make sense, especially depending on how we counted errors: if we count text-entry errors as part of our error concept, not just selection errors, then maybe search could have more errors.
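The technique t-test itself, again as a hedged sketch over simulated data (so the p-value will not be exactly the .0017 quoted above):

```r
# simulated stand-in for the real table (names are assumptions)
set.seed(123)
srchscrl <- data.frame(
  Subject   = factor(rep(1:20, times=2)),
  Technique = factor(rep(c("Search","Scroll"), each=20)),
  Time      = c(rnorm(20, 96, 35), rnorm(20, 137, 23))
)

# one row per Subject, one Time column per Technique
library(reshape2)
srchscrl.wide.tech <- dcast(srchscrl, Subject ~ Technique, value.var="Time")

# the paired-samples t-test on technique, the comparison we care about
t.test(srchscrl.wide.tech$Search, srchscrl.wide.tech$Scroll, paired=TRUE)

# visual confirmation with a box plot
plot(Time ~ Technique, data=srchscrl)
```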
Even if we don't, we might search and get a similar name but not quite the right one, or something like that. Scrolling might show us more context around the content, so that makes some sense. Here I note that we might again want to test for the ANOVA assumptions of normality and homoscedasticity, but we've covered that, and we know how to do it, so I'm going to omit those checks from now on until we have something new to learn from that process. Let's try, though, to fit a Poisson distribution, because this is count data, and with count data we often have that kind of distribution. Poisson distributions are discrete, not continuous, so the fitting will use a library that allows for that. We first fit errors to the Poisson distribution; that's what the "pois" string means. Then we run a goodness-of-fit test function, which gives us, in chi-squared form, a p-value saying we're not significantly departing from a Poisson distribution. We can do the same for the scroll data and see that, again, the chi-squared p-value shows no significant departure from a Poisson distribution. We may, in fact, later use that information to do an analysis specific to a Poisson distribution; we'll return to that. For now, we'll just do the Wilcoxon signed-rank test on errors, which is the non-parametric equivalent of the paired-samples t-test. We load the coin library and use its wilcoxsign_test function. One thing to note about it, and I make a note here for you, is that we are measuring errors by technique, but we include subject behind a vertical bar, which tells this particular function that subject is the blocking factor to correlate across rows in our table. We're not using a wide-format table here; notice we're using the original search-scroll table. For two levels, we can do an exact test to get an exact p-value, and we can see that it's much less than .05. We're given a Z statistic as the output.
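The Poisson fit and the signed-rank test on errors could be sketched as follows; fitdist and gofstat come from the fitdistrplus package and wilcoxsign_test from coin, with simulated counts standing in for the real errors data.

```r
# simulated stand-in for the real errors (names and rates are assumptions)
set.seed(123)
srchscrl <- data.frame(
  Subject   = factor(rep(1:20, times=2)),
  Technique = factor(rep(c("Search","Scroll"), each=20)),
  Errors    = c(rpois(20, lambda=3), rpois(20, lambda=1))  # Search, Scroll
)

# fit the Search errors to a Poisson distribution and test goodness of fit
library(fitdistrplus)
fit <- fitdist(srchscrl[srchscrl$Technique == "Search",]$Errors,
               "pois", discrete=TRUE)
gofstat(fit)  # chi-squared p-value: departure from Poisson?

# same for the Scroll errors
fit <- fitdist(srchscrl[srchscrl$Technique == "Scroll",]$Errors,
               "pois", discrete=TRUE)
gofstat(fit)

# Wilcoxon signed-rank test on Errors; Subject after the vertical bar is
# the blocking factor that correlates rows within subjects
library(coin)
wilcoxsign_test(Errors ~ Technique | Subject, data=srchscrl,
                distribution="exact")
```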
Given the box plot, that tells us scrolling was in fact significantly less error prone than searching. We can also examine effort using a non-parametric test, because effort is on a Likert scale; it's an ordinal response, and Likert-scale responses almost never satisfy the conditions for ANOVA either. They're bounded from one to seven, and they're ordinal, so the difference between 1 and 2, while numerically one, may not be conceptually or perceptually the same as the difference between 4 and 5, or 5 and 6, for example. So ordinal variables really should not be analyzed with parametric analyses. We'll talk later in the course about better ways to analyze ordinal outcomes, but for now we're perfectly happy to use a non-parametric Wilcoxon signed-rank test. Let's first briefly explore the data in the effort Likert response. You can see here that scrolling had a mean of about 4.4 and search about 3.6, where presumably higher on the scale is more effort and lower is less effort. We can also look at histograms over the 1 to 7 scale, which is on the x-axis here: that's for search, and here is for scroll. We can see that maybe some people thought scrolling was more effort. And in our usual box plot, the medians look pretty similar, but there's more difference of opinion around the amount of effort for scrolling. We run a Wilcoxon signed-rank test again, with the syntax we used before but now looking at effort, and we can see the p-value is 0.08. So it's not significant at the 0.05 level, meaning that, although there may be some differences here, we could not detect them. There are no detectable differences between the effort required for scrolling and searching. Okay, so now let's go to our analysis table and take a look at where this has brought us.
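The effort analysis, sketched the same way; the Likert ratings here are simulated (so the means and p-value will differ from the 4.4, 3.6, and 0.08 quoted above), and wilcoxsign_test again comes from coin.

```r
# simulated stand-in for the real Likert ratings (names are assumptions)
set.seed(123)
srchscrl <- data.frame(
  Subject   = factor(rep(1:20, times=2)),
  Technique = factor(rep(c("Search","Scroll"), each=20)),
  Effort    = c(sample(1:7, 20, replace=TRUE),   # Search ratings
                sample(1:7, 20, replace=TRUE))   # Scroll ratings
)

# descriptive statistics for the ordinal Effort response
library(plyr)
ddply(srchscrl, ~ Technique, summarise,
      Effort.mean=mean(Effort), Effort.sd=sd(Effort))

# histograms over the 1-7 scale, and the usual box plot
hist(srchscrl[srchscrl$Technique == "Search",]$Effort, breaks=0:7)
hist(srchscrl[srchscrl$Technique == "Scroll",]$Effort, breaks=0:7)
plot(Effort ~ Technique, data=srchscrl)

# Wilcoxon signed-rank test on Effort, same syntax as for Errors
library(coin)
wilcoxsign_test(Effort ~ Technique | Subject, data=srchscrl,
                distribution="exact")
```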
So we've just finished analyzing our searching and scrolling data for finding contacts in a smartphone contacts manager. For the first time we had a within-subjects variable, the technique used to find contacts: one factor, technique, with two levels, search and scroll. Because it was a within-subjects factor, we can see in red that the parametric test we did, our form of analysis of variance, was a paired-samples t-test. The non-parametric test was the Wilcoxon signed-rank test, which avoids the assumptions of the parametric approach but is somewhat less powerful at detecting differences. What happens if we have a third level of our within-subjects factor? Well, we can't use a paired-samples t-test or a Wilcoxon signed-rank test anymore. That brings us to the next row in the table: the one-way repeated measures ANOVA and the Friedman test. Let's go back to our R terminal and look at how we'd carry out those analyses.