Zero. Okay, so that's the nonparametric equivalent to the paired test. Let's talk about the nonparametric equivalent to the unpaired test. Here we're comparing two measurement techniques, again from this wonderful book by Rice, Mathematical Statistics and Data Analysis. They were comparing two measuring techniques, and the units are in degrees Celsius per gram. We have a group measured with method A and a group measured with method B, and we want to test whether the measurements are the same. We'll be a little more formal about the hypothesis in a minute, but let's talk about how we can do that. The technique we're going to use for testing whether the two methods are the same, not to be confused with method A and method B themselves, is to take the A and B labels and shuffle them across the measurements. To be nonparametric, we're going to shuffle them on the ranks. Later on we'll talk about shuffling them on the observed values themselves; that's the so-called permutation test. Okay, so what we're going to do is test whether or not the two treatments have the same location. What I mean is that the distributions are centered at the same place. We're going to assume that the measurements have independent and identically distributed errors that are not necessarily normal. There's a difference between the errors being normally distributed and the measurements being normally distributed. So that's one way to write out the assumptions. Another way is to view this as a test of a distributional shift: the distribution for method B is uniformly shifted relative to the distribution for method A.
That's called a stochastic shift between two arbitrary distributions. So you can specify the hypothesis tightly, that they're centered at the same location with IID errors, and the test has a certain power for that particular collection of hypotheses. Or you can use a very general one about a stochastic shift; you get the same test statistic, but it has a different kind of power for that set of hypotheses. What we're going to do is disregard the method A and method B labels, rank the observations, and then use the sum of the ranks within each treatment. This is called the Wilcoxon rank sum test. It's equivalent to the so-called Mann-Whitney test as well, so you might call it the Wilcoxon-Mann-Whitney test. In R, it's wilcox.test. I should say that there are some slight differences: the tests work out to be the same, but they characterize the test statistic in slightly different ways. It's still, I think, correct to attribute the test to Wilcoxon and to Mann and Whitney, Mann and Whitney being two researchers. So the procedure is to discard the treatment labels, method A and method B in this case. Rank the observations without concern over which treatment they came from. Calculate the sum of the ranks in the first treatment. Which one you call first is arbitrary; you could pick either treatment and you get the same test, but you have to pick one of the two. Then you either compare your statistic with its asymptotic normal distribution, or you calculate the exact distribution under the null hypothesis. So here I show the ranks for method A and the ranks for method B. In case two observations are tied, we give them the average rank and then move on. The sum of the ranks for method A was 180, and the sum of the ranks for method B was 51.
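The ranking step can be sketched in a few lines of Python. This is a minimal illustration, not the lecture's code or R's wilcox.test; the function names `midranks` and `rank_sums` and the toy numbers are hypothetical, made up only to show how tied values share an average rank.

```python
def midranks(values):
    """Return 1-based ranks, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the block while the next sorted value is tied with this one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        average_rank = (start + 1 + end + 1) / 2  # mean of ranks start+1 .. end+1
        for position in range(start, end + 1):
            ranks[order[position]] = average_rank
        start = end + 1
    return ranks


def rank_sums(group_a, group_b):
    """Pool both groups, rank without regard to label, then sum ranks per group."""
    ranks = midranks(list(group_a) + list(group_b))
    return sum(ranks[:len(group_a)]), sum(ranks[len(group_a):])


# Hypothetical toy data (not the data from the lecture), chosen to include ties.
toy_a = [3.1, 2.4, 3.1, 4.0]
toy_b = [1.9, 2.4, 2.0]
w_a, w_b = rank_sums(toy_a, toy_b)
n = len(toy_a) + len(toy_b)
print(w_a, w_b, n * (n + 1) / 2)  # the two rank sums add up to n(n+1)/2
```

Using average ranks for ties keeps the total of all ranks fixed at n(n+1)/2, which is a handy check on any hand calculation.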
By the way, the two sums have to add up to 231. Just because that's a fun result, let's show why this is the case. Gauss supposedly did this as a child. There's some suspicion that this story's apocryphal, but whatever; for our purposes, let's assume he did it when he was a kid. The story goes that his teacher asked him to add up the numbers between 1 and 100. He sat down at his desk and just came back with the answer. The teacher said, that's not possible, how did you do that? And then he went and really did it and got the same answer. At any rate, I think the story's probably apocryphal, but here's a really neat way to show it. We could write x as the sum of the integers from 1 to n: 1 plus 2 plus 3, and so on, up to n. Or we could write it as n, plus n minus 1, plus n minus 2, all the way down to 1. It's exactly the same thing. If you add the two together, you get 2x, and notice that 1 plus n is n plus 1, 2 plus n minus 1 is n plus 1, 3 plus n minus 2 is n plus 1, and so on. So 2x is the number n plus 1 added up n times, which is n times n plus 1, and so x has to be n times n plus 1 over 2. Here a total of 231 corresponds to 21 observations, since 21 times 22 over 2 is 231. Okay, so let W be the sum of the ranks for the first treatment. If a treatment has more observations in it, then under the null hypothesis it's going to have a higher sum just by virtue of having more numbers, so we need to know nA and nB, the number in each sample. It turns out that the expected value of the sum of the ranks from the first group under the null hypothesis works out to be nA times nA plus nB plus 1, divided by 2, with a standard error that also depends only on nA and nB. Then we can create a test statistic, which is W, the sum of the ranks in the first group, minus its expected value, divided by its standard error.
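Written out in symbols, the quantities behind that statistic look as follows. The expected value is the one stated in the lecture. The standard deviation is the usual no-ties formula for the rank sum, filled in here as an assumption about what the slide shows; with ties the true value is slightly smaller.

```latex
\begin{align*}
\sum_{i=1}^{n} i &= \frac{n(n+1)}{2}, \qquad n = n_A + n_B \\
E[W] &= \frac{n_A (n_A + n_B + 1)}{2} \\
\operatorname{SD}(W) &= \sqrt{\frac{n_A n_B (n_A + n_B + 1)}{12}} \\
Z &= \frac{W - E[W]}{\operatorname{SD}(W)}
\end{align*}
```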
Under the null hypothesis, that statistic turns out to be approximately standard normal, and of course you can also calculate the exact distribution as we described before. Okay, so let's go through our example. In this case, if we use method B, our sum of ranks was 51. The expected value and standard deviation of that statistic are 88 and about 13.8. Our test statistic works out to be negative 2.68, with a P value of 0.007 for the two-sided test. You can also use the function wilcox.test, and it'll perform the test for you, both the one and two sample versions; you have to read over the documentation for wilcox.test. If you give it one vector, it's going to do the signed rank test. If you give it two vectors, it's going to do the rank sum test. So, some final notes about nonparametric tests. They tend to be more robust to outliers than their parametric counterparts. They do not require normality assumptions. They often have exact small sample versions. And their big trick is to focus on the ranks rather than the raw data. There is some loss in power relative to their parametric counterparts, assuming the parametric assumptions are met, but the loss in power is often not so bad. And I just want to emphasize that nonparametric tests are not assumption free. They're often distribution free, but, for example, for the signed rank test you really have to assume that the distribution is symmetric. Either way, in all of the tests that we considered, you have to have a sampling model, that the data are IID, right? That's an assumption, a big assumption, the biggest assumption. So, just to emphasize, nonparametric tests are often distribution free, but not assumption free. And then I just wanted to remind people about permutation tests, because we've already talked about them a little bit with regard to Fisher's exact test.
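The arithmetic in the worked example can be checked with a short Python sketch. The group sizes are not stated in the lecture; 8 for method B and 13 for method A are inferred here from the rank total of 231 and the expected value of 88. The function name `rank_sum_z_test` is hypothetical, and the standard deviation uses the usual no-ties formula, which is an assumption about what the slide shows.

```python
from math import sqrt
from statistics import NormalDist


def rank_sum_z_test(w, n_first, n_second):
    """Normal approximation for the rank sum W of the group with n_first observations."""
    n_total = n_first + n_second
    expected = n_first * (n_total + 1) / 2
    # No-ties standard deviation (assumed); a tie correction would shrink it slightly.
    sd = sqrt(n_first * n_second * (n_total + 1) / 12)
    z = (w - expected) / sd
    p_two_sided = 2 * NormalDist().cdf(-abs(z))
    return expected, sd, z, p_two_sided


# W = 51 for method B; the sizes 8 and 13 are inferred, not quoted from the lecture.
expected, sd, z, p_value = rank_sum_z_test(w=51, n_first=8, n_second=13)
print(f"E[W] = {expected:.0f}, SD = {sd:.2f}, z = {z:.2f}, p = {p_value:.3f}")
```

With these inputs the sketch gives an expected value of 88, a z statistic of about -2.68, and a two-sided P value of about 0.007, matching the numbers quoted above.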
But here we can also talk about them in general. Permutation tests are similar to these rank sum tests, though they use the actual data rather than the ranks. Under the null hypothesis for the rank sum test, we had the collection of ranks, and our null distribution was obtained by permuting the treatment labels. We had nA treatment A labels and nB treatment B labels, and we just permute those with respect to the ranks. That retains nA A labels and nB B labels, but they're randomly allocated among the ranks. A permutation test is exactly the same thing. You're just doing it to the raw data rather than the ranks, and you have to come up with the statistic. I go through the procedure here. You could also permute the ranks and then create a rank statistic. I want to distinguish two ways to think about this. One is, imagine your treatment was actually randomized. Then you can think of the permutation test as redoing the randomization, and in that case the permutation test is called a randomization test, if you interpret it that way. But you can also perform the permutation test even if the treatment wasn't randomized, because you're thinking along the lines of, well, my null hypothesis is that my labels A and B are exchangeable between the groups. So either way it makes sense, but it changes your interpretation of the test a little bit. At any rate, Fisher's exact test, which works on collections of binary data, the rank sum test, which works on the ranks of the observations, and the permutation test all have the same basic principle: under the null hypothesis, however we're interpreting it, the nA and nB treatment labels are exchangeable.
And our null distribution is obtained by permuting those labels across the values. In Fisher's exact test and the permutation test, we're permuting them across the actual observed values. In the rank sum test, we're converting the data to ranks first and then permuting across the ranks. So just to reiterate, this is an easy way to produce a null distribution for a test of equal distributions. It has a similar flavor to the bootstrap, though maybe not exactly. It produces an exact test. It's less robust but more powerful than the rank sum test, because you're not throwing away the data. With rank sum tests, you throw away the actual units and go to ranks, so you gain robustness at the expense of power. Here you get a little more power under certain assumptions, but you lose some of that robustness. It's very popular in large scale, big data applications, like genomics, for example, and neuroimaging. This final picture is just what you would aspire to get from a permutation test. You would permute the method A and method B labels. You would, say, calculate the T statistic as if the permuted labels were the observed labels. You'd do that over and over again, and you'd get a null distribution of T statistics, right? It doesn't have to be a T statistic, but that's a reasonable statistic to use. This vertical line is where our actual T statistic occurred. Then the percentage of the simulated statistics that are at least as extreme as our observed statistic is our P value, which is exact if you go through every possible relabeling and a Monte Carlo approximation if you only sample some of them. So that's a permutation test. If you were to do it with the ranks, this would just be simulating the exact small sample distribution of the rank sum statistic. If you do it with the raw data, then it's a so-called permutation test, and so on.
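As a rough sketch of that procedure, the Python below enumerates every way of assigning the A labels to a small pooled sample and counts how often the relabeled statistic is at least as extreme as the observed one. It uses the difference in means rather than a T statistic to keep the example short. The function name `exact_permutation_test` and the toy numbers are hypothetical and are not the data from the lecture.

```python
from itertools import combinations
from statistics import mean


def exact_permutation_test(group_a, group_b):
    """Two-sided exact permutation p-value for the difference in group means."""
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    observed = mean(group_a) - mean(group_b)

    as_extreme = 0
    total = 0
    # Each choice of n_a positions is one way of reassigning the A labels.
    for a_positions in combinations(range(len(pooled)), n_a):
        chosen = set(a_positions)
        relabeled_a = [pooled[i] for i in chosen]
        relabeled_b = [pooled[i] for i in range(len(pooled)) if i not in chosen]
        statistic = mean(relabeled_a) - mean(relabeled_b)
        # Small tolerance so floating point noise does not drop exact ties.
        if abs(statistic) >= abs(observed) - 1e-12:
            as_extreme += 1
        total += 1
    return observed, as_extreme / total


# Hypothetical toy data with complete separation between the two groups.
observed, p_value = exact_permutation_test([5.0, 6.0, 7.0], [1.0, 2.0, 3.0])
print(observed, p_value)
```

Full enumeration is only practical for small samples. For larger ones, the usual approach is to draw random relabelings instead, which gives a Monte Carlo approximation to the same P value. Passing ranks instead of raw values to the same function would give the exact small sample distribution of a rank-based statistic.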