Okay. So the biggest problem is, of course, that the magnitude of the differences is discarded, so the test is potentially not as powerful as you'd hope. It would be a different situation if, say, half the differences were positive but all the positive ones were much larger in magnitude and all the negative ones were really small; that's different from the differences being spread roughly equally above and below zero, yet the sign test treats them the same. The other thing I would mention is that there's nothing specific about zero: you could test any median, theta equals theta naught, by calculating the number of times the difference is bigger than that specific value. That tests whether the median is bigger than any particular value. What's interesting about that, and we won't go deeply into this, is that you can do it for any value of theta. So you can find the values of theta for which you fail to reject and the values of theta for which you reject, and then, by a grid search or something like that, you can invert the test and get a confidence interval for the median. So this is an interesting, highly non-parametric way to get a confidence interval for the median of a set of data. Now, Wilcoxon thought about this problem of discarding the magnitudes of the differences, and he said: why don't we, instead of using just the signs, also use the ranks of the differences? Using the ranks saves some of the information about the magnitude of the differences. You're still testing whether the median is zero versus the three potential alternatives, and we can create a statistic which, appropriately normalized, approximately follows a normal distribution for large samples.
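As a sketch of that inversion idea, here is a small, self-contained Python version (the function names, the grid resolution, and the tolerance are my own choices, not from the lecture): it computes an exact two-sided sign-test p-value for any candidate median theta0, then scans a grid and keeps the values that are not rejected.

```python
import math

def sign_test_p_value(data, theta0):
    """Exact two-sided sign test of H0: median = theta0.

    Counts how many observations exceed theta0 (values equal to theta0
    are dropped) and sums the Binomial(n, 1/2) probabilities that are
    no larger than the observed one.
    """
    signs = [x - theta0 for x in data if x != theta0]
    n = len(signs)
    k = sum(1 for s in signs if s > 0)
    pmf = [math.comb(n, j) / 2**n for j in range(n + 1)]
    p_obs = pmf[k]
    return sum(p for p in pmf if p <= p_obs + 1e-12)

def sign_ci_for_median(data, alpha=0.05, grid_size=400):
    """Invert the sign test over a grid of candidate medians: the
    confidence interval is the set of theta0 we fail to reject."""
    lo, hi = min(data), max(data)
    grid = [lo + (hi - lo) * i / grid_size for i in range(grid_size + 1)]
    accepted = [t for t in grid if sign_test_p_value(data, t) > alpha]
    return min(accepted), max(accepted)
```

A grid search is crude but makes the logic transparent; the endpoints could also be read off from the order statistics directly.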
But if there are no ties, the exact small-sample distribution is known, so why use the normal distribution at all? One benefit of the normal distribution in this case is that, because the small-sample distribution is known, we can evaluate how accurate the normal approximation is. That's different from the standard situation, where you don't know the associated exact small-sample distribution and so can't tell how well the normal approximation is working. So at least that one aspect of normal approximations in non-parametric tests is a little different from normal approximations in general: we can get a much better sense of how well the approximation is working. So here's the signed rank procedure. We take all the paired differences, just like we did before; we take the absolute values of the differences, which is new; and then we rank these absolute values from least to greatest, throwing out the zeros. Note there's a difference between zeros and ties: if there are ties, in other words two differences that are identical but non-zero, then we assign the average rank to those. After we've done the ranking, we multiply each rank by the sign of its difference, plus one for a positive difference and minus one for a negative difference. Then we calculate W plus, the sum of the positive ranks, and that's the signed rank statistic. Now, think about what this is doing. Suppose we did have the case where all the large differences, large in absolute value, were positive and all the small ones were negative. Then all the small ranks would have a negative sign, and all the large ranks would have a positive sign.
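To make the procedure concrete, here's a minimal Python sketch of W plus (the function name is mine; the midrank handling follows the average-rank rule just described):

```python
def signed_rank_statistic(diffs):
    """W+: the sum of the ranks of |diffs| attached to positive differences.

    Zero differences are dropped; tied absolute values share the average
    of the ranks they span (midranks)."""
    d = [x for x in diffs if x != 0]          # throw out the zeros
    abs_sorted = sorted(abs(x) for x in d)
    # assign each distinct absolute value its average (mid) rank
    rank_of = {}
    i = 0
    while i < len(abs_sorted):
        j = i
        while j < len(abs_sorted) and abs_sorted[j] == abs_sorted[i]:
            j += 1
        rank_of[abs_sorted[i]] = (i + 1 + j) / 2  # ranks i+1 .. j, averaged
        i = j
    # sum the ranks belonging to positive differences
    return sum(rank_of[abs(x)] for x in d if x > 0)
```

For example, `signed_rank_statistic([1, -2, 3, -4, 5])` sums the ranks 1, 3, and 5 of the positive differences, giving 9.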
So the signed rank statistic would be very different from what you would expect by chance, where the signs are distributed equally among the ranks. The sign test by itself, in contrast, only sees that roughly equal numbers were above and below zero, so you could get a statistic that looked a lot like it came from the null. Just to reiterate this point: if the median is large, then W plus should be large; if the median is small, then W plus should be small. And this W plus does follow a normal distribution, approximately, for large samples. But especially for small sample sizes, we can work with an exact distribution under the null hypothesis; you can get critical values from a table, and in fact they're pretty easy to get. It's maybe a bit involved to work with the exact null distribution analytically, and we'll do some of that. But I wanted to at least talk about Monte Carlo, so you can see that we can pretty easily figure out what the exact distribution is, provided there are no ties. So here's what you could do. Since this procedure is invariant to the distribution being used, simulate n observations from any distribution whose median is zero, then rank the absolute values of the data, retain the signs, and calculate the signed rank statistic. Apply this procedure over and over again, and you'll have just used Monte Carlo to get the exact distribution. Because the statistic is invariant to the distribution, we could use any distribution with median zero, for example the standard normal, and that would give you the exact small-sample distribution. But we can actually go further than that.
Under the null hypothesis, the signs are equally likely to be distributed anywhere among the ranks. So all you have to do is take the ranks, the numbers between one and n, and randomly allocate the signs: flip a coin for each rank value. Take rank one, flip a coin with success probability 0.5; take rank two, flip a coin; and so on, and you've generated from this distribution exactly, rather than having to mess around with simulating from the normal distribution or something like that. I presented it the first way just to show you conceptually that, if this test really doesn't depend on the distribution, then you could pick any distribution and just simulate under the null. But I'm contending that it's even easier than that. All you have to do is take the ranks, i.e., the integers from one to n, and flip a coin for each rank, giving a plus one or a minus one for a head or a tail, and then you will have exactly simulated from the null distribution of the signed rank statistic. So it works out very conveniently. In fact I say this explicitly on this slide, where there's a little bit more elegant way to put it: take the ranks one to n, randomly assign the signs as binary with probability 0.5 of being positive and 0.5 of being negative, and then calculate the signed rank statistic. At any rate, if you wanted a Monte Carlo p-value, you would just apply this procedure, simulate over and over again, and calculate the percentage of times the simulated statistic was as or more extreme than your observed test statistic.
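Here's a hedged Python sketch of that coin-flip Monte Carlo (the function names, default simulation count, and fixed seed are my choices): each rank from 1 to n independently keeps a plus sign with probability 0.5, and W plus is the sum of the ranks that came up positive.

```python
import random

def simulate_null_w_plus(n, n_sims=100_000, seed=0):
    """Monte Carlo draws of W+ under H0 (no ties): flip a fair coin
    for each rank 1..n and sum the ranks that land positive."""
    rng = random.Random(seed)
    draws = []
    for _ in range(n_sims):
        draws.append(sum(r for r in range(1, n + 1) if rng.random() < 0.5))
    return draws

def monte_carlo_p_value(w_obs, n, n_sims=100_000, seed=0):
    """Upper-tail Monte Carlo p-value: the fraction of simulated W+
    values at least as large as the observed one."""
    draws = simulate_null_w_plus(n, n_sims, seed)
    return sum(1 for w in draws if w >= w_obs) / len(draws)
```

With no ties this simulates the exact null distribution, which is why no data-generating distribution needs to be chosen at all.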
That proportion, the fraction of instances where your simulated test statistic was more extreme than your observed test statistic, would be a Monte Carlo approximation to your p-value. Then let me just go through the large-sample distribution. I'm not going to derive it, though it's actually pretty easy to derive: under the null the statistic is a sum of independent coin flips times the integers one to n, so it's pretty easy to work with. Maybe as a homework assignment, derive the results I've put right here; you need some finite-sum results for the sum of the integers from one to n and the sum of their squares, but other than that it's a pretty easy problem. Let's not spend time on that, because I don't see the true benefit of using the large-sample approximation when you can use the small-sample one anyway, unless there are ties or something like that. At any rate, under the null hypothesis the expected value of our statistic is n(n+1)/4, and the variance is this guy right here, n(n+1)(2n+1)/24. So our test statistic is just W plus minus its expected value, divided by its standard error, and that limits to a standard normal. You can do the correction for ties if you want, but under the null hypothesis we can still do an exact small-sample test, so why not just do that. Okay, so let's go through our example. We take our differences, and we can do wilcox.test(diff, exact = FALSE); then I go through the calculations right here. It should agree with the results of wilcox.test, I think. Now that I'm looking at this, I'm not sure; you may have to put correct = FALSE or something, to turn off the continuity correction. At any rate, double-check that they're the same. They should be either exactly the same, or very close.
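For reference, that homework computation is one line once you write W plus as ranks times independent fair coins, using the standard finite-sum identities (a sketch, not from the slides):

```latex
% Under H_0 (no ties): W^+ = \sum_{r=1}^{n} r\, I_r,
% with I_r \overset{iid}{\sim} \mathrm{Bernoulli}(1/2).
E[W^+] = \sum_{r=1}^{n} r \cdot \tfrac{1}{2} = \frac{n(n+1)}{4},
\qquad
\mathrm{Var}(W^+) = \sum_{r=1}^{n} r^2 \cdot \tfrac{1}{4}
                 = \frac{1}{4}\cdot\frac{n(n+1)(2n+1)}{6}
                 = \frac{n(n+1)(2n+1)}{24}.
```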
The correction is the distinction, and we don't really cover little technical details like that in this class. At any rate, you get an expected value of W plus of 150, and the variance is 1,225, so the standard error is 35. Our test statistic is our observed W plus, 194.5, minus its expected value of 150, divided by the standard error, which works out to be 1.27. Of course that's not significant on the z scale, and the p-value works out to be 0.21. So what is this suggesting? It's suggesting that we can't rule out that the median is in fact zero.
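Here's a small Python check of those numbers (the sample size n = 24 is inferred from E[W+] = n(n+1)/4 = 150; the erf-based normal CDF is just a stdlib stand-in for a z-table):

```python
import math

def signed_rank_z_test(w_plus, n):
    """Large-sample signed rank test, no tie or continuity correction.
    Under H0: E[W+] = n(n+1)/4 and Var[W+] = n(n+1)(2n+1)/24."""
    mean = n * (n + 1) / 4
    sd = math.sqrt(n * (n + 1) * (2 * n + 1) / 24)
    z = (w_plus - mean) / sd
    phi = 0.5 * (1 + math.erf(abs(z) / math.sqrt(2)))  # standard normal CDF
    return z, 2 * (1 - phi)                            # two-sided p-value

z, p = signed_rank_z_test(194.5, 24)   # z = (194.5 - 150) / 35, about 1.27
```

Without any correction this gives p of about 0.20; the 0.21 quoted in the lecture is consistent with the default continuity correction in R's wilcox.test, which shrinks the distance from the mean by 0.5 before dividing.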