The p-value is the most widely used statistic in the world, for inference and for just about everything else. It's so popular that if it were cited every time it was used, it would have at least three million citations, which would make it the most highly cited paper ever written. So the p-value is a very important statistic, and because it's so important and so popular, there are a lot of people who hate it. Part of the reason people hate it is that it is consistently misinterpreted.

The p-value is defined as the probability of observing a statistic as or more extreme than the one you calculated, if the null hypothesis is true. A couple of things the p-value is not, and saying otherwise will make statisticians see red: it is not the probability that the null hypothesis is true, and it is not the probability that the alternative is true. In some sense it's also not necessarily a measure of statistical evidence; that's a philosophical question people worry about, but in any case you need to interpret it very narrowly, as the probability of observing a statistic as or more extreme than the one you observed in the data if the null hypothesis is true.

Here we're going to use the responders and non-responders example again to illustrate what's going on. For gene one, we calculate a statistic that compares the responders to the non-responders. For example, we might calculate a t-statistic: take the average expression level among the responders, subtract the average expression level among the non-responders, and standardize that difference by some measure of the variability, in this case the average variability in each of the two groups.

In a previous lecture we learned that one way to quantify the null hypothesis, the hypothesis that the distributions are exactly the same among the responders and the non-responders, is to permute the sample labels. When you permute the sample labels, you leave the relationships among the genes unchanged, but you break the relationship between each gene and the responder/non-responder label. If I recompute the statistic after each permutation, I get a distribution of statistics under the permutations, alongside the original statistic that I calculated. The p-value is then the number of permutation statistics as or more extreme than the statistic I originally calculated, divided by the total number of permutations. I do the comparison in absolute value because the null hypothesis is that the difference is equal to zero, that there is no difference between the two groups, while the alternative allows the difference to be either positive or negative, so I have to look in both directions. So I count up the permutation statistics that are more extreme in either direction, divide by the total number of permutations, and that gives me the p-value.

The p-value is sometimes used as a measure of evidence, but in general it's used as a hypothesis testing tool: if the p-value is small, you reject the null hypothesis, because the statistic is very extreme compared to the distribution you would have gotten under the null.
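To make that procedure concrete, here is a minimal sketch in Python for a single gene. Everything here is illustrative: the function names, the 1/0 coding of responders, and the particular standardization of the t-statistic are my own assumptions, not details from the lecture.

```python
import numpy as np

def t_stat(x, labels):
    # Difference in group means, standardized by a measure of the
    # variability in the two groups (a Welch-style standard error).
    a, b = x[labels == 1], x[labels == 0]
    se = np.sqrt(a.var(ddof=1) / len(a) + b.var(ddof=1) / len(b))
    return (a.mean() - b.mean()) / se

def permutation_pvalue(x, labels, n_perm=1000, seed=None):
    # Two-sided permutation p-value: the fraction of permuted
    # statistics as or more extreme, in absolute value, than the
    # statistic computed with the real labels.
    rng = np.random.default_rng(seed)
    observed = t_stat(x, labels)
    perm = np.array([t_stat(x, rng.permutation(labels))
                     for _ in range(n_perm)])
    return np.mean(np.abs(perm) >= np.abs(observed))

# One gene's expression values; 1 = responder, 0 = non-responder.
rng = np.random.default_rng(0)
expression = rng.normal(size=20)
labels = np.array([1] * 10 + [0] * 10)
print(permutation_pvalue(expression, labels, seed=1))
```

One practical note: many implementations add one to both the count and the total number of permutations, so the estimated p-value can never be exactly zero, which matters later when you do multiple testing.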
This is what p-value distributions look like for genomic experiments that are done well. Typically you see a histogram with a spike near zero and then a flat distribution as you move out towards one. If you break that down into its different parts, the p-values near zero, the really small ones, are mostly the p-values coming from the alternative distribution. Remember, the p-value measures the probability of observing a statistic more extreme under the permutations than the statistic you actually observed. So if you observe a very, very extreme statistic, very few permuted statistics will be larger than it, and you'll get a small p-value. Those are the p-values you expect from the cases that are not from the null distribution. Under the null, you get the flat distribution that extends out to the right-hand side.

It turns out that a particular property of the p-value is that it is uniformly distributed: it is equally likely to be any value between zero and one if the null hypothesis is true. What does that mean in practice? It means that even if you get a small p-value, it might still be from the null distribution, because under the null every value between zero and one is equally likely. This is actually a useful set of properties for estimating things like the false discovery rate, which we'll talk about when we cover multiple testing. The basic idea is that the observed distribution is a mixture of two distributions: the p-values that come from the null hypotheses, which should be uniformly distributed, and the p-values that come from the alternative hypotheses, which should be pushed up towards zero and skewed away from one.

Another common misinterpretation of the p-value: p-values almost always go to zero as the sample size grows. Just because you got a really small p-value, it doesn't mean the difference is huge. It could just be that your sample size is really large, so the variability is small; if there is any difference at all, the p-value will shrink as the sample size gets big.

The usual cutoff people use for calling p-values significant is 0.05, at least if you're doing only a single hypothesis test, but that number is basically just made up, and any other threshold could also be used. It's useful to have a standard, but don't treat it as religious truth that 0.05 is the right way to tell whether your p-value is significant. And you should always report p-values in conjunction with estimates and variances on a scale that's scientifically meaningful. P-values can be useful as a complement to that, as a way to quantify statistical significance, as long as you pay attention to the properties of p-values and interpret them correctly.
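To see the spike-plus-flat shape and the uniformity under the null for yourself, here is a small simulation sketch. The sample sizes, the effect size, and the 100-alternative/900-null split are made up for illustration, and it uses scipy's two-sample t-test rather than permutations just to keep it short.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
n_alt, n_null, n_per_group = 100, 900, 10

pvals = []
for i in range(n_alt + n_null):
    responders = rng.normal(size=n_per_group)
    non_responders = rng.normal(size=n_per_group)
    if i < n_alt:
        responders += 1.5  # a true difference: these genes are alternatives
    pvals.append(stats.ttest_ind(responders, non_responders).pvalue)

# Text histogram: expect a spike in the first bin (mostly alternatives)
# sitting on a roughly flat background of about 90 null genes per bin.
counts, edges = np.histogram(pvals, bins=10, range=(0, 1))
for lo, hi, c in zip(edges[:-1], edges[1:], counts):
    print(f"[{lo:.1f}, {hi:.1f}): {'#' * (c // 5):<40} {c}")
```

Setting the effect to zero for every gene makes all ten bins come out roughly equal, which is the uniform-under-the-null property in action.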