Consider a cohort of patients with advanced lung cancer, for whom we wish to test the efficacy of a new chemotherapy procedure. So we design an experiment where we split the cohort in half: half of the patients are treated with the standard chemotherapy procedure, the other half with the new procedure, and we measure their survival time in days. We'll collect data that looks something like this; this data set has been used in other contexts, and you can find it on the web if you look around. So you'll have survival days, in blue, for the patients who received the test procedure, and survival days, in green, for the patients who received the standard procedure. You can take the mean of each set: the mean of the blue values, the test procedure, is 128.3 days, and the mean of the green values, the standard procedure, is 115.9 days. Subtracting the green from the blue gives a difference of 12.4 days. And you think, well, maybe the test procedure is indeed successful, and these patients are living longer. But a skeptic might say, look, that might happen just by chance; how can you know it was really due to the test procedure? So you can ask: is this number significant? How likely is it that we would have obtained this result purely by chance? What I'm going to do is show you the classical approach to making this kind of statistical inference and declaring significance, and then I want to show you a different approach that I think is more useful day-to-day for a data scientist. The classical method is to derive what we'll call the sampling distribution. In order to do that, we need to make some assumptions about the population, in particular that the number of survival days follows a normal distribution. That's the Gaussian distribution, the bell curve that we're likely all familiar with.
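As a small sketch of this first step, here is how you might compute the two group means and their difference. The survival times below are hypothetical placeholders, not the actual data set, which isn't reproduced here; only the structure of the calculation matches the example above.

```python
from statistics import mean

# Hypothetical survival times in days (illustrative values only,
# not the real lung-cancer data set referenced in the lecture).
test = [97, 112, 128, 133, 145, 155]      # new (test) procedure
standard = [89, 104, 115, 119, 127, 140]  # standard procedure

# Mean survival in each group, and the observed difference.
diff = mean(test) - mean(standard)
print(mean(test), mean(standard), diff)
```

The question the rest of the lecture addresses is whether a difference like `diff` is large enough to be called significant, or could plausibly arise by chance.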
We also assume that the variances of the two sets of patients are the same, so you're not going to get a wider spread of survival days under the new treatment versus the standard treatment. Or, at least, that the two sample sizes, the cohorts, are the same, which is true in this case: if the sample sizes are equal, you can show that the statistics we'll use are pretty robust with respect to different variances. Then we construct a t-statistic that looks like this: you take some derived statistic and some hypothesized value for that statistic, compute their difference, and divide that by the estimated statistic's standard error. In our case the statistic we're interested in is the difference in the means, and the hypothesized value under the null hypothesis, which is the skeptic's point of view that there is no real difference and that what we observed happened purely by chance, is zero. The intuition for the denominator is that you're asking how many standard errors away from the hypothesized value this result is. The reason we need to talk in units of standard error is that it allows you to compare very different populations and very different experiments in a common way; it normalizes the values, if you will. So fine, we have this null hypothesis that T-bar - S-bar = 0. And we know that T-bar - S-bar is a random variable derived from the sample, and that therefore it has a sampling distribution; any statistic you derive from a sample, you can reason about its sampling distribution. We're going to compute the t-statistic with the formula we just talked about, and so we need to figure out what the denominator is all about. We need to see what the estimated standard error of T-bar - S-bar is.
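The t-statistic described above can be sketched in code. This is one standard way to estimate the denominator, a pooled-variance standard error, which matches the equal-variance assumption stated earlier; the helper name `t_statistic` and the sample values are my own for illustration.

```python
import math
from statistics import mean, variance

def t_statistic(a, b, hypothesized_diff=0.0):
    """Two-sample t-statistic: (observed difference - hypothesized
    difference) / estimated standard error of the difference."""
    na, nb = len(a), len(b)
    # Pooled sample variance (assumes equal population variances;
    # with equal sample sizes the result is robust to that assumption).
    sp2 = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    se = math.sqrt(sp2 * (1 / na + 1 / nb))
    return (mean(a) - mean(b) - hypothesized_diff) / se

# Under the null hypothesis, hypothesized_diff is 0:
t = t_statistic([2.0, 4.0, 6.0], [1.0, 3.0, 5.0])
```

The return value tells you how many estimated standard errors the observed difference in means sits away from the null value of zero.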
Well, the term standard error is just another name for the standard deviation of the sampling distribution of T-bar - S-bar. I think one of the problems in statistics, at least for me, is that the English doesn't always map very well onto the concepts. You end up with a lot of ostensibly synonymous terms being used to describe subtly different things, and that makes it more confusing, to me at least, than it needs to be. This is perhaps one of those examples: standard error is just another name for standard deviation, and you use the terms standard error and sampling distribution when you're talking about a statistic derived from some sample. Okay, so that means we need to figure out: what is the sampling distribution of T-bar - S-bar?
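One way to make this definition concrete is to simulate the sampling distribution directly: repeat the experiment many times under the null hypothesis and look at the spread of the resulting differences in means. The population parameters and sample size below are assumptions chosen purely for illustration.

```python
import random
import statistics

random.seed(0)
n = 50                     # assumed patients per group (illustrative)
mu, sigma = 120.0, 30.0    # assumed population mean/spread (illustrative)

# Simulate many repeated experiments under the null hypothesis:
# both groups are drawn from the same normal population, so any
# difference in means is pure chance.
diffs = []
for _ in range(5000):
    t_group = [random.gauss(mu, sigma) for _ in range(n)]
    s_group = [random.gauss(mu, sigma) for _ in range(n)]
    diffs.append(statistics.mean(t_group) - statistics.mean(s_group))

# The standard error is the standard deviation of this sampling
# distribution of T-bar - S-bar.
print(statistics.stdev(diffs))
```

For normal populations with equal variance, theory says this standard deviation should come out near sigma * sqrt(2 / n), which is 6.0 for the assumed values here; the simulated value should land close to that.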