[MUSIC] In the previous section, we derived our first sampling distribution of the sample mean x bar. Now keep in mind that the sampling distribution is simply a probability distribution of some descriptive statistic, in our case the sample mean x bar. So this relates back to our work in week 2, whereby we derived some simple probability distributions, i.e., we had the sample space of some random variable x, and to each value we attached its probability of occurrence. From that point we were then in a position to work out the expectation of that random variable. And remember, the technique for doing so was to take a probability-weighted average, i.e., take each value of x, multiply it by its probability of occurrence, and then sum these across all values of the variable. Well, for our sampling distribution of x bar, we don't deviate from this. We have a random variable, the sample mean, taking different values depending on which random sample is observed, and we have the probabilities of occurrence for each. So if we take, for example, 3.5, the smallest observed value of x bar, multiply it by its probability of occurrence of 1 over 15, and then proceed to do this across the entire distribution, we will find that the expectation of x bar is equal to 6. Now you may recall that 6, in terms of thousands of pounds, represented the population mean. This is a fascinating and very useful result, which says that the expected value of the sample mean is equal to the true population mean, i.e., the true value of the parameter of interest. And this is always going to be the case when we take a simple random sample from some wider population. Now remember the correct interpretation of an expected value: we thought of it as a long-run average. So in this world of sampling distributions, we should think of it as follows. 
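The probability-weighted average described above can be sketched numerically. This uses a hypothetical six-member population whose mean happens to be 6; these are illustrative values only, not the actual figures from the previous section's example:

```python
from itertools import combinations

# Hypothetical population of six members (illustrative values, NOT the
# actual data from the lecture's example); its mean is 6.
population = [3, 4, 5, 7, 8, 9]
mu = sum(population) / len(population)  # true population mean: 6.0

# All simple random samples of size 2: C(6, 2) = 15 equally likely samples.
samples = list(combinations(population, 2))
sample_means = [sum(s) / 2 for s in samples]

# Probability-weighted average: each sample has probability 1/15.
expectation_of_xbar = sum(m * (1 / len(samples)) for m in sample_means)

print(len(samples))         # 15
print(expectation_of_xbar)  # 6.0, equal to the population mean mu
```

Whatever six values you choose, enumerating every equally likely sample this way reproduces the result that the expectation of x bar equals mu.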
Suppose we took repeated samples of the same size from the same population. Then inevitably, from one sample to another, we would tend to get different members forming that random sample, and hence, when we calculate a descriptive statistic such as the sample mean we are considering here, we clearly see that different samples lead to different values of that sample mean. So there is variation in the sample means which could be observed. But what this result, that the expectation of x bar is equal to mu, means is that on average our sample mean is equal to the true population mean, i.e., the very parameter we're trying to estimate. Of course, we should stress the 'on average'. This doesn't mean that for any specific observed sample our sample mean is equal to the true value. In the example from the previous section, that would have occurred in only one special case: if we observed individuals A and D. So clearly, 'on average' is different from saying we get this result every single time. So we know that any point estimate we get, any observed sample mean, may or may not be equal to the truth. On average we are right, but in any specific sampling situation there is a risk of sampling error, whereby, even though we may have a simple random sample free of selection bias, there is a risk that, by chance, our sample doesn't fully represent the population, and hence the characteristic of the sample, here x bar, the sample mean, deviates from the true population mean. But nonetheless, the fact that the expectation of x bar is equal to mu is a very powerful result for us. So now let's consider the concept of sampling distributions more generally because, of course, in that previous example we had a very simplified case where the population consisted of only six members. So you may recall we've introduced the normal distribution. 
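This repeated-sampling idea can be simulated directly. Again using made-up population values with a true mean of 6 (not the lecture's actual data), individual sample means vary from draw to draw, but their long-run average settles at mu:

```python
import random

random.seed(42)  # fixed seed so the illustration is reproducible

# Hypothetical population with true mean 6 (illustrative values only).
population = [3, 4, 5, 7, 8, 9]
mu = sum(population) / len(population)

# Draw many simple random samples of size 2 and record each sample mean.
means = []
for _ in range(100_000):
    sample = random.sample(population, 2)  # sampling without replacement
    means.append(sum(sample) / 2)

# Individual sample means vary (sampling error) ...
print(min(means), max(means))   # spread from 3.5 up to 8.5 here
# ... but their long-run average sits at the true mean.
print(sum(means) / len(means))  # close to 6
```

Any single draw may miss mu badly; it is only the average across repeated samples that is guaranteed to be right.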
And at the time I said it was arguably the most important distribution in statistics for a variety of reasons, one of which was that the normal distribution can represent many naturally occurring phenomena. So let's imagine we are interested, let's say, in the heights of human beings. And I think a normal distribution would adequately capture, and hence adequately model, the true distribution of heights which we observe in the real world. Now of course humanity has a very large population size indeed. We don't even know exactly how many people there are alive on the face of the earth. We have an approximation of around 7 billion, but it's extraordinary to think that we don't actually know precisely how many people there are. But nonetheless, we have a very large population. Imagine we would like to know the average height of all human beings. Clearly, it would not be feasible to measure the heights of every human being on earth. I don't have the time, patience, or money to undertake that exercise. So inevitably, we would have to take, let's say, a random sample drawn from our population and use the characteristics of the sample to estimate the corresponding characteristics in the population. But we know different random samples will lead to different constituent members of those samples, and hence the value of, let's say, our sample mean would vary from one sample to another. Well, the good news is, when we are sampling from a normal distribution, one can actually show (which means we won't actually derive it here, so just take my word for it) that the true, theoretical probability distribution of x bar also follows a normal distribution. Now we know a normal distribution has two parameters: its mean and its variance. So let's distinguish X, say the height of human beings, which we will model as following a normal distribution with an unknown mean mu and an unknown variance sigma squared. 
But from this, let's suppose we take a random sample of size n. So what would be the theoretical distribution of the sample mean x bar? Well, it can be shown that x bar would also follow a normal distribution whose mean is mu, i.e., the same as the mean of that wider population, and whose variance is sigma squared divided by n. So here, the variance of this sampling distribution is sensitive to, i.e., it depends on, the value of our sample size. We see that the sample size is in the denominator of the variance of x bar, which means as the sample size gets bigger, as we get an increasingly larger sample, the variance of x bar becomes smaller, as we might expect. Because if you have a larger sample size, that means you have more observations, more information, and hence one would expect that this will lead to more precise, more accurate estimates of the population mean. So let's perhaps visualize some sampling distributions of x bar, whereby we simply vary the value of the sample size n. So let's consider an example where the true population has a normal distribution with a mean mu equal to 5 and a variance sigma squared equal to 1. As we vary the sample size n, what would the sampling distribution of x bar look like? Well, on screen you'll see a few examples where we vary the value of the sample size n, and note that as the sample size gets bigger, the variance of the sampling distributions becomes smaller. But note the mean of the distribution of x bar is simply mu, i.e., the true population mean, which in this instance, let's say, is equal to 5. And that mean is not sensitive to the sample size. So all of these sampling distributions are centered on 5, the true mean mu. So remember that idea of central tendency, that measure of location: on average, what value do we get for this variable? This goes back to the earlier result that the expectation of x bar is equal to mu. 
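We can check by simulation that x bar behaves like a normal random variable with mean mu and variance sigma squared over n. Here is a sketch using the on-screen example of mu = 5 and sigma squared = 1; the 20,000 replications per sample size are an arbitrary choice:

```python
import random
import statistics

random.seed(0)  # fixed seed so the illustration is reproducible

mu, sigma = 5.0, 1.0  # the on-screen example: population is N(mu = 5, sigma^2 = 1)

def sample_mean(n):
    """Mean of one simple random sample of size n from N(mu, sigma^2)."""
    return statistics.fmean(random.gauss(mu, sigma) for _ in range(n))

# For each sample size, draw many sample means and inspect their spread.
results = {}
for n in (1, 4, 25, 100):
    means = [sample_mean(n) for _ in range(20_000)]
    results[n] = (statistics.fmean(means), statistics.variance(means))
    # centre stays at mu = 5; variance is close to sigma^2 / n
    print(n, round(results[n][0], 2), round(results[n][1], 3))
```

The printed centres all sit at 5 regardless of n, while the variances shrink roughly in proportion to 1/n, matching the on-screen picture of narrower and narrower curves all centered on the true mean.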
Meaning, on average, in the long run, the value of our sample mean is equal to the true mean of that population. But remember, only in the long run. We saw that the sampling distributions have variances. True, those variances become smaller as n becomes larger, but they are variances nonetheless. So going forward, I'd like you to envisage any sample mean that you may calculate based on your observed set of data. I'd like you to think of this as a random drawing from the respective sampling distribution. So if you have quite a small sample size, you can see the much greater variation in the distribution of x bar, and hence you are just as likely to have a sample mean far below the true mean as one far above it. But as the sample size gets larger, you can see the variance of these distributions becoming smaller, and hence you are more likely to get a value for x bar very close to the truth. So if one thinks in terms of trade-offs, this is perhaps what we might have hoped for. Namely, the advantage of having a larger sample size is that it leads to greater precision in your estimation of the population mean, because there's less variation that you're likely to observe in your observed sample mean. However, in terms of trade-offs, the larger sample size is going to cost more time, money and effort to obtain. So one has to consider: does this gain in precision, this gain in accuracy, justify the additional expense, the time, money and effort to collect that larger sample in the first place? Well, there's no right or wrong answer to this question. Of course, it's somewhat subjective how much weight you place on the relative advantages and disadvantages, but nonetheless I'd like you to be clear about the trade-off that we face. So in summary, if we don't know the value of a population mean, we can estimate it with a sample mean. If the only information we have is the observed set of data, then that's the best we can come up with. 
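The precision gain from a larger n can be quantified exactly. Since x bar is normal with mean mu and variance sigma squared over n, the probability that x bar lands within some distance d of mu is 2 * Phi(d * sqrt(n) / sigma) - 1, where Phi is the standard normal CDF. A short sketch, with sigma = 1 and d = 0.2 chosen purely for illustration:

```python
import math

def phi(z):
    """Standard normal CDF, computed via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def prob_within(d, n, sigma=1.0):
    """P(|xbar - mu| < d) when xbar ~ N(mu, sigma^2 / n)."""
    return 2.0 * phi(d * math.sqrt(n) / sigma) - 1.0

# Chance the sample mean lands within 0.2 of the true mean, for sigma = 1:
for n in (10, 50, 100, 500):
    print(n, round(prob_within(0.2, n), 3))
```

Running this shows the probability climbing steadily toward 1 as n grows, which is exactly the precision you would be paying for with a larger, more expensive sample.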
We know that if we take simple random samples and calculate the sample mean based on what we observe, then on average our point estimate is going to be right. But of course we also know there's a risk of sampling error. Now, the magnitude of that sampling error will depend on the sample size n, and as the sample size gets bigger, the likely magnitude of that sampling error is typically going to become smaller. Nonetheless, there is this inherent uncertainty in our estimation, and we need to go on and look at quantifying that uncertainty, which we'll be able to do using something called confidence intervals. [MUSIC]