Hi and welcome back. This module is concerned with the central limit theorem, which is probably the most important theorem in probability and statistics, so we need to get a good foundation in this theorem before we move forward. In this video, we'll discuss what a random sample is, we'll talk a little bit about the law of large numbers, and then we'll end with the central limit theorem. In the next video, we'll start doing a bunch of examples. Let's remember: for any random variable X, to compute a probability or to find the mean or the variance, we need either the probability mass function, p of k equaling the probability that X equals k for all possible k values, or the density function f of x, and we've looked at these formulas in previous modules. Now here's the question: what if we don't know how a random variable is distributed? This is the central question in statistics. You don't necessarily know the distribution of a random variable, so you collect data and try to make some inference about that random variable. In particular, you usually want to know something about the mean or the variance. This is called statistical inference. In future courses, you will focus on making statistical inferences about the true mean and the true variance of a population by using data, but before we can get to that, we have to finish laying the groundwork for the central limit theorem that tells us why it all works. Let's begin with a definition. Suppose X_1 through X_n are random variables. We call them a random sample of size n if two conditions are met: first, X_1, X_2, through X_n are independent, and we already saw in the previous module what it means for two random variables to be independent, so now we're just saying the same thing for any pair of these X's; and second, each random variable has the same distribution.
Then we say these are iid: independent and identically distributed. Now we're going to use estimators to summarize our iid sample. The mean and the variance are two quantities that will help us summarize our data. They're not the only estimators, as you'll learn, but they're the ones we're going to focus on most in this particular module. For example, suppose we want to understand the distribution of adult female heights in a certain area. We plan to select n women at random and measure their heights, and we'll denote the height of the ith woman by X_i. Then X_1 through X_n are iid: they're independent because the women are chosen independently, and identically distributed because each X_i is drawn from the same true height distribution. An estimator of Mu, the true mean, is denoted by X bar, and X bar is 1 over n times the sum from k equals 1 to n of X sub k. We're just taking these n random variables, averaging them together, and calling that X bar. Now, what do we know about the expected value of X bar? Recall from last time that if we have two random variables X and Y and we look at the linear combination aX plus bY, its expected value is a times the expected value of X plus b times the expected value of Y. Now we're going to apply that to the sum from k equals 1 to n of X sub k, multiplied by 1 over n. The constant 1 over n factors out, and then we can rewrite this as 1 over n times the sum from k equals 1 to n of the expectation of X sub k. The expectation is a linear function; that's what the two-variable formula says, and now we're just applying it to n variables. This is 1 over n times the sum from k equals 1 to n of Mu, because the expected values of the X_k's are all Mu. We have n of these, so we get 1 over n times n times Mu, which is just Mu.
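The linearity argument above can be checked numerically. Here is a quick simulation sketch (my own illustration, not part of the lecture): the population, sample size, and true mean mu = 2 are arbitrary choices, using an exponential distribution just to show the result doesn't depend on the shape of the distribution.

```python
import random
import statistics

# E[X_bar] = mu, checked by simulation.
# Hypothetical setup: draw many samples of size n from Exponential(rate 1/mu),
# compute X_bar for each sample, and average those X_bars.
random.seed(0)

mu = 2.0            # true mean of the population (arbitrary choice)
n = 25              # sample size (arbitrary choice)
num_samples = 20000

xbars = []
for _ in range(num_samples):
    sample = [random.expovariate(1.0 / mu) for _ in range(n)]
    xbars.append(statistics.mean(sample))

# The average of the simulated X_bars should land close to mu.
print(round(statistics.mean(xbars), 2))
```

Any other distribution with mean mu would work the same way, since the derivation only used linearity of expectation.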
So the expected value of X bar is Mu, the same as the expected value of each of the X_i's. Now, what does the law of large numbers tell us? It's a fairly technical theorem, so I'm not going to write it down here. However, it says that under most conditions, if X_1 through X_n is a random sample, and the variables in a random sample are identically distributed, so the expected value of each one is Mu, then X bar, the average, converges to Mu in the limit as n goes to infinity. So it's not just that the expectation converges; we already know that. On the previous slide, we saw that the expected value of X bar is equal to Mu. The law of large numbers says more: it says that the random variable X bar itself converges to Mu. Let's see if we can illustrate this with an example. Suppose we have X_1 through X_n and each has a uniform distribution on [0,1]. I didn't say this earlier, so let's also assume the X_1 through X_n are independent. Since X_i has a uniform [0,1] distribution, the density function is equal to 1 if x is between 0 and 1, and 0 for all other values. So if we graph this, with 1 marked on each axis, the density function is going to look like a flat line at height 1. Now, the law of large numbers tells us that if we observe values of the X's and average them all together, we're going to get closer and closer to one half, and in fact, in the limit, we will get one half. This is part of what we mean in statistics when we say we have to collect a large enough sample: one single random variable is not going to be enough to give us a good understanding of the entire underlying distribution. All right, what about the variance? Same question as we asked about the mean: we're given a random sample X_1 through X_n, and the variance of each is Sigma squared.
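The Uniform[0,1] example lends itself to a tiny simulation (my own sketch, not from the lecture): as the number of draws n grows, the running average should settle near one half.

```python
import random

# Law of large numbers sketch: X_i ~ Uniform[0, 1], so mu = 1/2.
# The average of n independent draws should approach 0.5 as n grows.
random.seed(1)

def xbar(n):
    """Average of n independent Uniform[0,1] draws."""
    return sum(random.random() for _ in range(n)) / n

# Watch the average tighten around 0.5 as the sample size increases.
for n in (10, 1_000, 100_000):
    print(n, round(xbar(n), 3))
```

With n = 10 the average can easily wander well away from 0.5; by n = 100,000 it is pinned down to within a few thousandths, which is exactly the convergence the theorem promises.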
You will recall we showed in the last video that the variance of aX plus bY is a squared times the variance of X plus b squared times the variance of Y plus twice ab times the covariance of X and Y, and if X and Y are independent, the covariance is zero. That tells us that the variance of aX plus bY is a squared times the variance of X plus b squared times the variance of Y. Now we're going to apply that idea to a sum of random variables. The 1 over n is our constant, like the a and the b, and it factors out as its square: we get 1 over n squared times the variance of the sum from k equals 1 to n of X sub k. The same thing happens with our n random variables as happened with our two random variables: because it's a random sample, all the covariances are zero, so this equals 1 over n squared times the sum from k equals 1 to n of just the variances. That's 1 over n squared times the sum from k equals 1 to n of Sigma squared, and look what happens: the sum right here is n times Sigma squared, one of those n's cancels, and we end up with Sigma squared over n. It's important to understand what's going on here. We've got the variance of X bar equaling Sigma squared over n, so as n increases, the variance becomes smaller. Remember, the variance is measuring the spread of the distribution, so the distribution of X bar is becoming narrower: X bar is a random variable, and its variance is becoming smaller as n gets larger. We use estimators to summarize our iid sample, and any estimator we use, including the sample mean X bar, is a random variable, since it's based on a random sample. If you think about it, X_1 is a random variable, X_2 is a random variable, X_3 is a random variable; we're averaging all those together, and that's a new random variable. X bar has a distribution of its own, and this is referred to as the sampling distribution of the sample mean.
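The variance formula Var(X bar) = Sigma squared over n can also be verified by simulation. This sketch (my own, not from the lecture) sticks with the Uniform[0,1] population, whose variance is 1/12, and compares the simulated variance of X bar against Sigma squared over n for a few sample sizes.

```python
import random
import statistics

# Var(X_bar) = sigma^2 / n, checked by simulation for Uniform[0,1],
# whose variance is 1/12.
random.seed(2)

sigma2 = 1.0 / 12.0  # variance of Uniform[0,1]

def var_of_xbar(n, reps=5000):
    """Simulated variance of X_bar over many samples of size n."""
    xbars = [sum(random.random() for _ in range(n)) / n for _ in range(reps)]
    return statistics.pvariance(xbars)

# Simulated variance vs. the theoretical sigma^2 / n.
for n in (5, 20, 80):
    print(n, round(var_of_xbar(n), 5), round(sigma2 / n, 5))
```

Quadrupling n should cut the variance of X bar by a factor of four, and the printed pairs track each other closely.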
This sampling distribution depends on the sample size n; we've already seen that with the calculation of the variance. It also depends on the population distribution of the X_i, and it involves the method of sampling. Ideally, we want to sample so that we produce a sample of independent random variables. The question here is about the entire distribution of X bar. We know the expected value of X bar is Mu and we know the variance of X bar is Sigma squared over n, but we don't know, in general, what the distribution of X bar is. That's the question: what is the distribution of the sample mean? Well, here's a special case. If all of the X's are iid with each X_i normally distributed with parameters Mu and Sigma squared, so mean Mu and variance Sigma squared, then this proposition tells us the distribution of X bar is also normal, with mean Mu and variance Sigma squared over n. I've illustrated this with the green line: suppose this is our population distribution, the normal Mu, Sigma squared curve. Then, with n equals 4, the purple line is going to be normal with parameters Mu and Sigma squared over 4. Notice the spread: it's still bell-shaped, it's still normally distributed, but it's much narrower than the green. When we get up to n equals 10, the distribution is normal with parameters Mu and Sigma squared over 10. Again, notice it's getting narrower and narrower as n increases. This is one of the ways we're going to use statistics, increasing our sample size to get a better understanding of the underlying distribution. In this case, we know everything there is to know: because all those random variables were normally distributed and independent, the average is also normally distributed, and you can calculate anything you want to calculate about X bar in that situation.
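The normal special case can be seen directly in a simulation (my own sketch, not from the lecture; the values of Mu, Sigma, and n are arbitrary): draw many samples from a normal population, form X bar each time, and check that the X bars have mean Mu and standard deviation Sigma over the square root of n.

```python
import random
import statistics

# Sampling distribution of X_bar when the population is Normal(mu, sigma^2).
# The proposition says X_bar ~ Normal(mu, sigma^2 / n) exactly.
random.seed(3)

mu, sigma, n = 10.0, 4.0, 16  # arbitrary illustrative values
reps = 20000

xbars = [statistics.mean(random.gauss(mu, sigma) for _ in range(n))
         for _ in range(reps)]

print(round(statistics.mean(xbars), 2))   # should be near mu = 10
print(round(statistics.pstdev(xbars), 2)) # should be near sigma/sqrt(n) = 1
```

With n = 16 the standard deviation of X bar is Sigma over 4, i.e. 1 here, a quarter of the population's spread, which is the narrowing shown by the purple curve on the slide.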
You could calculate any probability, you could calculate any moment, whatever you need. But what if the underlying population distribution is not normal? When the population distribution is not normal, averaging produces a distribution that is more bell-shaped than the one being sampled; you can see that by running some simulations. A reasonable conjecture is that if n is large, a suitable normal curve will approximate the actual distribution of the sample mean, and that is indeed what the central limit theorem tells us. Here is, at long last, the central limit theorem. Let X_1 through X_n be a random sample with mean Mu and variance Sigma squared. Notice there is nothing here that says the underlying distribution has to be continuous or discrete; it doesn't matter, it applies in both situations. The central limit theorem says that if n is sufficiently large, the average X bar has approximately a normal distribution, with the mean of X bar equal to Mu and the variance of X bar equal to Sigma squared over n. We write that X bar is approximately distributed as a normal random variable with mean Mu and variance Sigma squared over n. The larger the value of n, the better the approximation, and since the variance is Sigma squared over n, the spread of the distribution narrows as n grows. A typical rule of thumb is that n greater than or equal to 30 is sufficiently large, but it does depend somewhat on the underlying distribution. Let me draw a quick picture to illustrate. Suppose we have Mu here, and whatever its shape, that's our underlying density function f of x. Then we go and collect a large random sample. The mean stays the same, and the distribution of X bar will be approximately normal with the same mean Mu and variance Sigma squared over n. The larger n is, the narrower the spread will be.
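The theorem can be illustrated with a non-normal population. In this sketch (my own, not from the lecture) the population is Exponential(1), which is heavily skewed, with Mu = 1 and Sigma squared = 1. Using the rule-of-thumb sample size n = 30, the standardized X bar should behave roughly like a standard normal, for which about 68.3% of values fall within one standard deviation of zero.

```python
import math
import random
import statistics

# CLT sketch: the population is Exponential(1) -- skewed, far from normal --
# with mu = 1 and sigma^2 = 1. With n = 30, the standardized X_bar should be
# approximately standard normal.
random.seed(4)

mu, sigma, n = 1.0, 1.0, 30
reps = 20000

zs = []
for _ in range(reps):
    xbar = statistics.mean(random.expovariate(1.0) for _ in range(n))
    zs.append((xbar - mu) / (sigma / math.sqrt(n)))

# For a standard normal, roughly 68.3% of values lie within one standard
# deviation of zero; the simulated fraction should land near that.
within_one = sum(abs(z) < 1 for z in zs) / reps
print(round(within_one, 3))
```

Even though a single exponential draw looks nothing like a bell curve, the averages of 30 draws already produce a fraction close to the normal benchmark, which is the central limit theorem at work.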
That's what the central limit theorem tells us. In this video, we talked about what a random sample is, and what iid means. We talked a little bit about the law of large numbers, and we talked about the central limit theorem. In the next video, we're going to work out some examples using and illustrating the central limit theorem. We'll see you then.