In this video, we will use a simple example to learn what the central limit theorem for sample means is. This theorem will be immensely helpful when we learn a concept called hypothesis testing later this week. Hypothesis testing is used in almost every business analytics technique.

Suppose that there are eight people on a team and these are their ages. If we take all possible samples of size 2, we will get 8 to the power of 2, or 64, possible samples. These samples are taken with replacement. In other words, every time we pick a number for a sample, we pick from all eight numbers. For example, say the first value in the first sample is 54 from this list of eight. The sample is not complete yet, and for its second value we may end up picking the same number again. That's what we mean by replacement. So the first sample is 54 and 54. The second sample could be 54 and 55. Another sample could be 55 and then 54, right here, and we count that separately from 54 and 55. That is how we get 8 to the power of 2, or 64, possible samples with replacement where each sample has size 2.

Let's see what happens if we look at the histogram of this population. In Minitab, that is Graph, Histogram, Simple, and then the population column. You get a histogram that is clearly not normally distributed. Now take the mean of each of the 64 samples. For the first sample of 54 and 54, the sample mean is 54. For the second sample of 54 and 55, the sample mean is the average of those two, so 54.5. That gives us 64 sample means. If we plot a histogram of those 64 sample means, that is, all possible sample means where each sample has two values, we get a histogram that looks reasonably close to a normal distribution. The reason is averaging. Many different pairs of ages give a mean near the middle, while only a few pairs give a mean at either extreme, so the sample means pile up in the center. The central limit theorem for sample means says that even if the population is not normally distributed, if we take all possible samples of a given size, say 2, the distribution of the sample means tends toward a normal distribution, and more so as the sample size grows.
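To make the counting concrete, here is a minimal Python sketch of the same exercise. The eight ages below are an assumption for illustration. Only 54 and 55 come from the example, and the rest are made up. The helper name `all_sample_means` is also hypothetical.

```python
from itertools import product

# Hypothetical ages for the eight team members. Only 54 and 55 appear in the
# example; the other six values are invented for illustration.
ages = [54, 55, 59, 63, 64, 68, 69, 70]

def all_sample_means(population, sample_size):
    """Mean of every ordered sample drawn with replacement."""
    samples = product(population, repeat=sample_size)
    return [sum(sample) / sample_size for sample in samples]

sample_means = all_sample_means(ages, 2)

print(len(sample_means))   # 8 ** 2 = 64 samples
print(sample_means[0])     # sample (54, 54) -> 54.0
print(sample_means[1])     # sample (54, 55) -> 54.5

# The average of all 64 sample means matches the population mean.
population_mean = sum(ages) / len(ages)
mean_of_means = sum(sample_means) / len(sample_means)
print(population_mean, mean_of_means)
```

Because `product` treats (54, 55) and (55, 54) as different samples, the count comes out to 64, the same as in the example.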
There is no guarantee of a normal distribution at such a small sample size, though, as you see here. In the Minitab example, we got lucky with a sample size of 2. Here, not so much, but let's see what happens if we increase the sample size to 10. This is the distribution of the means of all possible samples of size 10, from a population of at least 10 values that is, and the distribution of sample means is even closer to a normal distribution. If we take all possible samples of size 30, the distribution of sample means is, for most populations, very close to normal. This is not a mathematical guarantee at exactly 30. The theorem itself says that the approximation keeps improving as the sample size grows, and 30 is the widely used rule of thumb, so we treat 30 as the magic number. When the sample size is at least 30, we generally call it a large sample. When the sample size is less than 30, we generally call it a small sample. The reason the sample means look more normal as the sample size increases is that each sample mean averages more values, so unusually high and unusually low values within a sample tend to cancel out.

There are a couple more things that we need to understand about the central limit theorem for sample means. Let's do that with an example. Say we have a population of 1,000 people. The average weight of all those 1,000 people, the population mean that is, is say 150 pounds, and the population standard deviation is say 6 pounds. If we take all possible samples of 50 people each, with replacement that is, we'll get the population size to the power of the sample size, or 1,000 to the power of 50, samples, and therefore that many sample means. If we plot a histogram of those 1,000 to the power of 50 sample means, given that the size of each sample is at least 30, the histogram of all those sample means will be approximately normally distributed. The central limit theorem also says that the mean of all those sample means is the same as the population mean, which is 150 pounds here.
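We cannot list 1,000 to the power of 30 samples, so a rough way to see the effect of sample size is to simulate it. The sketch below is an illustration under assumptions, not the data from the video. It builds a made-up, right-skewed population of 1,000 values, draws 20,000 random samples with replacement for each sample size, and reports the skewness of the sample means, which is close to 0 when the shape is symmetric like a normal curve. The names `skewness` and `simulated_sample_means` are hypothetical helpers.

```python
import random
from statistics import fmean, pstdev

rng = random.Random(42)  # fixed seed so the run is repeatable

# Hypothetical right-skewed population of 1,000 values (not data from the video).
population = [rng.expovariate(1 / 20) for _ in range(1000)]

def skewness(values):
    """Population skewness: about 0 for a symmetric shape, positive for a right tail."""
    m = fmean(values)
    s = pstdev(values, mu=m)
    return sum(((v - m) / s) ** 3 for v in values) / len(values)

def simulated_sample_means(sample_size, n_samples=20_000):
    """Means of random samples drawn with replacement from the population."""
    return [fmean(rng.choices(population, k=sample_size)) for _ in range(n_samples)]

results = {}
for size in (2, 10, 30):
    means = simulated_sample_means(size)
    results[size] = (fmean(means), skewness(means))
    print(size, round(results[size][0], 2), round(results[size][1], 2))

print("population", round(fmean(population), 2), round(skewness(population), 2))
```

In a run like this, the skewness should shrink as the sample size goes from 2 to 10 to 30, while the average of the sample means stays close to the population mean at every size.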
The last thing is that the standard deviation of those 1,000 to the power of 50 sample means is the population standard deviation divided by the square root of the size of each sample. In this example, it would be 6 divided by the square root of 50, which is approximately 0.85 pounds. One thing to note here is that if the population itself is normally distributed, and you take all possible samples of whatever size, even less than 30, the sample means will be normally distributed, and that's what this note says. If the population is normal, the condition of the sample size being at least 30 is not necessary. Where are we going with all this? We'll see when we get to a concept called hypothesis testing.
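The arithmetic for the standard deviation of the sample means can be checked with a short Python sketch. The function name `standard_error` is our own label for this quantity, and the small population used for the exact check is hypothetical, chosen only because all of its samples of size 2 can be listed.

```python
import math
from itertools import product
from statistics import fmean, pstdev

def standard_error(population_sd, sample_size):
    """Standard deviation of the sample means: sigma / sqrt(n)."""
    return population_sd / math.sqrt(sample_size)

# The weight example: population standard deviation 6 pounds, sample size 50.
print(round(standard_error(6, 50), 2))  # 0.85

# Exact check on a small hypothetical population (values are illustrative only):
# list every sample of size 2 with replacement and measure the spread directly.
small_population = [54, 55, 59, 63, 64, 68, 69, 70]
all_means = [fmean(sample) for sample in product(small_population, repeat=2)]
sd_of_means = pstdev(all_means)
predicted = standard_error(pstdev(small_population), 2)
print(math.isclose(sd_of_means, predicted))  # True
```

The check uses the population standard deviation (`pstdev`), not the sample standard deviation, because we are working with the whole population and with every possible sample.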