To better understand the relationship between sample and population, let's consider two simple examples. Here are the distributions of blood types in the US population. You can see the common blood types include Type A and Type O, with less common blood types including AB and B. Let's assume now that we take a sample of 500 people in the United States, record their blood types, and display the sample results. If you look carefully, you'll notice that the percentages of each blood type in our sample are slightly different from the percentages in the population. But I'm sure this doesn't surprise you, right? Since we took a sample of just 500 individuals, we can't expect our sample to behave exactly like the population. But if the sample is random, and this one was, we expect to get results that are not that different from the results for the whole population, and that is what we found. Yet another random sample of 500 individuals reveals results that are slightly different from the population figures and also from what we got in the first sample. This very intuitive idea, that sample results change from sample to sample, is called sampling variability.

Here's another example to help better understand the relationship between sample and population. This example is based on the heights of all adult males in the US population. As you can see, the heights follow a normal distribution with a mean of 69 inches and a standard deviation of 2.8 inches. Let's say that a sample of 200 males was chosen and their heights were recorded. These are the results of sample one. The sample mean is 68.7 inches, and the sample standard deviation is 2.95 inches. Again, note that the sample results are slightly different from the population results. The histogram we've created for the first sample resembles the normal distribution of the population. However, the sample mean and standard deviation are slightly different from the population mean and standard deviation.
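The blood type example can be sketched in a short simulation. The lecture doesn't give exact population percentages, so the proportions below (O 44%, A 42%, B 10%, AB 4%) are illustrative assumptions, not figures from the lecture. Drawing two random samples of 500 from the same population shows each sample's proportions landing close to, but not exactly on, the population's:

```python
import random
from collections import Counter

random.seed(1)

# Illustrative population proportions for US blood types.
# These exact numbers are an assumption; the lecture only says
# that O and A are common while B and AB are less common.
population = {"O": 0.44, "A": 0.42, "B": 0.10, "AB": 0.04}
types, weights = zip(*population.items())

def sample_proportions(n):
    """Draw a random sample of n people and return the sample proportions."""
    draws = random.choices(types, weights=weights, k=n)
    counts = Counter(draws)
    return {t: counts[t] / n for t in types}

sample1 = sample_proportions(500)
sample2 = sample_proportions(500)
print(sample1)  # close to, but not exactly, the population proportions
print(sample2)  # slightly different again: sampling variability
```

Running this a few times with different seeds makes the point concrete: every sample of 500 gives proportions near the population values, but no two samples agree exactly.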
Let's take another sample of two hundred males, displayed here as sample two. The sample mean is 69.065 inches and the sample standard deviation is 2.659 inches. This example again demonstrates sampling variability: while the sample results are pretty close to the population results, they're slightly different from the results we found in the first sample. In both of these examples, we have numbers that describe the population and numbers that describe the sample. A parameter is a number that describes the population, and a statistic is a number that's computed from a sample. Parameters are typically unknown, because it's impractical or even impossible to know exactly what values a variable takes for every single member of a very large population. Statistics are computed from samples, and each sample from a population will have different statistics. The statistics of different samples from a population vary; this is sampling variability.

>> So far, we've been making distributions based on individual variables. Theoretically, we can also create distributions from means or proportions computed from multiple random samples drawn from a population. This is the big idea behind inferential statistics.

>> As an example, suppose we've selected 30 separate random samples rather than only two, and each of the 30 random samples has 500 individuals drawn from the population of US adults. The first sample has a mean height of 69 inches. We could create a bar graph and plot that mean for our first sample on the graph. If our second sample had a mean height of 68.5 inches, we'd add that to the graph. As we continue to plot the mean height of each random sample, a pattern would begin to emerge. Notice how there are more sample means at 69.25 inches than at any other height. Notice also how, as the heights become larger or smaller, there are fewer and fewer sample means.
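The 30-samples idea can be simulated directly. The sketch below assumes the heights example from earlier (population mean 69 inches, standard deviation 2.8 inches) and draws 30 random samples of 500 heights each, recording the mean of every sample; the collection of those 30 means is a small sampling distribution:

```python
import random
import statistics

random.seed(2)

MU, SIGMA = 69.0, 2.8  # population mean and SD of adult male heights (inches)

# Draw 30 random samples of 500 heights each, recording each sample's mean.
sample_means = [
    statistics.mean(random.gauss(MU, SIGMA) for _ in range(500))
    for _ in range(30)
]

# The individual sample means scatter around 69, each within a fraction
# of an inch of the population mean, and their average sits close to it.
print(min(sample_means), max(sample_means))
print(statistics.mean(sample_means))
```

Plotting `sample_means` as a histogram would reproduce the pattern described above: means pile up near the population mean, with fewer and fewer samples the further a mean sits from 69 inches.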
This is a characteristic of the sampling distribution, whether we're measuring the mean of a quantitative variable, the proportion of a categorical variable, or any other sample statistic. That is, as we draw more and more samples, the distribution of the sample statistic will become more and more normally distributed. This result is known as the Central Limit Theorem, which states that as long as adequately large samples, and an adequately large number of samples, are drawn from a population, the distribution of the sample statistic, whether a mean, a proportion, or another statistic, will be approximately normally distributed. Our projects ultimately rely on only one sample. However, if that sample is representative of a larger population, inferential statistical tests allow us to estimate, with different levels of certainty, parameters for the entire population. This idea is the foundation for each of the inferential tools that you'll be using to answer your research question.
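A striking part of the Central Limit Theorem is that it holds even when the population itself is not normal. The sketch below, under the assumption of a strongly skewed population (exponential with mean 1, a choice made here for illustration, not taken from the lecture), draws many large samples and checks that the sample means behave like a normal distribution: they center on the population mean, and roughly 68% of them fall within one standard deviation of their own average.

```python
import random
import statistics

random.seed(3)

# A skewed, non-normal population: exponential with mean 1.
def mean_of_sample(n):
    """Draw one sample of size n from the population and return its mean."""
    return statistics.mean(random.expovariate(1.0) for _ in range(n))

# Draw 1,000 samples of 500 observations each and collect the sample means.
means = [mean_of_sample(500) for _ in range(1000)]

grand_mean = statistics.mean(means)
spread = statistics.stdev(means)

# For an approximately normal distribution, about 68% of the sample means
# should fall within one standard deviation of the grand mean.
within_1sd = sum(abs(m - grand_mean) < spread for m in means) / len(means)
print(grand_mean, spread, within_1sd)
```

Even though every individual sample comes from a lopsided population, the distribution of the sample means is tight, symmetric, and centered on the true population mean, which is exactly what lets a single representative sample support inferences about the whole population.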