Okay, so now we're going to start getting into some examples of the statistical procedures that we've been learning about so far in this course, and we're going to start with examples of making descriptive inferences for single variables using confidence intervals. One way we can make inference is by generating confidence intervals for parameters of interest, and we're going to see some examples of that in this lecture. So, example number one: generating confidence intervals for proportions. Here's our research question, and again, we want to provide a very clear, detailed research question that hits on all four of the aspects we talked about when we discussed the formulation of good research questions. Our research question for this example is: what proportion of non-Hispanic African-Americans age 18 plus in the United States in 2015-2016, so making that target population very clear, had systolic blood pressure greater than 130 millimeters of mercury? This indicator of having systolic blood pressure greater than 130 mmHg would generally indicate hypertension, so this proportion quantifies how common higher blood pressure is. We want to see what proportion of this specific target population has systolic blood pressure greater than this threshold. The inference approach that we're going to demonstrate in this example is computing a 95 percent confidence interval estimate, that is, the lower and upper limits, for this population proportion. Now, we want to be very clear right up front here: when you generate a confidence interval estimate, the confidence interval is not for an estimate, it's for a population quantity, a parameter that we're trying to estimate. So anytime we say we're generating a confidence interval, it's for a parameter of interest or a population quantity, not for an estimate.
Now, we're estimating the confidence interval, but the actual interval is for a population quantity or a population parameter. So, step one in this kind of approach: we want to estimate the population proportion, so we calculate the point estimate that's going to be used to form the confidence interval. First of all, think about the information we need for a proportion. The number of black respondents with non-missing data on the first systolic blood pressure measurement in the NHANES dataset is 1,135, so this is the size of our sample of black respondents with valid data on systolic blood pressure. There were 465 black adults in that sample who had systolic blood pressure greater than 130 mmHg. So, our best point estimate, assuming again a simple random sample of black adults, is the sample proportion, 465 divided by 1,135, or 0.4097. We just calculate what fraction of the sample had that particular hypertension value, and that fraction was 0.4097. So, our estimate is that 40.97 percent of all such black adults in 2015-2016 had systolic blood pressure greater than 130 mmHg. That's our point estimate of the population parameter of interest. So, step one is calculating our point estimate. Step two is computing the estimated standard error of that point estimate. Recall that the standard error is the square root of the sampling variance of the sample proportion, or in other words, recall from course one, the standard deviation of the sampling distribution we would get if all possible sample proportions like this had been estimated from repeated samples of 1,135 black adults.
So, how variable would these different estimates of the proportion be if we drew repeated samples of the same size, 1,135? To get the standard error, we take the square root of that sampling variance, or again, we talk about the standard deviation of the sampling distribution. We calculate our estimated standard error using the estimate of the population proportion and the sample size: we take the square root of the estimate times one minus the estimate, divided by the sample size. The resulting standard error is 0.0146, and again, that's our estimate of the standard deviation of the sampling distribution if we had drawn many, many simple random samples of size 1,135 and calculated that same proportion in each of those repeated samples. But this is the beauty of this kind of approach: we only need one sample to estimate the standard deviation of the sampling distribution. Step three, we form the confidence interval. Given our best point estimate and the margin of error, we can write down the confidence interval. Now, how is the margin of error formed? Recall that we take a few estimated standard errors in either direction around the point estimate. Generally, with a large sample size like the 1,135 we have here, to form a 95 percent confidence interval we would use a multiplier of 1.96 on the standard error. Some people round it up to two, so they say plus or minus two standard errors, essentially, and that gives roughly a 95 percent confidence interval. We're going to use 1.96. So, using the technically more precise 1.96 value to form this 95 percent interval, the lower limit is formed by taking the point estimate minus the multiplier times the standard error, and we get 0.3811. The upper limit is formed by taking that same point estimate and adding the multiplier times the standard error, and that gives us 0.4383. That defines our confidence interval.
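The three steps just described can be sketched in a few lines of Python. This is only an illustrative sketch, not code from the course: the function name `proportion_ci` is made up here, and it assumes a simple random sample and the large-sample multiplier of 1.96, exactly as the lecture does. The counts 465 and 1,135 come from the example above.

```python
import math

def proportion_ci(successes, n, multiplier=1.96):
    """Return (estimate, standard error, lower limit, upper limit).

    Hypothetical helper for illustration; assumes a simple random sample
    and a sample large enough for the 1.96 multiplier to be reasonable.
    """
    p_hat = successes / n                      # step 1: point estimate
    se = math.sqrt(p_hat * (1 - p_hat) / n)    # step 2: estimated standard error
    margin = multiplier * se                   # margin of error
    return p_hat, se, p_hat - margin, p_hat + margin  # step 3: the interval

# 465 of the 1,135 respondents had systolic blood pressure above 130 mmHg.
p_hat, se, lower, upper = proportion_ci(465, 1135)
print(f"estimate={p_hat:.4f}  se={se:.4f}  interval=({lower:.4f}, {upper:.4f})")
```

Rounded to four decimals, this reproduces the numbers quoted in the lecture: an estimate of 0.4097, a standard error of 0.0146, and limits of 0.3811 and 0.4383.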
So, let's make a couple more statements about this confidence interval. The 95 percent confidence interval for the population proportion, remember, not for the estimate but for the population proportion, of non-Hispanic African-Americans age 18 plus in the United States in 2015-2016 with systolic blood pressure greater than 130 mmHg is written as (0.3811, 0.4383). So, we write out the two limits of that confidence interval. Now, what does it mean when we say this is a 95 percent confidence interval? To be perfectly precise, 95 percent of intervals formed in this way, the way that we just formed it, the point estimate plus or minus the margin of error, are expected to cover the true population proportion, assuming that the NHANES is a simple random sample. So, again, under the idea of repeated sampling, if we were to draw these samples and calculate these confidence intervals in the same way for each of those samples, and we were to do that over and over again to form a whole bunch of intervals, 95 percent of those intervals would be expected to cover the true population proportion that we're trying to estimate. That's how we would interpret this confidence interval. So what does our inference say? For example, say the people who were funding this study were speculating that the proportion of non-Hispanic African-American adults in the US with systolic blood pressure greater than 130 mmHg was actually 0.35. The 95 percent interval suggests that this hypothesized proportion is not a plausible value. Why? Because that hypothesized proportion does not lie within the limits of the confidence interval. So, we don't view that as a plausible value at this 95 percent level. What this really tells us is that we have evidence against that hypothesized proportion, and in fact, because the whole interval lies above 0.35, we have evidence that the proportion of African-American adults who have systolic blood pressure greater than 130 mmHg is actually higher than 0.35.
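The repeated-sampling interpretation can be checked with a small simulation. The sketch below is an illustration under stated assumptions, not part of the lecture: it pretends the true population proportion is 0.41 (a made-up value chosen to be close to the estimate), draws many simple random samples of size 1,135, forms the interval the same way each time, and counts how often the interval covers that true value. The names `covers` and `coverage_rate` are hypothetical.

```python
import math
import random

def covers(true_p, n, rng, multiplier=1.96):
    """Draw one simple random sample and report whether its interval covers true_p."""
    successes = sum(rng.random() < true_p for _ in range(n))
    p_hat = successes / n
    se = math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - multiplier * se <= true_p <= p_hat + multiplier * se

def coverage_rate(true_p=0.41, n=1135, repetitions=2000, seed=0):
    """Share of simulated intervals that cover the assumed true proportion."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    hits = sum(covers(true_p, n, rng) for _ in range(repetitions))
    return hits / repetitions

rate = coverage_rate()
print(f"share of intervals covering the true proportion: {rate:.3f}")
```

The printed share should land close to 0.95, which is what "95 percent confidence" refers to: a property of the procedure over repeated samples, not of any single interval.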
Okay, so that was an example of a proportion. Let's think about an example of forming a confidence interval for a mean, so a different descriptive quantity based on one variable. Here's our new research question: what was the mean systolic blood pressure for non-Hispanic African-Americans age 18 plus in the US in 2015-2016? Before, we were estimating the proportion that had systolic blood pressure greater than 130 mmHg. Now, we want to estimate the mean for this target population. So, same inferential approach: we're going to provide a 95 percent confidence interval estimate, the lower and upper limits, for the population mean, not for the estimated mean but for the population mean. Following the same steps, first we estimate the population mean. The number of black respondents with non-missing data on the first systolic blood pressure measurement is again 1,135, just as before with the proportion. Our best point estimate, assuming a simple random sample of black adults, is the sample mean, 128.252. So, we calculate the mean of all those systolic blood pressure values, and the resulting estimate is 128.252. Again, we're assuming a simple random sample for now, and we're going to revisit the use of sampling weights later. Our interpretation would be that our estimate of the mean systolic blood pressure for all such black adults in 2015-2016 was 128.252 mmHg. Step two, we calculate the estimated standard error of that estimated mean. So, the same approach we followed with the proportion: we want to estimate the standard deviation of the sampling distribution, if all the possible sample means in repeated samples of 1,135 were obtained. The sample standard deviation of the 1,135 blood pressure measurements is 19.958. Remember, that's the standard deviation of the actual values of the blood pressure measurements, so how variable those different values are. We calculate the variance, take the square root of that variance, and that gives us the sample standard deviation.
So, that's the standard deviation of the actual values of blood pressure. Remember, we're trying to estimate the standard deviation of the sampling distribution of all possible mean estimates, if we had drawn repeated samples of the same size. So, to calculate that estimated standard error, or the standard deviation of the sampling distribution of repeated estimates of means, we take the standard deviation of the values of blood pressure from our sample, 19.958, and divide by the square root of the sample size, 1,135. That's how we estimate the standard error. The resulting value is 0.592 mmHg, and that's the standard error that we use to form the confidence interval. Step three, we form the confidence interval: again, point estimate plus or minus the margin of error. For a 95 percent confidence interval, we're again using this multiplier of 1.96. For the lower limit, we take the point estimate of the mean minus the multiplier times the estimated standard error, and that gives us 127.091. For the upper limit, we take the point estimate and add the multiplier times the standard error, and that gives us 129.413. So, a 95 percent confidence interval for the population mean systolic blood pressure of this target population is (127.091, 129.413). Thinking a little bit more about what this implies, it's the same interpretation: 95 percent of intervals formed in this way in repeated samples of the same size would be expected to cover the true population mean. So, if we do this over and over again, we're going to cover the true population mean 95 percent of the time in repeated samples. Thinking again about inference, suppose the hypothesized mean was 128 mmHg. The 95 percent confidence interval that we formed suggests that 128 is a plausible value for the mean. Why? Because the hypothesized mean lies within the limits of the confidence interval, unlike what we saw with the hypothesized population proportion.
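The same three steps for the mean can be sketched in Python from the summary statistics quoted above (mean 128.252, standard deviation 19.958, sample size 1,135). As before, this is only a hedged illustration: the function name `mean_ci` is invented for this sketch, and it assumes a simple random sample and the 1.96 multiplier used in the lecture.

```python
import math

def mean_ci(sample_mean, sample_sd, n, multiplier=1.96):
    """Return (standard error, lower limit, upper limit) for a population mean.

    Hypothetical helper for illustration; assumes a simple random sample
    and a large sample, so 1.96 is used as the multiplier.
    """
    se = sample_sd / math.sqrt(n)   # standard deviation of the values over sqrt(n)
    margin = multiplier * se        # margin of error
    return se, sample_mean - margin, sample_mean + margin

se, lower, upper = mean_ci(128.252, 19.958, 1135)
print(f"se={se:.3f}  interval=({lower:.3f}, {upper:.3f})")

# A hypothesized value is treated as plausible if it falls inside the interval.
hypothesized_mean = 128
is_plausible = lower <= hypothesized_mean <= upper
print(f"{hypothesized_mean} mmHg plausible at the 95 percent level: {is_plausible}")
```

Rounded to three decimals, this gives a standard error of 0.592 and limits of 127.091 and 129.413, and it reports that 128 mmHg falls inside the interval.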
So, 128 is certainly a plausible value, allowing for sampling variability. Now, we're also going to talk about an alternative inferential approach, hypothesis testing, where we either reject or fail to reject a null hypothesis about a specific value of a population mean or proportion of interest at a certain level of significance. A five percent level would correspond to these 95 percent intervals. We'll see examples of that coming up in another lecture. In this lecture, we've been focused on confidence intervals, and the advantage of intervals is that they provide a range of plausible values for a descriptive quantity of interest, like a population mean or proportion. So, we're not just rejecting or failing to reject a particular hypothesis, but rather we get a range of plausible values under a certain level of confidence, and we've seen that in these two examples.