Okay, so in this lecture we're going to turn our focus to comparing proportions, rather than means, for two independent samples, and we'll work through an example of this inferential approach. For our example of comparing proportions in two groups, here's a new research question that we haven't considered before. We focus on elderly Hispanic adults, which by definition here means adults aged 80 and above living in the United States in 2015-2016. We want to know whether the proportions of males and females who smoke differ significantly. In other words, is there a difference between the proportions of males and females who smoke in this specific subpopulation? We'll use two inference approaches. First, we form a confidence interval for the difference in the two proportions of individuals who smoke. Second, we perform a chi-square test of the significance of the difference in the two population proportions. As in previous applications, we'll check our assumptions carefully. So, for the first approach, we form a confidence interval for the difference in the proportions of males and females who smoke. We analyze the data, and among the males in this subpopulation the proportion who smoke is 0.5625, or 56.25 percent. But notice the sample size: that's a key difference between this case study and some of our other applications. We only have 16 Hispanic males aged 80 and above for this analysis, and 9 of those 16 reported that they had smoked. Among the females we have a somewhat larger sample, but still only 32 individuals, and one quarter of them smoke (8 of 32), so 0.25 is our proportion.
So the key aspect of this example is that we have small samples of males and females that we're using to make inferences about the difference in the population proportions. Again, the male proportion is 0.5625 based on a sample of size 16, and the female proportion is 0.25 based on a sample of size 32. Our best point estimate of the difference in the population proportions is the difference in the sample proportions, 0.5625 minus 0.25, or 0.3125. How do we interpret this estimate? We would say that in 2015-2016, the estimated percentage of all elderly male Hispanics who smoked was 31.25 percentage points higher than for all elderly female Hispanics. Notice that I didn't say 31.25 percent higher; that would be a different result, a relative difference. The difference between these two percentages is 31.25 percentage points, so we have to be careful with this kind of interpretation when comparing two proportions. On the surface this seems like a big difference, but we want to make formal inferences about it based on our small samples. So let's evaluate the assumptions we're making in comparing these two proportions. First, is the sampling distribution of the difference in sample proportions normal? Probably not, because the sample sizes in each of the two groups are so small. We can't really rely on the central limit theorem to justify a normal sampling distribution for the difference in sample proportions, so this assumption is questionable, mainly because of the small samples. Second, do we have at least 10 of each outcome, smokers and non-smokers, in each group? The answer is no. The males have only 9 smokers and 7 non-smokers, and even the females have only 8 smokers (alongside 24 non-smokers), so the condition fails in both groups, most clearly for the males. Third, are the two samples independent?
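As a minimal sketch of the point estimate and the at-least-10-of-each-outcome check, here is some Python that assumes only the counts implied above (9 of 16 males and 8 of 32 females smoked). It is an illustration, not the lecture's accompanying code, and the variable names are my own.

```python
import numpy as np

# Counts implied by the lecture: 9 of 16 males smoked (9/16 = 0.5625) and
# 8 of 32 females smoked (8/32 = 0.25), for 17 smokers out of 48 in total.
smokers = np.array([9, 8])        # [males, females]
n = np.array([16, 32])            # group sample sizes
non_smokers = n - smokers

p_hat = smokers / n               # sample proportions
diff = p_hat[0] - p_hat[1]        # point estimate of p_males - p_females

print(f"p_males = {p_hat[0]:.4f}, p_females = {p_hat[1]:.4f}")
print(f"difference = {diff:.4f} ({100 * diff:.2f} percentage points)")

# Rule of thumb: at least 10 smokers and 10 non-smokers in each group.
for label, s, f in zip(["males", "females"], smokers, non_smokers):
    ok = s >= 10 and f >= 10
    print(f"{label}: {s} smokers, {f} non-smokers -> condition met: {ok}")
```

Both groups fail the check, which is why the lecture turns to small-sample methods later on.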
In this case the answer is yes, because males and females are mutually exclusive subgroups of the target population. Fourth, are the observations within each of the two subgroups independent? We're going to assume so for now. We'll revisit how to deal with the complex sampling features in the NHANES data a little later this week, but for now we assume that the 16 and 32 observations in the two groups are independent of each other. So let's proceed with forming a confidence interval. We calculate the estimated standard error of the difference in the proportions as follows: multiply the male proportion by one minus that proportion and divide by the male sample size; do the same for the females; add the two results; and take the square root of the sum. That gives an estimated standard error of the difference in the proportions of 0.146. Next we determine the "few" in "adding or subtracting a few standard errors from our point estimate." For a 95 percent confidence interval for a difference in proportions, that "few" is again the critical value 1.96. We then add and subtract the resulting margin of error from the best estimate of the difference: 0.3125 plus or minus 1.96 times 0.146. Note that in performing this calculation we're assuming the sampling distribution is normal. The resulting 95 percent confidence interval for the difference in the population proportions is (0.027, 0.598).
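Here is a hedged sketch of the interval arithmetic just described (the normal-approximation, or Wald, interval), using SciPy's normal quantile in place of the rounded 1.96. The counts are the ones implied by the lecture; the code is an illustration rather than the lecture's own.

```python
import numpy as np
from scipy.stats import norm

smokers = np.array([9, 8])    # males, females
n = np.array([16, 32])
p_hat = smokers / n
diff = p_hat[0] - p_hat[1]

# Unpooled standard error: sqrt(p1(1-p1)/n1 + p2(1-p2)/n2)
se = np.sqrt(np.sum(p_hat * (1 - p_hat) / n))

z_crit = norm.ppf(0.975)      # about 1.96 for a 95% interval
margin = z_crit * se
lower, upper = diff - margin, diff + margin

print(f"SE = {se:.3f}")
print(f"95% Wald CI: ({lower:.3f}, {upper:.3f})")
```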
So again we see a case where the confidence interval for the difference in proportions does not include zero. A difference of zero corresponds to our null hypothesis that the two proportions are equal, and based on this 95 percent confidence interval, zero doesn't seem to be a plausible value. But notice that the lower limit of the interval, 0.027, is close to zero, so our evidence doesn't seem particularly strong. Still, because this 95 percent interval excludes zero, we would conclude that there is a significant difference, provided that all of our assumptions actually hold. Given our concerns about those assumptions, mainly due to the small sample sizes in each group, we want to check the robustness of that result using small-sample techniques. One approach is to compute what's called an exact 95 percent confidence interval for the difference in the population proportions. This approach doesn't rely heavily on the assumptions above, but it is much more computationally intensive, so the accompanying Python code that generates this exact interval will take a little longer to run. The resulting exact 95 percent confidence interval looks quite similar to the one we formed under our assumptions: the lower limit is 0.015 and the upper limit is 0.574. So we arrive at a very similar result. Although the lower limit is close to zero, zero still doesn't seem like a plausible value for the difference in the proportions, so despite the small sample sizes we do seem to have evidence of a significant difference in these two proportions.
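The lecture doesn't say which exact method its accompanying code uses, so this sketch swaps in a different, plainly named small-sample method: Newcombe's hybrid score interval, which combines a Wilson score interval for each group. Expect somewhat different limits from the exact interval (0.015, 0.574); the point is only to show another interval that avoids the plain normal approximation. The function names are my own.

```python
import numpy as np
from scipy.stats import norm

def wilson_interval(x, n, alpha=0.05):
    """Wilson score interval for a single binomial proportion x/n."""
    z = norm.ppf(1 - alpha / 2)
    p = x / n
    denom = 1 + z**2 / n
    center = (p + z**2 / (2 * n)) / denom
    half = z * np.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return center - half, center + half

def newcombe_diff_interval(x1, n1, x2, n2, alpha=0.05):
    """Newcombe's hybrid score interval for p1 - p2 (no continuity correction)."""
    p1, p2 = x1 / n1, x2 / n2
    l1, u1 = wilson_interval(x1, n1, alpha)
    l2, u2 = wilson_interval(x2, n2, alpha)
    d = p1 - p2
    lower = d - np.sqrt((p1 - l1) ** 2 + (u2 - p2) ** 2)
    upper = d + np.sqrt((u1 - p1) ** 2 + (p2 - l2) ** 2)
    return lower, upper

# Males: 9 smokers of 16; females: 8 smokers of 32
lo, hi = newcombe_diff_interval(9, 16, 8, 32)
print(f"Newcombe 95% CI for p_males - p_females: ({lo:.3f}, {hi:.3f})")
```

For these counts the lower limit also lands just above zero, which matches the qualitative conclusion of the exact interval.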
The evidence just isn't overwhelming, again because the lower limit comes close to zero. Okay, so for approach two, let's consider a chi-square test for comparing these two proportions. The null hypothesis is that the two population proportions are equal, that is, equal percentages of males and females in this subpopulation smoke. The alternative hypothesis is that males and females in this population have different proportions of smokers. As in our similar applications, this alternative allows the male proportion to be either greater than or less than the female proportion, which leads to a two-tailed test of the null hypothesis. Compared with a one-tailed test, a two-tailed test requires more evidence against the null hypothesis before we can reject it. We're again going to use a significance level of five percent, although you could choose a lower level for a given study. Now let's revisit the assumptions for the chi-square test. The analysis defines a two-by-two table: males versus females, crossed with smokers versus non-smokers. Those are the four cells of the table. For the chi-square test we assume that, under the null hypothesis that the two groups have equal proportions, the expected count in each of the four cells is at least five. In this case that assumption is in fact met. The overall sample rate of smokers is 17 out of 48: our total sample size is 48 (16 plus 32), and 17 of those 48 individuals smoke. Applying that overall proportion of smokers to both males and females, we would expect about 5.7 (roughly six) of the 16 males and about 11.3 (roughly 11) of the 32 females to be smokers.
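The expected-count check can be sketched with NumPy: under the null hypothesis, each cell's expected count is its row total times its column total divided by the grand total. The table layout (rows males and females, columns smokers and non-smokers) is my own choice for illustration.

```python
import numpy as np

# Rows: males, females; columns: smokers, non-smokers
observed = np.array([[9, 7],
                     [8, 24]])

row_totals = observed.sum(axis=1)     # [16, 32]
col_totals = observed.sum(axis=0)     # [17, 31]
grand_total = observed.sum()          # 48

# Expected count for each cell under H0: row total * column total / grand total
expected = np.outer(row_totals, col_totals) / grand_total
print(np.round(expected, 2))
print("All expected counts >= 5:", bool((expected >= 5).all()))
```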
So all four cells of this two-by-two table have expected counts greater than five (the expected non-smoker counts are about 10.3 for males and 20.7 for females), and the assumption seems justified despite the small sample sizes. We also assume that within each of the two groups the observations on this smoking indicator are independent. So we're okay with the assumptions for the chi-square test in this case, though only barely. We run the chi-square test, and again you'll see accompanying Python code that generates these values. The resulting test statistic is 4.554. The chi-square statistic has one degree of freedom: the number of rows in the table minus one, times the number of columns minus one, which is 1 times 1 for a two-by-two table. The p-value, the probability of seeing a chi-square statistic that large or larger if the null hypothesis were true, is about 0.033. So at a strict five percent significance level we would reject the null hypothesis, because the p-value is less than 0.05, and this supports the conclusion that the population proportions of smokers differ. But again, this isn't overwhelming evidence. Had we initially selected a one percent significance level, so that our type one error rate was 0.01 instead of 0.05, we would have failed to reject the null hypothesis. We do have evidence at the five percent level, but it is not overwhelming, and at a lower significance level we wouldn't have evidence of a significant difference.
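A minimal sketch of the chi-square test using `scipy.stats.chi2_contingency`. One assumption worth flagging: SciPy applies Yates' continuity correction to two-by-two tables by default, while the 4.554 reported above is what the uncorrected statistic gives for this table, so the sketch passes `correction=False`. It also computes the pooled two-proportion z statistic, whose square equals the uncorrected chi-square statistic for a two-by-two table.

```python
import numpy as np
from scipy.stats import chi2_contingency

# Rows: males, females; columns: smokers, non-smokers
observed = np.array([[9, 7],
                     [8, 24]])

# correction=False disables Yates' continuity correction (on by default for 2x2)
stat, p_value, dof, expected = chi2_contingency(observed, correction=False)
print(f"chi-square = {stat:.3f}, df = {dof}, p = {p_value:.3f}")

# Equivalent pooled two-proportion z statistic: z**2 equals the chi-square statistic
n1, n2 = observed.sum(axis=1)
p_pool = observed[:, 0].sum() / observed.sum()
se_pool = np.sqrt(p_pool * (1 - p_pool) * (1 / n1 + 1 / n2))
z = (observed[0, 0] / n1 - observed[1, 0] / n2) / se_pool
print(f"z = {z:.3f}, z^2 = {z**2:.3f}")
```

With the default `correction=True`, SciPy would report a smaller statistic and a larger p-value for the same table.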
Okay, so again we want to check the robustness of this result. Just as we used an exact 95 percent confidence interval for the difference in proportions, we're going to use Fisher's exact test as another small-sample approach to comparing proportions. When you run Fisher's exact test, the resulting p-value is 0.054, which does not provide very strong evidence against the null hypothesis once we allow for the small samples. That p-value is close to the 0.05 cutoff but just above it, so at a strict five percent significance level we would narrowly fail to reject the null hypothesis with the exact test. Our conclusion, then, is that we don't have overwhelming evidence against the null hypothesis when we use an exact test suited to small samples. So what conclusion would we draw in this case study? We have weak evidence of a difference in the population proportions of smokers between elderly male and female Hispanics living in the US in 2015-2016. Remember, we always want to state our target population clearly when making these kinds of conclusions. We should also add the caveat that we had very small samples to work with in generating these estimates and estimating the difference in the proportions, so we had limited statistical power to detect this difference. Across most of our approaches we still see weak evidence of a difference despite the small sample sizes. One note to keep in mind: a difference of 31.25 percentage points is a big difference, and if that same difference emerged with larger samples in each of the two groups, we would very likely find it significant regardless of whether we used 0.01 or 0.05 as the significance level.
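Fisher's exact test is available as `scipy.stats.fisher_exact`; this sketch assumes the same two-by-two table of counts as before. SciPy also returns the sample odds ratio, which is shown only for context.

```python
import numpy as np
from scipy.stats import fisher_exact

# Rows: males, females; columns: smokers, non-smokers
observed = np.array([[9, 7],
                     [8, 24]])

# Two-sided Fisher's exact test, conditioning on the table margins
odds_ratio, p_value = fisher_exact(observed, alternative="two-sided")
print(f"sample odds ratio = {odds_ratio:.3f}, two-sided p = {p_value:.3f}")
```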
So if these proportions held up in reality and persisted in larger samples, this would be a pretty big difference, what we would call a very large effect size in practice. This is where we have to carefully weigh statistical significance against practical, or real-world, significance: even though Fisher's exact test and our other approaches gave only weak evidence, this is a notable difference in the proportions who smoke. With small samples, though, we need to allow for a larger degree of uncertainty in our analysis, that is, for more sampling variability in each of the two groups, and that's what the robust small-sample methods we used do. So overall we have weak evidence, but with larger samples this difference would more than likely be significant.
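To illustrate the point about larger samples, this sketch reruns both tests on hypothetical tables that keep the observed proportions (0.5625 and 0.25) fixed while multiplying every count by a factor k. These scaled tables are made up for illustration and are not real data. Because the proportions stay fixed, the uncorrected chi-square statistic grows in direct proportion to k.

```python
import numpy as np
from scipy.stats import chi2_contingency, fisher_exact

# Observed table: rows males, females; columns smokers, non-smokers
observed = np.array([[9, 7],
                     [8, 24]])

# HYPOTHETICAL: same proportions, k times as many people in each group
results = {}
for k in (1, 2, 4):
    table = k * observed
    stat, p_chi2, _, _ = chi2_contingency(table, correction=False)
    _, p_fisher = fisher_exact(table)
    results[k] = (stat, p_chi2, p_fisher)
    print(f"k={k}: group sizes {table.sum(axis=1)}, chi-square = {stat:.2f}, "
          f"p = {p_chi2:.4f}, Fisher p = {p_fisher:.4f}")
```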