So in this next set of lectures, we'll talk about hypothesis tests for comparing proportions and incidence rates between two populations, and the logic will be very similar to what we did when we compared means between two independent populations using the two-sample t-test. It's just the mechanics that will change depending on what comparison we're doing. So let's first talk about comparing proportions between two populations. The first approach we're going to look at is analogous and almost identical to the two-sample t-test for comparing means, and we'll call it the two-sample z-test. Very imaginative name, as you can see, and the reason we call it the z-test is because all the p-values, when we compute them with large enough samples, will come from the normal distribution, which is frequently referred to as the z-distribution, as you know. So upon completion of this lecture section, you will be able to estimate and interpret a p-value for comparing proportions between two populations using the two-sample z-test approach, and explain why, even though there are three measures of association for which we create confidence intervals, the risk difference, the relative risk, and the odds ratio, we only need one p-value to answer questions about all of them. So let's look at our go-to example when kicking off comparisons of binary outcomes between two groups. We'll look at our random sample of a thousand HIV positive patients from our citywide clinical population, and we'll look at them by baseline CD4 count at the time of treatment. As we've seen over and over again, in the sample, those who had lower CD4 counts upon enrolling in the study had higher response to treatment than those with greater CD4 counts, and we've seen that that difference in sample proportions was 0.09.
So what we'll do to get started is look at this two-sample z-test, and it's going to look very familiar because it's analogous to the two-sample t-test for comparing means of continuous data, and in fact the approach is exactly the same with slightly different inputs. We'll first specify our two competing hypotheses, the null and alternative, assume the null to be the truth, and then we'll compute how far our sample estimate of this truth is from what is expected under that null assumption, translate that distance into a p-value, and make a decision about our competing hypotheses. In this example, and in any example where we're comparing binary outcomes between two populations through two samples, there's a multitude of ways to express the competing hypotheses. At the most basic level, the null hypothesis for this example is that, in the populations of HIV positive patients with CD4 counts less than 250 and greater than or equal to 250 respectively, the true proportion responding to therapy is the same, versus the alternative that the true proportions who will respond at the population level are different. The proportions are not equal. We can express the null and the alternative in terms of the risk difference: if the proportions are equal, the risk difference is 0; if they are not equal, the risk difference is not 0. And in terms of our favorite ratios, the relative risk and odds ratio: if the underlying proportions are equal at the population level, the respective ratios are both 1, and if they're not equal, then the respective ratios are both not equal to 1.
So, we're going to build this around the risk difference, and we can set up the two competing hypotheses just like we did on the previous page. The way we're going to measure distance is between our observed risk difference and what we'd expect it to be under the null, and under the null we'd expect the difference in proportions, as we just said, to be 0. So our distance measure, well, it looks involved mathematically, but you'll all recognize this from what we did before, and just like we did with the t-test, we take our observed difference in proportions divided by its estimated standard error to figure out how far our result is from zero, what we'd expect it to be under the null, in units of standard error. So, for our data, the distance between the difference in sample proportions and zero, in standard errors, comes out as follows. I won't go through the computation for the standard error again; we showed how to do it in lecture eight on confidence intervals for comparison measures, but you may recall it turned out to be 0.025. If you want to verify, you can certainly go through and do it again with the formula on the previous page, but you've already seen this before. This standardized distance measure is 3.6 standard errors. So our sample difference in proportions is 3.6 standard errors above 0, and again, 0 is what we'd expect the true difference in proportions to be under the null hypothesis. So, now we've got this distance, and what we have to do is translate the distance into a p-value by comparing it to the distribution of such differences under sampling variability when the true population-level difference is 0. That is to say, even if the truth were a difference in proportions of 0, we'd expect the estimates of that truth to vary about 0 in the pretty well-known normal fashion.
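If you want to check the arithmetic, the distance calculation is just a couple of lines. Here's a minimal sketch in Python (the lecture itself works in R), using the 0.09 difference and 0.025 standard error quoted above:

```python
# Distance of the observed risk difference from its value under the null (0),
# measured in units of standard error. Both inputs are the values quoted in
# lecture: the difference comes from the sample, and the standard error was
# derived in lecture eight on confidence intervals for comparison measures.
risk_difference = 0.09   # observed difference in sample proportions
standard_error = 0.025   # estimated standard error of that difference

z = (risk_difference - 0) / standard_error
print(round(z, 1))  # 3.6 standard errors above 0
```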
We want to see where our estimate, the one that is 3.6 standard errors above 0, would fall relative to other estimates when that true difference is 0. We'll get the p-value and compare it to the preset rejection level or alpha level, and for our purposes, as in most of the research world, this continues to be five percent, or 0.05. So we have a result that is 3.6 standard errors above the expected difference in proportions of 0 under the null hypothesis. So now we have to ask: how likely is it to get a result like we did, a sample difference in proportions of 0.09, if the true underlying population difference is 0? Again, we need to appeal to the sampling distribution of our difference in sample proportions, and we know from the central limit theorem that if the true difference in proportions is 0, then the distribution of our estimates around 0 will behave in a normal fashion, with most of the estimates being relatively close to 0, within plus or minus 2 standard errors. So what we want to do is translate that distance of 3.6 standard errors into a p-value, which is the proportion of results that we could get just by chance alone that are as far or farther than 3.6 standard errors from the assumed truth of 0. In other words, it's the probability of getting the difference in sample proportions of 0.09 which we observed in the study, or something more extreme, when the true population difference in proportions is 0. I won't go through the details here; by now you should be pretty comfortable with this. I'm using the pnorm function here because we're looking at probabilities under a normal curve, and you can use one of two options very analogous to what we did with the pt function. In either case, the probability of getting something as far or farther than 3.6 standard errors from 0, landing in these tails (and my picture here isn't quite drawn to scale), is on the order of 0.0003. So very low. So the p-value is very small. How do we interpret that?
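To make that pnorm step concrete, here's a small Python sketch of the same calculation; the lecture uses R's pnorm, and the standard library's erf function gives the same normal-curve probabilities:

```python
import math

def pnorm(x):
    # standard normal CDF, the analogue of R's pnorm()
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

z = 3.6  # observed distance from 0, in standard errors

# Two-sided p-value: the probability of landing 3.6 or more standard
# errors from 0 in either tail, when the true difference really is 0.
p_value = 2 * (1 - pnorm(z))
print(round(p_value, 4))  # 0.0003
```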
Well, if there were no difference in the proportions responding to treatment between the two CD4 count populations, then the chance of observing the sample difference in proportions of 0.09, or something more extreme, is less than one in a thousand; in fact, it's equal to 0.0003. So, certainly, because this comes in at less than 0.05, we're going to reject the null that the two underlying proportions are equal in favor of the alternative that they're not equal. Another way to say it: we're going to reject the null that the true difference in proportions is 0 versus the alternative that that difference is not 0. So we ruled out no difference in population proportions as a possibility for the underlying truth, and this is of course consistent with our 95 percent confidence intervals, not only for the risk difference, because that didn't include 0, but for the ratios as well, the relative risk and odds ratio, because neither included 1. These two competing hypotheses, again, can be expressed in terms of any of the measures of association we use to compare binary outcomes between two populations. All are equivalent statements about the corresponding null and alternative hypotheses. So if one of these is true, either the null or the alternative, the corresponding null or alternative for all the others is true as well. So the resulting p-value from the two-sample z-test will result in the same decision regardless of how the null and alternative hypotheses are expressed. So we only need one p-value for testing all of these, because they are all expressions of the same thing. So let's look at our randomized trial, our favorite seminal study on HIV mother-to-infant transmission, in which pregnant women with HIV were randomized to receive AZT or placebo. We already know the end result here: those mothers who received AZT were much less likely to have children who were born with or developed HIV, only seven percent among the AZT group compared to 22 percent in the placebo group.
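The equivalence of these ways of stating the null is easy to verify numerically: plug in any pair of equal proportions, and all three measures of association land exactly on their null values. A quick Python check (the 0.30 value is arbitrary, purely for illustration):

```python
p1 = p2 = 0.30  # any common value: equal proportions, i.e., the null is true

risk_difference = p1 - p2                              # null value: 0
relative_risk = p1 / p2                                # null value: 1
odds_ratio = (p1 / (1 - p1)) / (p2 / (1 - p2))         # null value: 1

print(risk_difference, relative_risk, odds_ratio)  # 0.0 1.0 1.0
```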
We know that not only was this difference large on the risk difference scale and the relative risk scale, but it was statistically significant as well: of all our confidence intervals for our three favorite measures of association, none of them included their respective null values. So if we were to express these the way we did before, we could either express them as the underlying population proportions being equal or different, or in terms of any of these measures of association. So again, we'll set up the two competing hypotheses. We'll assume that the proportion of children who would be born with HIV at the population level, were all mothers given AZT or all given placebo, is the same, so that the true difference is 0, and we'll figure out how far our observed difference of negative 15 percent is from 0 in terms of standard errors. The standard error of this difference, and again we calculated this previously, is 0.036, or 3.6 percent. So our difference is 4.2 standard errors below what we'd expect it to be under the null hypothesis. The resulting p-value is less than 0.001; you could set this up in R if you want to check that, but I won't show it again here. Our interpretation again, and our p-value is always interpreted under the assumption that the null is true: if there were no difference in the proportion of HIV cases among children of mothers given AZT and mothers given placebo at the population level, the chance of getting a sample difference in proportions of negative 0.15, negative 15 percent, or something more extreme, is less than one in a thousand. Based on a cutoff of 0.05, we will reject the null hypothesis in favor of the alternative and conclude that this difference is statistically significant.
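The same arithmetic for the AZT trial can be sketched the same way, again in Python rather than the R used in lecture, with the risk difference and standard error quoted above:

```python
import math

def pnorm(x):
    # standard normal CDF, analogous to R's pnorm()
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

risk_difference = -0.15  # AZT group minus placebo group
standard_error = 0.036   # as calculated previously in lecture

z = risk_difference / standard_error
p_value = 2 * pnorm(z)   # z is negative, so double the lower tail

print(round(z, 1))       # -4.2 standard errors below 0
print(p_value < 0.001)   # True
```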
If that's all we knew, we couldn't comment on whether the results were scientifically useful or even favorable for AZT, but if we had the estimate and confidence interval or confidence intervals for the different measures of association, we'd get the same information about statistical significance and more substantive data summarization as well. So, in summary, the two-sample z-test provides a method for getting a p-value for testing two competing hypotheses about the true proportions of a binary outcome between two populations, and the competing hypotheses, at the most basic level, are that these two proportions are equal for the null and that they're not equal for the alternative, but we can express these in terms of any of our measures of association. I've included these on the log scale as well, noting that we could express this in terms of the log relative risk and the log odds ratio being 0 or not; that doesn't really add anything to the other three ways of expressing it, but I'm just throwing it in there since we looked at things on the log scale, though that was really so that we could compute the confidence intervals. So, as such, since we can express these hypotheses equivalently in terms of any of our measures of association, only one p-value, and hence one hypothesis test, is needed for all measures of association comparing binary outcomes between two populations. The test is performed using the observed difference in proportions, or observed risk difference, and its estimated standard error. Again, we start by assuming the null, measure how far our observed difference is from zero in terms of standard errors, convert that to a p-value, and make a decision. Are you picking up on a pattern here?
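To collect the whole recipe in one place, here's a sketch of the full two-sample z-test as a single Python function; the standard error formula is the one from lecture eight, and the counts in the example call are hypothetical, just to show usage, not data from either study:

```python
import math

def two_sample_z_test(x1, n1, x2, n2):
    """Two-sample z-test comparing proportions between two groups.

    x1, x2: observed counts with the outcome; n1, n2: group sample sizes.
    Returns (risk difference, estimated SE, z statistic, two-sided p-value),
    using the large-sample normal approximation.
    """
    p1, p2 = x1 / n1, x2 / n2
    diff = p1 - p2
    # estimated standard error of the difference in sample proportions
    se = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    z = diff / se
    # two-sided p-value: twice the tail area beyond |z| under the
    # standard normal curve
    p_value = 2 * 0.5 * (1.0 + math.erf(-abs(z) / math.sqrt(2.0)))
    return diff, se, z, p_value

# Hypothetical counts, for illustration only:
diff, se, z, p = two_sample_z_test(30, 200, 60, 200)
print(round(diff, 2), round(z, 2), round(p, 4))
```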