Let's look now at how to calculate odds ratios in logistic models when you're using the R survey package. The odds of having a characteristic are defined this way: the probability that you have the characteristic divided by the probability that you don't. When you're comparing two categories, the ratio of the odds of having the characteristic in one category to the odds in the other is called the odds ratio. Take an example. Suppose we're trying to estimate the probability of having diabetes, say, and we're comparing males and females. Then in the odds ratio, on top we'd have the odds of having diabetes if you're male, and on the bottom we'd have the odds of having diabetes if you're female.

In the logistic regression model, we're actually modeling the log of the odds. The math works out nicely for that formulation, which is the reason that people do it. The log of the odds for a given set of covariates, that's what these x_i are, is modeled as a linear combination of those covariates. On the log-odds scale, we get the same form of model that we get in linear regression. The x_i here can be all sorts of things. They can be quantitative variables, like how much education you've got. They can be categorical variables, like whether you went to college or not, male or female, what region of the country you live in, or whether you've had an adequate diet throughout your life. All sorts of things are possibilities. We estimate the parameters, and there's software out there that will do it for us, so we don't have to worry about that piece.

Now let's think about the log of the odds ratio for category 1 versus 2. If I take the logarithm of that formula on the last page, then I get the difference in the log of the odds. Say my x variable, the covariate I'm thinking about, is categorical, zero or one, like male versus female. I set all the other covariates in my model to be the same for males and females, assuming that makes substantive sense.
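Written out, the definitions so far look roughly like this. The slides aren't reproduced here, so the symbols are assumptions for illustration: p is the probability of having the characteristic, the subscripts 1 and 2 label the two categories being compared, and the beta_i are the regression parameters, with one of the x_i possibly fixed at 1 to give an intercept.

```latex
% Odds of having the characteristic, and the odds ratio for category 1 versus 2
\[
\text{odds} = \frac{p}{1-p},
\qquad
\text{OR}_{1\,\text{vs}\,2} = \frac{p_1/(1-p_1)}{p_2/(1-p_2)}
\]

% Logistic model: the log odds are a linear combination of the covariates
\[
\log\frac{p(x)}{1-p(x)} = \sum_i \beta_i x_i
\]

% Taking the log of the odds ratio gives a difference in log odds
\[
\log \text{OR}_{1\,\text{vs}\,2}
  = \log\frac{p_1}{1-p_1} - \log\frac{p_2}{1-p_2}
\]
```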
Then the model says that the difference in those log odds is just the difference in the regression parameters: beta 1 minus beta 0 for category 1 versus category 0. If category 0 is the reference category, in other words the category whose regression parameter is set to zero in the estimation, then that zeroes out the beta 0 and we're just left with the difference in the log of the odds being beta 1. Now fold this back up: a difference in the log odds is the log of the odds ratio. That means to get to the odds ratio scale, all I have to do is exponentiate. If I exponentiate my solution, beta hat 1, then that's the odds ratio: the odds of having the characteristic, like diabetes, if you're male versus the odds of having the characteristic if you're female. That's how logistic regression works.

Let's take a look at the R survey code that would allow you to do that. The first thing I do is require the package, as usual. I tell it what data I want to use. I'm using the Academic Performance Index data set again, and I define the same stratified simple random sampling design that we've looked at before. In this case, I'm going to fit a logistic model. It's a reduced version of the one that we saw in the last video. There were a couple of extra parameters in that one that didn't appear to have any importance in the model, so I've left those out. I'm predicting whether the school-wide improvement target has been met or not, based on mobility, enrollment, the percent of parents at the school who are college grads, and a factor for whether the school is year-round or not. Remember that family equals quasibinomial with link equals logit gives me a logistic regression. Here's my output down here. These estimates are a little bit different from the last video because I've left out those parameters that didn't look important. Enrollment's a big one.
The intercept mobility is significant at the 10 percent level, college grad is significant at the 5 percent level, and year-round is not significant. But just for illustration, let's think about what I would do to get from here to the odds ratio for year-round schools versus not year-round schools. As the last slide showed, I just exponentiate the parameter value. I do that at the R prompt. Here's my answer, 3.76 approximately. That says the odds of meeting the school-wide growth target are about 3.76 times as high for year-round schools as for schools that are not year-round. But note this p-value here. The evidence may not be that strong, because our test is saying that maybe the coefficient is not that much different from zero, or equivalently that the odds ratio is not that much different from one.

Let's follow up on that by looking at the confidence intervals. How do we do that? We have a function in R called confint. I saved the model itself in an object called m3. I just apply confint(m3) and I save that in an object called CI. Here are all the confidence intervals. It does them all at once. The one that we're interested in here is for the year-round parameter estimate. You can see that the confidence interval, and this is 95 percent, covers 0. It goes from minus 0.45 up to 3.10. Then I back-transform to the odds ratio scale. Sure enough, the confidence interval covers one, which says we can't rule out that year-round schools and non-year-round schools are equally likely to meet the target for growth. On the other hand, things are pretty suggestive. This upper endpoint is quite a way from one, and this 0.63 is somewhat close to one. With more data, maybe we would be able to say more confidently that year-round schools are more likely to meet the target. But that's how you set up the code and go from the parameter estimates to odds ratios using the R survey package.
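For reference, here is a minimal sketch of the steps just described, from the design through to the back-transformed interval. The transcript doesn't show the slide code, so the details are assumptions: the apistrat sample and the column names (sch.wide, mobility, enroll, col.grad, yr.rnd, stype, pw, fpc) come from the api data shipped with the survey package, and the svydesign call is the usual stratified setup for that data, not necessarily the exact one on the slide. Only the object names m3 and CI are taken from the talk, and the numbers you get may not match the ones quoted above.

```r
# Sketch under stated assumptions: data set, column names, and design call
# are taken from the survey package's api example, not from the slides.
library(survey)

data(api)  # loads apistrat (stratified sample of California schools), among others

# Stratified simple random sample: strata are school types, with weights and fpc
dstrat <- svydesign(id = ~1, strata = ~stype, weights = ~pw,
                    fpc = ~fpc, data = apistrat)

# Reduced logistic model for meeting the school-wide growth target
m3 <- svyglm(sch.wide ~ mobility + enroll + col.grad + yr.rnd,
             design = dstrat,
             family = quasibinomial(link = "logit"))
summary(m3)

# Pick out the year-round coefficient by name rather than by position
b  <- coef(m3)
yr <- grep("^yr\\.rnd", names(b), value = TRUE)

# Odds ratio for year-round versus not year-round schools
exp(b[yr])

# 95 percent confidence intervals on the log-odds scale, all parameters at once
CI <- confint(m3)
CI[yr, ]

# Back-transform the year-round interval to the odds ratio scale
exp(CI[yr, ])
```

Exponentiating the endpoints works because exp is monotone, so an interval that covers 0 on the log-odds scale covers 1 on the odds ratio scale.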