Let's look now at how to calculate odds ratios and fit logistic models when you're using the R survey package. The odds of having a characteristic are defined this way: the probability that you have the characteristic divided by the probability that you don't. And when you're comparing two categories, the ratio of the odds of having the characteristic in one category to the odds in the other is called, in shorthand, the odds ratio. Take an example. Suppose we were trying to estimate the probability of having diabetes, say, and we are comparing males and females. Then in the odds ratio, on the top we'd have the odds of having diabetes if you're male, and on the bottom we'd have the odds of having diabetes if you're female.

Now in logistic regression we set up a model where we're actually modeling the log of the odds. The math works out nicely for that formula, which is the reason people do it. So the log of the odds for a given set of covariates, that's what these x_i's are, is modeled as a linear combination of those covariates. So on the log odds scale, we get the same sort of model we get in linear regression. And the x's here can be all sorts of things. They can be quantitative variables, like how much education you've got. They can be categorical variables, like whether you went to college or not, male or female, what region of the country you live in, or whether you've had an adequate diet throughout your life. All sorts of things are possibilities. The parameters of that model have to be estimated, but there's software out there that will do it for us, so we don't have to worry about that piece.

So now let's think about the log of the odds for category one versus two, the log of the odds ratio. If I take the logarithm of that formula on the last page, then I get the difference in the log of the odds. Now suppose my x variable, the covariate I'm thinking about, is categorical, coded 0 or 1, like male versus female.
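Written out in symbols, the definitions so far look roughly like this. The notation is an assumption of this sketch and not taken from the slides: p is the probability of having the characteristic, p_1 and p_0 are that probability in the two categories being compared (for example male and female), and alpha stands for the intercept of the linear model.

```latex
\begin{align*}
\text{odds} &= \frac{p}{1-p} \\[4pt]
\text{odds ratio} &= \frac{p_1/(1-p_1)}{p_0/(1-p_0)} \\[4pt]
\log\!\left(\frac{p(x)}{1-p(x)}\right) &= \alpha + \sum_{j} \beta_j x_j \\[4pt]
\log(\text{odds ratio}) &= \log\!\left(\frac{p_1}{1-p_1}\right) - \log\!\left(\frac{p_0}{1-p_0}\right)
\end{align*}
```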
And suppose I set all the other covariates in my model to be the same for males and females, assuming that makes substantive sense. Then the model says that the difference in those log odds is just going to be the difference in the regression parameters, say beta 1 minus beta 0 for category 1 versus category 0. If category 0 is the reference category, in other words the category whose regression parameter is set to zero in the estimation, then this one, the beta naught, zeros out, and we're just left with the difference in the log of the odds being beta 1. Now fold this back up, so it's not the difference in the log odds, it's the odds ratio. To get to the odds ratio scale, all I have to do is exponentiate. So if I exponentiate my solution for beta 1, beta hat 1, then that's the odds ratio: the odds of having the characteristic, like diabetes, if you're male versus the odds of having the characteristic if you're female. So that's how logistic regression works.

So let's take a look at the R survey code that would allow you to do that. The first thing I do is require the package, as usual. I tell it what data I want to use, so I'm using the Academic Performance Index data set again. And I define the same stratified simple random sampling design that we've looked at before. Now in this case, I'm going to fit a logistic model that's a reduced version of the one that we saw in the last video. There were a couple of extra parameters in that one that appeared not to have any importance in the model, so I've left those out. So I'm predicting whether a school-wide improvement target has been met or not, based on mobility, the enrollment in the school, the percent of parents that are college grads, and a factor for whether the school was year-round or not. Remember that this family = quasibinomial specification, with its default logit link, gives me a logistic regression. And here's my output.
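The setup just described might look roughly like the following. This is a sketch and not necessarily the lecture's exact code. The variable names (sch.wide, mobility, enroll, col.grad, yr.rnd, stype, pw, fpc) come from the apistrat data frame that ships with the survey package and are assumed to correspond to the predictors named above. The object name dstrat is illustrative, and M3 is the name the lecture uses later for the fitted model.

```r
# Sketch only: assumes the lecture's design is the stratified sample `apistrat`
# from the survey package, with school type (stype) as the stratum variable.
library(survey)

data(api)  # loads apistrat along with the other API example data frames

# Stratified simple random sample: no clusters (id = ~1), strata by school type,
# sampling weights in pw, and finite population corrections in fpc.
dstrat <- svydesign(id = ~1, strata = ~stype, weights = ~pw,
                    fpc = ~fpc, data = apistrat)

# Logistic model for whether the school-wide target was met (sch.wide: No/Yes).
# quasibinomial() uses the logit link by default and avoids the warnings that
# binomial() gives when the weights are not integers.
M3 <- svyglm(sch.wide ~ mobility + enroll + col.grad + yr.rnd,
             design = dstrat, family = quasibinomial())

summary(M3)  # estimates, design-based standard errors, and p-values
```

If the lecture's design differs from this assumed one, for instance in the weights or strata, the estimates printed here will not match the ones on the slide.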
So these estimates are a little bit different from the last video because I've left out those parameters that didn't look important. Enrollment's a big one, and so is the intercept. Mobility is significant at the 10% level. College grad is significant at the 5% level. Year-round is not significant. But just for illustration, let's think about what I would do to get from here to the odds ratio for year-round schools versus not-year-round schools. As the last slide showed, I just exponentiate that parameter value. So I do that at the R prompt, and here's my answer, 3.76 approximately. That says that the odds of meeting the school-wide growth target for year-round schools are about 3.76 times the odds for schools that are not year-round. But note this p-value here. The evidence may not be that strong, because our test is saying that maybe this estimate is not that much different from 0 on the log odds scale, or from 1 on the odds ratio scale.

So let's follow up on that by looking at the confidence intervals. How do we do that? We have a function in R called confint. I saved the model itself in an object called M3, so I just apply confint to M3 and save the result in an object called CI. Here are all the confidence intervals; it does them all at once. The one that we're interested in here is for the year-round parameter estimate. You can see that this 95% confidence interval covers 0: it goes from -0.45 up to 3.10. Then I back-transform to the odds ratio scale, and sure enough, the confidence interval covers 1. That says we can't rule out that year-round schools and non-year-round schools are about equally likely to meet the target for growth. On the other hand, things are pretty suggestive. This upper endpoint is quite a ways from 1, while this lower endpoint, 0.63, is somewhat close to 1. So with more data maybe we would be able to say more confidently that year-round schools are more likely to have met the target. But that's how you set up the code and go from the parameter estimates to odds ratios using the R survey package.
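The last two steps, exponentiating the coefficient and back-transforming the confidence interval, can be sketched as follows. The model is refit here only so that the block runs on its own. It again assumes the apistrat data and variable names from the survey package, and yr.rndYes is the coefficient name R generates for the "Yes" level of the yr.rnd factor under that assumption. M3 and CI are the object names used in the lecture.

```r
# Sketch only: assumed design and variable names, as in the survey package's
# apistrat example data.
library(survey)

data(api)
dstrat <- svydesign(id = ~1, strata = ~stype, weights = ~pw,
                    fpc = ~fpc, data = apistrat)
M3 <- svyglm(sch.wide ~ mobility + enroll + col.grad + yr.rnd,
             design = dstrat, family = quasibinomial())

# Odds ratio for year-round versus not-year-round schools:
# exponentiate the estimated log odds ratio.
exp(coef(M3)["yr.rndYes"])

# 95% confidence intervals for all coefficients, on the log odds scale.
CI <- confint(M3)
CI

# Back-transform both endpoints to the odds ratio scale. An interval that
# covers 0 on the log odds scale covers 1 on the odds ratio scale.
exp(CI)
```

Because exponentiation is monotone, transforming the two endpoints is enough; the interval on the odds ratio scale is no longer symmetric around the point estimate.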