In this section we will talk about the basics of model estimation for multiple logistic regression, and handling the uncertainty in the resulting estimates: both by creating confidence intervals for individual odds ratios and adjusted odds ratios, and by looking at the idea behind what is being tested when we're testing a categorical predictor that requires multiple x's to model it. After viewing this section, you will be able to conceptually extend the concept of maximum likelihood estimation to multiple logistic regression models; compute 95 percent confidence intervals for the intercept and individual slopes, and then exponentiate these results to put them on the odds and odds ratio scales; understand how to perform a hypothesis test for individual slopes; and understand the concept of the likelihood ratio test, which allows for testing multiple slopes at once. This is useful for testing multicategorical predictors.

The general approach to estimating the intercept and slopes for logistic regression models, both simple and now multiple, is called maximum likelihood. Just as we saw for simple logistic regression, the same idea applies here, and it's a complicated idea, complicated mathematically. But the estimates for the intercept and the slopes for the x's in a given model are the values that make the observed data (the data used to fit the model, including the outcomes being modeled) most likely among all possible choices for beta-naught hat, beta-one hat, up through beta-p hat for the multiple slopes we have. This is a complicated idea and it's computationally intense, so it must be done with a computer. But again, there is an algorithm at play here such that if you were to use the same data to estimate the same multiple logistic regression model on different software platforms, you would get the same results across all of them.

The maximum likelihood algorithm also gives standard error estimates for the intercept and slopes. The standard errors allow for the computation of 95 percent confidence intervals and p-values for the slopes and intercept. The random sampling behavior of these estimates is normal in large samples. In other words, if we were to repeat a study over and over again, take representative or random samples of the same size from the same population, and estimate the same multiple logistic regression model on all of these random samples, of course there would be variation in the estimated intercepts and slopes across these models, because they would be based on different subsets of the population of interest. But if we were to look at the individual behavior of any one of these quantities and plot a histogram of it across the samples, it would be relatively normal in larger samples. Think about this: our slopes are log odds ratios, which we saw back in term one had an approximately normal sampling distribution, so this is in sync with what we'd expect. This holds in larger samples; we won't worry about the cutoff for larger versus smaller samples, but there is an exact algorithm that can be used in smaller samples, and the computer will use that on a case-by-case basis if needed. It will be a smooth operation for us to use a computer to get any results from multiple logistic regression, and regardless of how those confidence intervals are computed, the interpretation is the same.
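To make the maximum likelihood step concrete, here is a minimal sketch of fitting a multiple logistic regression in Python with statsmodels. The data frame df and its column names (obese, female, hdl, age_q) are hypothetical stand-ins for the obesity example; the software maximizes the likelihood for us and reports the estimates and their standard errors.

```python
import numpy as np
import statsmodels.formula.api as smf

# df is a hypothetical data frame with a binary outcome 'obese' (0/1),
# a binary predictor 'female' (0/1), continuous 'hdl' (mg/dL),
# and 'age_q' coded 1-4 for age quartiles.
fit = smf.logit("obese ~ female + hdl + C(age_q)", data=df).fit()

print(fit.params)          # maximum likelihood estimates: intercept and slopes (log odds ratios)
print(fit.bse)             # their estimated standard errors
print(np.exp(fit.params))  # exponentiated: reference odds and (adjusted) odds ratios
```

Any package that maximizes the same likelihood (R's glm, Stata's logit, and so on) will return the same estimates, which is the point made above about consistency across platforms.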
Here, again, it's business as usual for getting 95 percent confidence intervals: take our estimate plus or minus two standard errors. The only caveat is that because these are on the log scale, as we saw with simple logistic regression, we'll exponentiate the results to present the confidence interval on the odds or odds ratio scale. So this means we can get 95 percent confidence intervals for both the intercept and the slopes simply by taking our estimate plus or minus two estimated standard errors. In many cases the intercept on its own isn't necessarily a scientifically interesting quantity unless there is a group in the data where all x's are equal to zero; if all of our predictors are binary or categorical, this may be a useful quantity. Regardless, it's easy to compute a 95 percent confidence interval for a slope by taking the estimated slope and adding and subtracting two estimated standard errors, which are given by the maximum likelihood estimation algorithm. If we want the confidence interval for the slope beta i, where i runs from one to p (generically, where we have p x's in the model), we just take our estimated slope beta hat i and add and subtract two estimated standard errors. What we would do in either case, before fully presenting the results to any reader or in any journal article, is exponentiate the endpoints to get the confidence interval either for the odds for the reference group, if we're exponentiating the results for the intercept, or for the odds ratios (adjusted odds ratios) in our model, if we're exponentiating the confidence intervals for our slopes.

We would also want a p-value to test whether any of the comparisons made by the slopes is statistically significant. The generic approach to getting a p-value for the slope beta i, for any of the slopes in the model, tests whether the particular x associated with beta i is a statistically significant predictor of our binary outcome y after accounting for the other x's in the model. Generically, the null hypothesis is that the true population-level slope is zero, versus the alternative that it's not zero. We could also express this in terms of the exponentiated slopes: the null is that the odds ratio (the exponentiated log odds ratio) is equal to one, versus the alternative that it's not equal to one. In order to do this, we assume the null is true and calculate the distance of our slope from what it's expected to be under the null hypothesis, zero, but we do this in units of standard error. We measure how far it is from zero in standard errors, and we declare it to be far or not so far by translating that into a p-value, which estimates the proportion of results we could have gotten that are as far or farther from zero than what we observed just by chance, if the null at the population level is the truth.

So let's look at predictors of obesity, as we did before when I first showed these. I presented the odds ratios and confidence intervals for them, both unadjusted and adjusted. We know how to get the unadjusted odds ratio confidence intervals, and, well, guess what: it's exactly the same approach for the adjusted. So let's focus here on the results from model two for a moment, specifically the confidence intervals for the odds ratios associated with sex and HDL in that model. Here's the model written out on the log scale, the regression scale: here's the intercept, the slope for sex, the slope for HDL, and the slopes for age quartiles two through four, respectively.
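Before plugging in the numbers for this model, here is a quick sketch of the generic confidence interval arithmetic we'll apply; the helper name ci_or is just for illustration.

```python
import numpy as np

def ci_or(beta_hat, se):
    """Approximate 95% CI for a slope (log odds ratio) and, exponentiated,
    for the corresponding odds ratio, using estimate +/- 2 standard errors."""
    lo, hi = beta_hat - 2 * se, beta_hat + 2 * se
    return (lo, hi), (np.exp(lo), np.exp(hi))
```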
The standard error of the slope estimate for sex is 0.06; the standard error of the slope estimate for HDL is 0.002. This all came from the computer again. So now, if I wanted a 95 percent confidence interval for the adjusted association between obesity and sex, I could look at that slope for sex: to get the confidence interval on the slope scale, I'd take my estimated slope, 0.78, plus or minus two standard errors. That gives me a confidence interval for the log odds ratio of obesity for females compared to males, adjusted for HDL and age, of 0.66 to 0.90. This does not include the null value of zero for slopes, but of course we'd prefer to present this on the actual adjusted odds ratio scale. The estimated adjusted odds ratio of obesity for females to males, adjusted for HDL and age, is e raised to that estimated slope of 0.78, which is 2.18, and the 95 percent confidence interval on the odds ratio scale, which we get by exponentiating the endpoints on the slope scale, is 1.93 to 2.46. So this adjusted odds ratio is statistically significant: the confidence interval for the slope (the log odds ratio) did not include zero, and the confidence interval for the ratio itself (the exponentiated slope) did not include one.

The same process applies to the slope for HDL: we take the estimated slope and add and subtract two standard errors. We get a confidence interval on the log odds ratio scale that goes from negative 0.048 to negative 0.040, which does not include the null value for log odds ratios of zero. When we exponentiate the estimated log odds ratio of negative 0.044, we get an adjusted odds ratio of obesity of 0.957 for two groups who differ by one milligram per deciliter in HDL but are of the same sex and age category. To get the confidence interval on the odds ratio scale, we just exponentiate the endpoints of the confidence interval for the slope: 0.953 to 0.961. So the confidence interval for the slope, the log odds ratio, did not include the null value on that scale of zero, and hence the confidence interval for the ratio itself did not include the null value on that scale of one.

How would we get a p-value for any one of these slopes? We'll just use sex in this example; we could replicate this for any of the slopes in this model, or in other models as well, and the process is always the same. We're testing the null hypothesis that our slope is zero versus the alternative that it's not. As I said before, this is equivalent to the null hypothesis that the adjusted odds ratio at the population level is one, versus the alternative that it's not one. We can do this for any of our slopes, and hence adjusted odds ratios: we take the estimated slope divided by its estimated standard error to figure out how far it is from the null value of zero in terms of standard errors. We have a result for this example that is 12.4 standard errors above what we'd expect. We know the sampling behavior of these slopes from study to study is roughly normal, we're assuming the truth is zero for our hypothesis test, and we have something that's way out there, 12.4 standard errors above zero. So the p-value is the proportion of results we could get that are more than 12.4 standard errors away from zero, and that's a very small percentage of observations under that sampling curve; the resulting p-value is very small, well less than 0.01.
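Here is a small sketch that reproduces these computations from the estimates quoted above (0.78 and negative 0.044, with standard errors 0.06 and 0.002):

```python
import numpy as np
from scipy import stats

# slope estimates and standard errors quoted in the lecture for model 2
b_sex, se_sex = 0.78, 0.06
b_hdl, se_hdl = -0.044, 0.002

# 95% CI on the slope (log odds ratio) scale, then exponentiated
lo_sex, hi_sex = b_sex - 2 * se_sex, b_sex + 2 * se_sex       # 0.66 to 0.90
print(np.exp(b_sex), np.exp(lo_sex), np.exp(hi_sex))          # 2.18, 1.93, 2.46

lo_hdl, hi_hdl = b_hdl - 2 * se_hdl, b_hdl + 2 * se_hdl       # -0.048 to -0.040
print(np.exp(b_hdl), np.exp(lo_hdl), np.exp(hi_hdl))          # 0.957, 0.953, 0.961

# z statistic and two-sided p-value for the sex slope; the lecture's 12.4
# comes from less-rounded estimates, while these rounded inputs give 13.0
z = b_sex / se_sex
p = 2 * stats.norm.sf(abs(z))                                 # far below 0.01
```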
Again, we also reported p-values, both for the unadjusted and now for the adjusted comparisons, testing whether the overall construct of age, the overall predictor of age, which is modeled by three x's because there are four categories when we put it into quartiles, is a statistically significant predictor; and, when we're testing that in this multiple regression model, whether it is a statistically significant predictor of obesity above and beyond, or after accounting for, sex and HDL. The process to do this is conceptually exactly the same as with multiple linear regression; just the name of the test is different.

As with multiple linear regression, in multiple logistic regression, when our predictor is multicategorical and hence is modeled with multiple x's, in order to test whether the predictor is statistically significantly associated with the outcome, it is not necessarily enough to test each slope individually or look at the p-values for each slope on its own. In the regression model we had here, sex and HDL each required only one x, so we could just look at the confidence interval and p-value for that one slope. But age was categorical with four quartiles, so there were three x's. In order to formally test whether age is a statistically significant predictor of obesity above and beyond sex and HDL, we need to test these three slopes, for the three indicators of the non-reference categories, at once, not just any one individually.

Why do we need to do this? Let's think about it again. If we test them on their own, beta three is simply the difference between age quartile two and age quartile one, beta four is the difference between age quartile three and age quartile one, and beta five is the difference between age quartile four and age quartile one. These are three specific single differences; if we test any of them on its own, we're only testing that specific comparison. This is why we need an overall test: certainly in this example some of these are statistically significant, so we know the answer already, but we can have situations where none of the differences explicitly modeled by the coding scheme we've set up for the x's is statistically significant, yet we're missing some of the comparisons. We don't get an explicit comparison of Q3 to Q2, or Q4 to Q2, or Q4 to Q3, based on the way we've coded the reference group and these categories. Those could be estimated by taking differences of the slopes, and so this test assesses whether, taken together, all three of these slopes are zero. If that's the case, that also implies that all combinations, all differences of these slopes, are zero. So it ultimately covers a test that there are no differences in the log odds, and hence the odds, of the outcome between any two categories here, including those not explicitly modeled by our x's.

The test here is not called an F test; it's called the likelihood ratio test, but it's exactly conceptually the same thing as the partial F test for linear regression. It compares the amount of information in y explained by sex, HDL, and age to the amount of information in y explained by sex and HDL alone. In logistic regression, we can't quantify the percentage of variability explained in the outcome, because there's not a consistent, easily interpretable measure of variability for binary outcomes, especially when transformed to the log odds scale.
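For reference, the likelihood ratio test statistic takes a standard form: twice the difference in maximized log likelihoods between the extended and null models, compared to a chi-squared distribution whose degrees of freedom equal the number of extra slopes (three here, for the age quartile indicators):

$$G = 2\left(\ln \hat{L}_{\text{extended}} - \ln \hat{L}_{\text{null}}\right) \;\sim\; \chi^2_{q} \text{ under } H_0, \quad q = \text{number of extra slopes}.$$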
So, I'm using the word information here because it's not technically variability. Nevertheless, this test compares the following two models: in our example, the model that has all three predictors, sex, HDL, and age, versus a model that includes only sex and HDL. It tests whether the extended model is statistically significantly different from the null model; if not, that means there's no improvement in our understanding of obesity when we add in the extra predictors used to model age. So if the extended model adds enough additional information about our outcome above and beyond that explained by the null model, enough to justify estimating three extra slopes with the same total amount of data, then this null is rejected. Otherwise, we fail to reject the null, and the null model without these extra predictors is preferred. This, of course, like everything else, needs to be done with a computer; I just want to give you a heads-up on the idea behind it. Just like the partial F test for multiple linear regression, this approach generalizes to any null-and-extended model setup: the generic setup is that the extended model includes everything in the null model plus additional predictors. These models are considered nested; the null is nested within the extended because the extended includes everything in the null plus extra predictors. The extra predictors don't have to be multicategorical per se, but this is certainly a really nice utility of this test for testing individual predictors that require more than one x to model them.

So, the construction of confidence intervals for multiple logistic regression slopes and intercepts is business as usual: take the estimate and add or subtract two estimated standard errors in large samples. Generally, a computer will handle this, and I won't give you any exercises to do by hand that aren't based on large samples. In smaller samples, the 95 percent confidence intervals and p-values are obtained by exact computer-based methods, but regardless of the approach to getting these, the interpretation is the same. Confidence intervals for slopes are confidence intervals for adjusted log odds ratios; the results can be exponentiated to get 95 percent confidence intervals for adjusted odds ratios. Confidence intervals for intercepts are confidence intervals for the log odds of the outcome for a specific group, not always relevant when at least some of the x's are continuous, but the results for the confidence interval for the intercept can be exponentiated to get the 95 percent confidence interval for this starting or reference odds. Formally testing multicategorical predictors requires testing two or more slopes, and hence two or more adjusted odds ratios, together, as opposed to individually. This can be done using a likelihood ratio test; I'm just making you aware of the name of the test and the idea behind it, as you may see it referenced in journal articles. The resulting p-value tells us whether the multicategorical predictor is a statistically significant predictor of the log odds that y equals one, or of y in general, after accounting for the other predictors in the model.
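As a closing sketch, here is how that nested comparison could be run in Python with statsmodels, again using the hypothetical df and column names from the earlier sketch:

```python
import statsmodels.formula.api as smf
from scipy import stats

# null model (sex and HDL only) vs. extended model (adds age quartiles)
null_fit = smf.logit("obese ~ female + hdl", data=df).fit()
ext_fit  = smf.logit("obese ~ female + hdl + C(age_q)", data=df).fit()

# likelihood ratio statistic: twice the gain in maximized log likelihood
lr_stat = 2 * (ext_fit.llf - null_fit.llf)
df_diff = ext_fit.df_model - null_fit.df_model   # 3 extra slopes here
p_value = stats.chi2.sf(lr_stat, df_diff)        # small p => age adds information
```

A small p-value says the three age-quartile slopes, taken together, add statistically significant information about obesity beyond sex and HDL.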