Okay. So, welcome back to our discussion of fitting multilevel statistical models to dependent data. In this lecture we're going to be focusing on multilevel models for binary dependent variables, so we'll be talking about multilevel logistic regression models.

So let's think about the way we write these kinds of models when we have binary dependent variables. Last week we learned how to write the model for a binary dependent variable using the logit link, that is, the natural log of the odds that the dependent variable is equal to one. So we write the natural log of the probability that a binary dependent variable Y, measured on person i within cluster j, is equal to one, divided by one minus that probability; recall that we call that the logit function. So we write this as the logit of the probability that the dependent variable Y is equal to one, and in the multilevel specification we again have a combination of fixed effects, the fixed unknown constant parameters that we want to estimate to describe the relationships of the predictors with the log odds of the dependent variable being equal to one, and random effects, which capture the dependencies within the same higher-level cluster, in this case denoted by j. So this is an example of a random coefficient model, where we have the random effect u zero j, which allows each cluster to have a unique intercept in the logistic model, and the random effect u one j, which allows each cluster j to have a unique relationship of x with the log odds that the dependent variable is equal to one. We could also rewrite this using the multilevel specification if we so desired; this is just the single-level equation incorporating the random effects.

So we make the same distributional assumptions about these random cluster effects that we did in multilevel linear regression models. We assume that they're normally distributed, that they have a mean vector of zero, meaning the mean of each random effect is zero, and that the random effects have unique variances and covariances. So, the same distributional assumptions as we would have made in the multilevel linear regression model.

Recall that we're fitting a multilevel model because we have explicit interest in estimating the variance of the random cluster effects. So part of our research question involves estimating the amount of between-cluster variance in the dependent variable of interest, in this case in the log odds that the dependent variable is equal to one. Remember, that has to be an explicit part of our research question. When we fit these kinds of generalized linear regression models to non-normal outcomes and we include random effects, estimation becomes a lot more difficult mathematically. We're not going to dive into the math in this particular lecture, but it's a much more difficult computational problem to estimate these models for non-normal dependent variables. So again, the clear motivation for fitting these multilevel models, that explicit interest in estimating the variance of the random cluster effects, becomes very important, because it does take longer to fit these models computationally. So, in estimating the model parameters, we fit multilevel models to non-normal outcomes, and in these cases it's difficult to write down the likelihood function. We talked a little bit about the likelihood function previously when introducing these kinds of models.
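To pin down the notation for the random coefficient model described above, here is one way to write it as a sketch with a single person-level predictor x; the single predictor is an assumption for illustration, and the lecture's actual model may include more predictors:

```latex
\[
\log\!\left(\frac{P(Y_{ij}=1)}{1 - P(Y_{ij}=1)}\right)
  = \beta_0 + \beta_1 x_{ij} + u_{0j} + u_{1j} x_{ij},
\qquad
\begin{pmatrix} u_{0j} \\ u_{1j} \end{pmatrix}
  \sim N\!\left[
    \begin{pmatrix} 0 \\ 0 \end{pmatrix},
    \begin{pmatrix} \sigma^{2}_{u_0} & \sigma_{u_0 u_1} \\ \sigma_{u_0 u_1} & \sigma^{2}_{u_1} \end{pmatrix}
  \right]
\]
```

Here u zero j is the random intercept for cluster j, u one j is the random slope for x in cluster j, and the covariance matrix on the right collects the variances and covariance of the random effects that we want to estimate.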
In the case of non-normal outcomes, it becomes much more difficult to write down that likelihood, and in some cases we may not even be able to write it down; there may not be what's called a closed-form expression for that likelihood function. So what we have to do in practice, in many cases, is first approximate that likelihood function. We use mathematical methods to come up with an approximation of what the likelihood of the observed data would be under the given model specification. Then, once we approximate that likelihood function, we find the estimates of the parameters, the fixed effect parameters, the variances, and the covariances, that maximize that approximate likelihood. So that's a key aspect of fitting multilevel models to non-normal dependent variables like binary variables: oftentimes we can't write down the likelihood explicitly, and part of the computational process involves first approximating the likelihood and then maximizing it, finding the parameter estimates that maximize it. So, long story short, it just takes longer to fit these models from a computational perspective.

One possible approach to this process is called adaptive Gaussian quadrature. This is an estimation method that involves approximating the likelihood and then maximizing that approximate likelihood. We've included a deep dive reading for this week by Kim and colleagues, where they perform some simulation studies to evaluate alternative estimation approaches for these types of multilevel models, and they found that adaptive Gaussian quadrature generally works well in a variety of scenarios, especially for smaller samples.

What about testing the model parameters? Using the methods that we've talked about in a previous course and in previous weeks, we can again compute confidence intervals or test hypotheses for the parameters that we're interested in estimating. We would test null hypotheses about the parameters of interest, that is, that a fixed effect is zero, or that a variance component is zero, meaning that the random effects don't vary, and we can test these null hypotheses using likelihood ratio testing. So we would use the same likelihood ratio testing approach for multilevel logistic models that we discussed for multilevel linear models, assuming that we have large enough samples of clusters and observations per cluster. There's also a reading this week that provides specific details on how to perform these types of likelihood ratio tests for the parameters in multilevel models, and that'll be part of our materials for this week.

So, let's revisit that NHANES example where we introduced logistic regression. Recall that we fitted a logistic regression model to model the probability of ever smoking 100 cigarettes in your lifetime as a function of selected predictor variables. If you think back to week two, we talked about fitting this kind of logistic model to the smoking data within NHANES. In that analysis in week two, we assumed that all NHANES observations were independent of each other. In reality this is not true, because of the study design that was used for the NHANES. In the NHANES, multistage probability sampling was used, where there were several stages of random selection of sampling clusters, or geographic areas more generally. So the observations on this indicator of ever smoking 100 cigarettes in your lifetime come from these randomly sampled clusters as a part of the NHANES sample design.
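As a brief aside before we dig further into the NHANES example, here is a minimal sketch in Python of what "approximate the likelihood, then maximize it" can look like for a random-intercept logistic model. It uses ordinary (non-adaptive) Gauss-Hermite quadrature to approximate one cluster's marginal likelihood; adaptive Gaussian quadrature refines this idea by recentering and rescaling the quadrature nodes for each cluster. All names are illustrative, and production software handles the numerics far more carefully:

```python
import numpy as np

def cluster_loglik(beta, sigma_u, y, X, n_nodes=15):
    """Approximate log marginal likelihood for one cluster in a random-intercept
    logistic model, integrating out the random intercept u ~ N(0, sigma_u^2)."""
    # Gauss-Hermite nodes/weights for integrals of the form \int exp(-t^2) f(t) dt
    nodes, weights = np.polynomial.hermite.hermgauss(n_nodes)
    # Change of variables so the integral is against a N(0, sigma_u^2) density
    u_values = np.sqrt(2.0) * sigma_u * nodes

    marginal = 0.0
    for u, w in zip(u_values, weights):
        eta = X @ beta + u                      # linear predictor with random intercept u
        p = 1.0 / (1.0 + np.exp(-eta))          # inverse logit
        cond_lik = np.prod(p ** y * (1.0 - p) ** (1.0 - y))  # Bernoulli likelihood given u
        marginal += w * cond_lik
    return np.log(marginal / np.sqrt(np.pi))

# The full approximate log-likelihood sums cluster_loglik over all clusters;
# maximizing it over beta and sigma_u yields the parameter estimates.
```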
Because we have many people nested within the same cluster in that sample design, their observations on this indicator may in fact be correlated with each other, for one reason or another. So we can't make the assumption that all NHANES observations are truly independent of each other when fitting models to the NHANES data. If in fact the smoking observations are correlated within areas, the standard errors in the kind of naive logistic regression analysis that we performed in week two are likely understated. What does that mean? It means that our estimates of the regression parameters describing the relationships of these predictors with the probability of ever smoking 100 cigarettes, or the mean of that binary dependent variable, will have standard errors that are too small. We're basically saying that the sampling variability of those estimates is smaller than it really should be. Why should it be larger? Because those observations on the dependent variable are correlated within areas, and that increases the sampling variance of our estimates. So we need to make sure that our model accounts for that aspect of the study design, and including random cluster effects is one possible way to account for the fact that observations are correlated within the areas. That's generally going to increase the standard errors of our estimates, accurately reflecting the study design.

In addition to the modeling aspect of this, where we want to make sure that we're accounting for that between-cluster variability, or in other words that within-cluster correlation in the values of the binary dependent variable, we may also have explicit interest in estimating the variance between the NHANES sampling clusters in terms of the probability of smoking. So it's a combination of making sure that we get the model right and that our standard errors accurately reflect the sample design, but we also have the ability to make inference about the variance between sampling clusters in terms of the probability of smoking as part of the multilevel modeling approach.

So, here's a graph where we can visualize the amount of variability between the NHANES sampling clusters in terms of this probability of ever smoking 100 cigarettes. Each of the bars in this graph corresponds to one of the unique NHANES sampling clusters, reflecting the complex sample design that was used, and the size of each bar represents the proportion of individuals in that cluster who have ever smoked 100 cigarettes. You can see that these bars bounce around a whole lot across the different sampling clusters. That's a visualization of the between-cluster variance in the mean of the dependent variable of interest that we're interested in estimating when we decide to fit multilevel models. So again, we may have explicit interest in estimating the amount of this between-cluster variance, and we use random cluster effects to capture that variance.

So, let's think about fitting our multilevel logistic model. In the model that we're going to consider, we include random effects of those randomly sampled NHANES clusters. What this means is that the intercepts in the model that we're fitting are allowed to randomly vary across those sampling clusters. For this example, we're not considering the case of random slopes; the coefficients for all of our predictor variables are assumed to be constant across the sampling clusters.
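As a preview of the kind of syntax involved for a model like this, here is a minimal sketch of fitting a random-intercept logistic model in Python with statsmodels. The variable names (smoke100, age, gender, sdmvpsu for the sampling cluster) and the file name are hypothetical placeholders, and note that statsmodels' BinomialBayesMixedGLM approximates the model with Bayesian methods (variational Bayes or MAP) rather than adaptive Gaussian quadrature, so its output will not exactly match quadrature-based software:

```python
import pandas as pd
from statsmodels.genmod.bayes_mixed_glm import BinomialBayesMixedGLM

# Hypothetical analysis file: one row per respondent, with a 0/1 indicator of
# ever smoking 100 cigarettes and an identifier for the NHANES sampling cluster.
df = pd.read_csv("nhanes_smoking.csv")

# Fixed effects for respondent-level predictors; a random intercept for each
# sampling cluster captures between-cluster variability in the log odds.
model = BinomialBayesMixedGLM.from_formula(
    "smoke100 ~ age + C(gender)",       # fixed-effects part (illustrative predictors)
    {"cluster": "0 + C(sdmvpsu)"},      # random intercept for each sampling cluster
    data=df,
)
result = model.fit_vb()                 # variational Bayes approximation to the fit
print(result.summary())                 # posterior summaries of fixed effects and variance parameters
```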
We're only interested in estimating variability in the intercepts, allowing each cluster to have a different proportion in expectation. When fitting this model to the NHANES data, what we end up with is very similar inferences regarding which predictors are significant compared to week two, when we weren't accounting for the random effects. We see slight changes in the estimated fixed effects, but for the most part we see the same coefficients in terms of the relationships of these predictors with the probability of ever smoking 100 cigarettes. One key difference is that the standard errors of the estimated fixed effects are now larger, because the sampling variance reflects the between-cluster variability that's being captured by the random effects.

In addition, the estimated variance of the random cluster intercepts was 0.046. Now, that doesn't seem very large, but to truly evaluate that variance component, we have to perform a likelihood ratio test in order to make inference. When performing that likelihood ratio test, we find that we would reject the null hypothesis that the variance of those random cluster intercepts is zero. We have strong evidence that there is between-cluster variability in those intercepts, even after including all of the various covariates or predictor variables in this particular model. So it seems like including those random cluster effects is an important contribution in terms of our model fit. Even after adjusting for all those other predictors of smoking, the randomly sampled clusters still vary in terms of their smoking prevalence, and again, we're capturing this via those random effects.

So, let's think about model diagnostics. We saw that including random cluster effects in the logistic regression model improved the fit of the model based on the likelihood ratio test; that's a good thing. But let's look at the distribution of the predicted values of these random effects, or the EBLUPs. Are there potential outliers? Remember, there are no residuals to worry about in the simple logistic regression model, because the variance of the dependent variable is defined by the mean, like we discussed in week two. Another key consideration is that we might center continuous predictor variables so that the intercept is interpretable. That is, for predictors that are continuous, we might center them at their means so that we can interpret the intercept as the expected log odds of smoking when the predictor variables are set to their means.

So, here's a graphic of the predicted random effects, or EBLUPs, for the random intercepts in this particular logistic regression model. This normal QQ plot suggests that the random effects on the intercept due to the NHANES sampling clusters are normally distributed; you see that all the points lie close to that 45-degree line. In addition, we don't see any evidence of outliers like we saw when analyzing the European Social Survey data, where some interviewers were outliers. In this particular graphic, all of the clusters tend to follow the same distribution, and there are no very unusual outliers among the randomly sampled NHANES clusters. So, what conclusions can we draw from this example? Compared to the model where we did not account for the random cluster effects, we find the same predictors of smoking. The same predictors of smoking that we found in week two are still important.
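As an illustration of the diagnostic just described, here is a minimal sketch of producing that kind of normal QQ plot for the predicted random intercepts. The array of predictions and its file name are hypothetical placeholders, since how you extract the EBLUPs (or their Bayesian analogues) depends on the software used to fit the model:

```python
import numpy as np
import matplotlib.pyplot as plt
from scipy import stats

# Hypothetical vector of predicted random intercepts, one per sampling cluster,
# extracted from the fitted multilevel logistic model.
predicted_intercepts = np.loadtxt("predicted_random_intercepts.txt")

# Normal QQ plot: points close to the 45-degree reference line support the
# normality assumption for the random effects; points far from the line flag
# potentially outlying clusters.
stats.probplot(predicted_intercepts, dist="norm", plot=plt)
plt.title("QQ plot of predicted random intercepts (NHANES sampling clusters)")
plt.show()
```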
But given the significant unexplained variance in these random cluster effects that we included in the multilevel model, we might take a next step of trying to explain that variance by including fixed effects of cluster-level predictors. For example, we might include an indicator of the socioeconomic status of a given NHANES sampling cluster in an effort to try and explain that variability in the random intercepts. However, when we compare variance components between multilevel models that have different cluster-level fixed effects, because we're trying to explain that variance, both of the models being compared must include the same respondent-level fixed effects. We can't change the predictors that are measured at the respondent level when trying to evaluate these changes in variance components. We have to make sure that both models are fitted using the same cases, the exact same number of observations, and that they include the same respondent-level fixed effects. For a deeper dive reading on this issue, in terms of the magnitude of the variance component and why we need to keep this set of level-one predictors fixed, we would refer you to the textbook Multilevel Analysis: Techniques and Applications by Joop Hox and colleagues. This is now in its third edition, and Section 6.5 of that textbook explicitly discusses this issue of comparing variance components between different multilevel models. So, for all practical purposes, we just want to make sure that the level-one predictors stay the same when comparing these variance components.

So, what's next? Now that we've talked about multilevel modeling and different approaches to using multilevel models to account for this kind of dependency based on the study design, we're going to look at a full example of fitting multilevel models to longitudinal data with Python and making inference based on those fitted models. We're also going to explore a web application that allows us to visualize the fits of these kinds of models. Then we're going to turn our focus to marginal models for dependent data, which are alternatives for modeling clustered and longitudinal datasets that don't rely on random effects of higher-level clusters.