We close by taking up a seemingly ingenious idea that, to the best of my knowledge, originated in the econometrics literature, and which has found, and continues to find, widespread use in the social sciences for the analysis of observational studies. So far, we have assumed that treatment assignment is unconfounded given measured covariates. As we know, in many instances this assumption is not reasonable. Economists argued that if measurements were taken on multiple units in a cluster, and all these units shared a common unobserved variable U, then U, even though it is not measured, could be taken into account using standard models such as regression. For example, consider panel data, or repeated measures data; statisticians might call it longitudinal data. Individuals are observed at multiple points in time, so here each individual is a cluster, and U measures unobserved variables that are constant over time. To fix ideas, here is one example. Economists have used such data to study the effect of marriage on men's earnings; they've been doing this for more than 50 years. The question is whether more productive men are more likely to be married, or whether marriage increases productivity. Measured covariates have included such things as education, family background, occupation, industry, and union status, but productivity itself is not measured. Suppose then that U represents unmeasured aspects of productivity and is constant within units. Let y_it denote earnings, or some transformation of earnings such as the natural log, for unit i = 1 through n (the cluster), observed in periods t = 1 through T, with T at least 2. We'll let y_it depend on time-varying regressors, which we'll denote X_it.
Marital status in period t is Z_it: one if currently married, zero otherwise. Then there's U_i, which you'll notice is constant over time (there's no t subscript), and random errors epsilon_it, your usual regression errors. To keep matters simple, and to mirror the empirical literature, I want to consider perhaps the simplest case, linear regression. So let y_it have this linear regression form, and let Z-bar_it denote the treatment assignments from time 1 through time t. We're going to make one of two assumptions. The first is that the expectation of the regression error epsilon_it, given the complete history of the covariates X, the complete treatment assignment history, and the unobservable U_i, is zero. This is an assumption that Chamberlain, who has written about these models, called strict exogeneity. Then there's a weaker assumption called sequential exogeneity: this requires only that the expectation of epsilon_it, given the covariate history up through time t, the treatment assignments up through time t, and U_i, is zero. Sometimes strict exogeneity is unreasonable, and you can see it for yourself: if X_it were to contain lagged values of y, which it could, you'd be better off assuming sequential exogeneity, because strict exogeneity is just unreasonable in that case. Now, there are a couple of ways we could treat U_i. If U_i is treated as a mean-zero random variable that is independent of X_it and Z_it, we obtain a random effects model. But the random effects model is often unreasonable: in the marriage example, it would not be reasonable to assume that unmeasured aspects of productivity are independent of education and the other covariates. For that reason, economists often prefer to treat U_i as a so-called "fixed effect", and I've cited Mundlak, who had the first insight into this.
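The model and the two exogeneity conditions referred to here appear on the slides rather than in the transcript. A plausible reconstruction, consistent with the spoken description (bars denote histories, e.g. Z-bar_it = (Z_i1, ..., Z_it)), is:

```latex
% Fixed effects regression model (reconstructed from the spoken description):
y_{it} = X_{it}'\beta + \tau Z_{it} + U_i + \epsilon_{it},
   \qquad i = 1,\dots,n, \quad t = 1,\dots,T .

% Strict exogeneity (Chamberlain): condition on the COMPLETE histories
E\bigl(\epsilon_{it} \mid \bar{X}_{iT}, \bar{Z}_{iT}, U_i\bigr) = 0
   \quad \text{for all } t .

% Sequential exogeneity (weaker): condition only on histories up through t
E\bigl(\epsilon_{it} \mid \bar{X}_{it}, \bar{Z}_{it}, U_i\bigr) = 0 .
```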
In this case, U_i can be treated as a parameter to be estimated, by including dummy variables for each cluster (modulo some mild identification conditions, nothing substantial), or it can be differenced out in several ways. You can subtract the period-1 value y_i1, and you'll see that when you difference, U_i drops out; or you can subtract the mean y-bar_i taken over all the time points within cluster i, and again, when you do the arithmetic, U_i drops out. Under strict exogeneity, consistent estimates of beta and tau, the so-called treatment effect (we're going to call it a treatment effect for now; it's not really a treatment effect, as we'll see, so say rather the purported treatment effect), can be obtained using ordinary least squares. Typically this purported treatment effect takes the very simple form tau times Z_it: if Z_it is one, you add tau, and if Z_it is zero, nothing. The estimate tau-hat is then simply interpreted as an effect, the exact nature of which is not stated. That's not surprising, because one starts with the model and never says what the estimand is; we'll see more on that later. Estimation is more complicated under sequential exogeneity, and I won't be discussing that here; you can see the book by Wooldridge for greater coverage of this and other topics for fixed effects panel data models. To discuss it would be a real distraction, and the point I want to make here is about identification, not estimation. Throughout this course we've emphasized the importance of using potential outcomes to define unit and average effects, then stating conditions under which these are identified; then, and only then, comes estimation. In the approach above, the regression model is simply put out there, and the parameter tau, a byproduct of the model, is simply interpreted as the treatment effect.
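To see the mechanics, here is a minimal simulation sketch (mine, not from the lecture; all parameter values are illustrative). An unobserved, time-constant U_i drives both treatment and outcome, so pooled OLS is confounded, while the within (demeaning) transformation removes U_i and recovers tau:

```python
import numpy as np

rng = np.random.default_rng(0)
n, T = 2000, 4          # n clusters (individuals), T periods
tau, beta = 1.0, 0.5    # true "treatment effect" and covariate slope

U = rng.normal(size=n)                       # unobserved, time-constant confounder
X = rng.normal(size=(n, T)) + U[:, None]     # covariate correlated with U
# treatment probability rises with U, so treatment is confounded by U
Z = (rng.uniform(size=(n, T)) < 1 / (1 + np.exp(-U[:, None]))).astype(float)
eps = rng.normal(scale=0.5, size=(n, T))
Y = beta * X + tau * Z + U[:, None] + eps

def within(A):
    # subtract each cluster's time mean: the within ("fixed effects") transform
    return A - A.mean(axis=1, keepdims=True)

def ols(y, *cols):
    return np.linalg.lstsq(np.column_stack(cols), y, rcond=None)[0]

# pooled OLS ignoring U: the coefficient on Z is biased upward
b_pool = ols(Y.ravel(), np.ones(n * T), X.ravel(), Z.ravel())
# OLS on demeaned data: U_i has been differenced out, tau is recovered
b_fe = ols(within(Y).ravel(), within(X).ravel(), within(Z).ravel())
print("pooled tau-hat:", b_pool[2], " FE tau-hat:", b_fe[1])
```

Demeaning and first-differencing both eliminate U_i only because U_i enters additively and is constant over time; that is the entire trick.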
As I said already, this approach is problematic. Following a paper I wrote, we're going to reformulate the problem within the framework of longitudinal causal inference, which we studied in module 10, using potential outcomes. In each period t, we assume the time-varying covariates are observed first, then the treatment, then the response. An important point, of great consequence, not considered at all in the regression approach we just talked about, is the dependence of the time-varying covariates on prior treatment assignments. So I need to set up and define potential time-varying covariates, and you'll notice that these can depend on the prior treatment assignments: not just the potential responses y_it(z-bar_t), but the time-varying covariates as well. This is an important point, and we're going to see why in a moment. Analogously with the regression model previously considered, which I've written up again, let's look at a so-called causal model. You'll see that the causal model looks very much like the regression model, except that, first, the dependence of the covariates on the prior treatment assignments is made explicit, and second, the parameters appear with a subscript c to denote that this is a causal model, a model for potential outcomes, or what Robins might have called a structural model. For i = 1 to n and all time points t up through T, I'm going to make assumptions analogous to strict and sequential exogeneity, so that we can study when these might hold and when the causal model might give the same results as the regression model. That's the point, after all: does the regression model work, and if so, how? We write the parameters with the c to distinguish them from the regression parameters, and in general the two sets of parameters are not going to be identical.
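Again the causal model itself is on the slides, not in the transcript. A reconstruction consistent with the description (potential outcomes and covariates indexed by the treatment history, parameters subscripted by c) would be:

```latex
% Causal (potential outcomes) analogue of the regression model, reconstructed:
% the covariates may depend on the assignments BEFORE period t,
% the outcome on the assignments through period t.
y_{it}(\bar{z}_t) = X_{it}(\bar{z}_{t-1})'\beta_c
    + \tau_c(\bar{z}_t) + U_i + \epsilon_{it}(\bar{z}_t) .
```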
So, we're going to have assumptions analogous to strict exogeneity and sequential exogeneity. You'll notice that in the first, we condition on the complete set of covariates, while in the second, we condition on the covariates only up through time t; in the first, little t is less than big T in general, but in the second, the conditioning history stops at t. Now, under either assumption, we can write the expectation comparing treatment sequence z-bar_t with z-bar*_t (remember, that's what we do in longitudinal causal inference: we compare regimes, often within covariate classes) as follows. You'll see what happens: we have the terms involving the treatment assignments, the last two terms, and we have the terms involving the time-varying covariates. In general, those covariates are going to be different under the two sub-regimes z-bar_t and z-bar*_t, and so the comparison between the different treatment regimes depends not only on the tau_c terms, but also on the effects of the different treatment histories on the time-varying confounders, which seems to be completely lost in this literature. So to estimate the effect of z-bar_t versus z-bar*_t, no matter what procedure is used to estimate the parameters, it is necessary to model the effect of the treatment history on the time-varying confounders whenever the covariate term isn't zero, and we generally wouldn't expect it to be zero. Looking at the last term, the difference tau_c(z-bar_t) minus tau_c(z-bar*_t), that is the effect of comparing the two sub-regimes only if the covariate term is zero. How would that happen? Well, one thing we might assume is that treatment doesn't affect the time-varying confounders, but that assumption is pretty unreasonable in most cases of interest.
Okay, so we don't want to assume that; that would be a little crazy. Thus we might focus on assignments where the covariates are the same under the two regimes. But when might that occur? It might occur when the two assignments agree up through period t minus 1, and then, at time t, you're just comparing treatment z_t = 1 versus z*_t = 0. In that case the time-varying covariates are the same up through period t minus 1 under the two sub-regimes, and if we also take tau_c(z-bar_t) equal to just tau_c times z_t, then the effect of the comparison is simply tau_c. That would allow you to compare a whole set of treatment sub-regimes, namely all sub-regimes that agree up through time t minus 1, if that model were true. Now, I'm not sure, but presumably this is what empirical researchers are attempting to estimate when they take the treatment term in their regression model to be tau times Z_it. Okay, so I've given conditions under which consistent estimates of the tau_c parameters can be obtained using the regression model. Of course, we're not even going to try to model the time-varying covariates, because that just adds additional layers of complexity on top of all this; you'd have to model them, et cetera. So we're just going to take the case where the time-varying covariates are the same through the first t minus 1 periods. That can happen, as I say, when the assignments are the same through the first t minus 1 periods; or we might have a model in which the outcome depends only on the number of treated periods, and the number of treated periods during the first t minus 1 periods is the same under the two regimes. We're just going to assume we're in some setup in which that's reasonable. Now, I'm going to talk about using the regression model.
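The point that the regression coefficient misses the pathway through the time-varying covariates can be illustrated with a small two-period simulation (my sketch, not the lecturer's; all parameter values are made up). Here Z_1 shifts the period-2 covariate X_2, so the effect of "always treated" versus "never treated" is tau plus an indirect piece, while the first-difference regression recovers only tau:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 20000
tau, beta, gamma = 1.0, 0.5, 2.0   # direct effect, covariate slope, effect of Z1 on X2

U = rng.normal(size=n)              # unobserved, time-constant confounder
e1, e2 = rng.normal(size=n), rng.normal(size=n)
v2 = rng.normal(size=n)             # period-2 covariate noise

def y2_under(z1, z2):
    # potential outcome for period 2 under the regime (z1, z2):
    # the time-varying covariate X2 itself depends on the prior treatment z1
    X2 = gamma * z1 + v2
    return beta * X2 + tau * z2 + U + e2

# effect of "always treated" vs "never treated" on the period-2 outcome
total_effect = (y2_under(1, 1) - y2_under(0, 0)).mean()   # equals tau + beta*gamma

# observed data, with randomized treatments to keep the example clean
Z1 = (rng.uniform(size=n) < 0.5).astype(float)
Z2 = (rng.uniform(size=n) < 0.5).astype(float)
X1 = rng.normal(size=n)
X2 = gamma * Z1 + v2
Y1 = beta * X1 + tau * Z1 + U + e1
Y2 = beta * X2 + tau * Z2 + U + e2

# fixed-effects (first-difference) regression: U drops out of Y2 - Y1
dY, dX, dZ = Y2 - Y1, X2 - X1, Z2 - Z1
b = np.linalg.lstsq(np.column_stack([dX, dZ]), dY, rcond=None)[0]
print("regime effect:", total_effect, " regression tau-hat:", b[1])
```

Both numbers are "right" on their own terms: tau-hat consistently estimates the model's tau_c, but that parameter equals the regime comparison only for sub-regimes whose covariate histories agree, exactly as described above.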
If we assume sequential exogeneity, which, remember, is the weaker of the two exogeneity assumptions, we need to make a sequential randomization assumption to justify using the regression within this longitudinal causal inference framework. Now, in the module on longitudinal causal inference, what did we assume? We assumed the period-t treatment assignment was independent of future potential outcomes given the previous outcomes, confounders, and treatment history. To make things work here, we have to assume a little more: the period-t treatment assignment is independent of future potential outcomes and potential confounders as well, given the previous outcomes, confounders, treatment history, and the unobserved confounder U. So this is a little different from, and of course a bit stronger than, the sequential randomization assumption we made in the module on longitudinal causal inference. That said, the vast majority of analyses using fixed effects models use OLS regression, which gives consistent estimates of the model parameters under strict exogeneity, not under sequential exogeneity. So here, of course, the identification conditions required to justify using OLS would be even stronger, because strict exogeneity is a much stronger condition: what we require is conditional independence of future potential outcomes and the complete assignment history through period T. In observational studies, an assumption like this is almost never credible, because it requires, among other things, that potential outcomes do not affect subsequent treatment assignments. I'm hard pressed to imagine any situation in which I'd really believe this in an observational study. That's my take on fixed effects models for longitudinal data. But remember, fixed effects models aren't limited to longitudinal data.
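To see why OLS needs strict, not merely sequential, exogeneity, here is a small sketch (again mine, with illustrative numbers). The period-2 treatment responds to the period-1 shock, so sequential exogeneity holds but strict exogeneity fails, and the within/first-difference OLS estimator stays biased no matter how large n gets:

```python
import numpy as np

rng = np.random.default_rng(2)
n, tau = 50000, 1.0

U = rng.normal(size=n)                    # unobserved, time-constant confounder
e1, e2 = rng.normal(size=n), rng.normal(size=n)

Z1 = (rng.uniform(size=n) < 0.5).astype(float)
# feedback: period-2 treatment responds to the period-1 shock, so
# sequential exogeneity holds but strict exogeneity fails
Z2 = (e1 < 0).astype(float)

Y1 = tau * Z1 + U + e1
Y2 = tau * Z2 + U + e2

# within / first-difference OLS: U drops out, but dZ is correlated with -e1
dY, dZ = Y2 - Y1, Z2 - Z1
tau_hat = (dZ @ dY) / (dZ @ dZ)
print("tau-hat:", tau_hat)    # roughly tau + 0.8 here, not tau
```

Differencing removes U but puts minus e1 into the differenced error, and Z2 was chosen using e1; that mechanical correlation is exactly what strict exogeneity rules out.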
Whenever you have clustering, you can have random effects models or fixed effects models. Clustering arises not only in the longitudinal case; we've also seen it arise frequently in schools, dyads such as husband-wife pairs, families, neighborhoods, et cetera. Economists will often use fixed effects models for the cluster in these cases too. Here the fixed effect is postulated to account for a common environment within the cluster that is not measured by the observed variables. For example, within a family, if you're looking at the siblings and trying to model their education, there may be some latent family environment of unmeasured variables: books in the household, parental attitudes toward education, whether the parents encourage the kids to be creative thinkers, that sort of thing, things that aren't really picked up by the observed covariates. Now, in a setting like that, the problems above won't arise, because we're not in the longitudinal case. But there's another problem, with the stable unit treatment value assumption: it's likely that the units within a cluster interfere with one another. So even if you could conduct a randomized experiment, ignoring interference would lead to invalid inferences, as we saw previously in the lessons on interference. Second, in our treatment of interference, the estimands we considered were identified under randomization conditions. Since the rationale for using fixed effects models is to control for unmeasured confounding in observational studies, different kinds of identification conditions would need to be developed. I'm saying that further development of this topic would be useful; in fact, I'm working on it, so maybe stay tuned. Well, this concludes Causal Inference 2. Thanks for making it through these two courses.
So, to wrap up Causal Inference 2: we've gone far beyond the material in Causal Inference 1, where we focused on average treatment effects and the effect of treatment on the treated. The estimands we've considered here in Causal Inference 2 are more complicated, and with that, you tend to get more complicated identification conditions and more complicated estimation procedures. I think it's useful to note that with the more complicated identification conditions, you're trying to do something more, so you have to make more assumptions, and if those assumptions aren't reasonable, your results aren't going to be reasonable. So it's very nice to know what kinds of assumptions you need to make, even if it makes you want to say, "Okay, I know these assumptions aren't reasonable, so I'm not going to put much faith in my estimates," or "I'm just not going to do that." The statistical literature is very useful because it lays out cleanly what you need to assume in order to do these kinds of things. Folks might not always like it, but it also tells you what you shouldn't be doing and what you shouldn't believe sometimes, and I think that's very useful to know. All right, the other thing I want to say is that I know Causal Inference 2 has been more complicated, and there's so much material; I've just tried to skim the surface, give you some of the main topics and the main ideas behind them, and give you references so that if you're really interested in a particular topic, you can go and read on your own now. Okay, well, thanks again.