This video is about a particular type of causal model known as marginal structural models. And our objective is to understand what these models are and especially how they differ from ordinary regression models. So previously we talked about inverse probability of treatment weighting estimation for just simple kinds of causal effects, such as an average causal effect: how could you estimate the mean of a potential outcome? But this general approach, this inverse probability of treatment weighting, can be used to estimate parameters from more general causal models. So more complicated kinds of causal models. So you might not be simply interested in an average causal effect. You might be interested in something more complicated; it could involve treatment effect modification or other things of that type. So a marginal structural model is a model for the mean of potential outcomes. And the word "marginal" is coming from the fact that it's not going to be conditional on confounders. So ultimately we're interested in a population average causal effect. So an expected value of the potential outcomes for the whole population, not conditional on X's, not given some subpopulation. We really want a causal effect for some whole population. So that's what marginal means: population average. Marginal, typically you can think of it as averaging over some whole population. And then the structural part of the phrase here has to do with the fact that we're modeling potential outcomes as opposed to observed outcomes. So to begin with, let's look at a linear marginal structural model. So the first thing to notice is that our model is for an expected value of a potential outcome. And in particular, it's a model for a potential outcome at some value of little a, and we'll say that it's linear in a itself. So Ψ0 and Ψ1 are parameters, so these are things we would want to estimate. So they are unknown parameters.
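Written out, the linear marginal structural model being described looks like this (a sketch of the slide's equation, using the Ψ notation from the talk):

```latex
E[Y^a] = \psi_0 + \psi_1 a
```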
And let's imagine that A can just take values zero or one, for now, so this could be like control versus treatment. And so in this case, the expected value of Y⁰, in other words the mean of the potential outcome under control, would just be Ψ0, because a would equal zero, so this part here would just go away. Whereas the expected value of Y¹, the expected value of the potential outcome if everybody was treated, is equal to Ψ0+Ψ1. So then you could take the difference of those and get an average causal effect. So Ψ1 here would have a causal interpretation. It would be this average causal effect, which is this difference in potential outcomes. So this is one example of a marginal structural model, a very simple linear marginal structural model. You would typically use a model like this if you had, you know, a continuous outcome. So this should look similar to a linear regression model with some key differences, which we'll get into in a moment. But in general, the primary key difference is that we're modeling potential outcomes as opposed to observed outcomes. So to further cement that idea, we'll look at a logistic marginal structural model, which you might use if you had a binary outcome. So now we have logit of the expected value of this potential outcome. So this is just a log odds of the expected value of this potential outcome. But since the outcome is binary, you could also think of the expected value of Yᵃ as the probability that Yᵃ=1. So this should look similar to a logistic regression model, with the key difference being that you'll see the potential outcome there.
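The logistic marginal structural model just introduced can similarly be sketched as:

```latex
\operatorname{logit}\{ E[Y^a] \} = \psi_0 + \psi_1 a,
\qquad \text{where } E[Y^a] = P(Y^a = 1).
```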
So we have logit of the mean of Y, but here we're talking about Yᵃ, the potential outcome. So the logit of the mean of that potential outcome is just linear in a. So this is, sort of, a linear logistic kind of model. And I should note as a reminder that if you take the mean of a binary outcome, that's just a probability. So the expected value of Yᵃ here, that's just the probability that Yᵃ=1. So those are equivalent. So we have the log odds that Yᵃ=1 being linear in a. And what that says is that if we were to exponentiate this Ψ1 parameter, that's a causal odds ratio. And just to clarify what we mean by a causal odds ratio: we have the odds that Y¹=1, that's this part. Remember, an odds is just a probability over 1 minus that same probability. So we have the odds that Y¹=1, and then we divide by the odds that Y⁰=1. And so that's what we mean by causal odds ratio. So in the numerator is the odds that Y=1 if everybody in the entire population had been treated, and in the denominator it's the odds that Y=1 if everyone in the entire population was given the control. So those were a couple of very simple marginal structural models, but you could have much more complicated marginal structural models. So here we'll consider one with effect modifiers. This is also known as heterogeneity of treatment effect. So in other words, the treatment effect might vary across subpopulations and we might be interested in that. So let's imagine that V is some variable that modifies the effect of A. So let's just, for now, think of V as a single variable. So this could be something like some kind of comorbidity like diabetes, yes or no. It could be sex or race or anything like that. So this is something that might modify the effect of treatment, or in other words, at different values of V there might be different treatment effects.
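In symbols, the causal odds ratio described above is:

```latex
e^{\psi_1}
  = \frac{ P(Y^1 = 1) \, / \, \{ 1 - P(Y^1 = 1) \} }
         { P(Y^0 = 1) \, / \, \{ 1 - P(Y^0 = 1) \} }
```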
Well, you can include that in a marginal structural model, where now we have the expected value of this potential outcome but we're conditioning on V. So we're not conditioning on all of the confounders; we're only conditioning on, in this case, V, the thing that we care about in terms of effect modification. So you would pick this ahead of time. You might decide ahead of time that for most of the confounders you just want to control for them, average over them, marginalize over them. But there might be some small subset of them that you care about in terms of effect modification. And if so, that's what you could include in this way. So you're only conditioning on the effect modifiers V. In this case, say we have one effect modifier, and we can then include it in a marginal structural model here on the right-hand side, where we just have a main effect for treatment, a main effect for V, and then the interaction between the two, a times V. So then if we wanted to think about how to interpret these, well, one way you could think about it is, if you wanted to contrast the potential outcomes, the mean of potential outcomes for a given value of V. So here, we're taking a difference in the mean of the potential outcomes at a given value of V. Well, that's just equal to Ψ1 plus Ψ4 times V, because when you take the difference of these means, the other terms just cancel out, and the only part left is Ψ1 plus Ψ4 times V. And so then, if this marginal structural model was correct, I could tell you a value of V and you would know what the causal effect is. So we would actually estimate these psi parameters. And so if you had an estimate of Ψ1 and Ψ4, then you could also estimate the corresponding treatment effect at any value of V. So you could just plug in the value of V and you would get out a causal effect.
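Putting the effect-modification model into symbols (a sketch: the transcript pins down Ψ1 for the main effect of treatment and Ψ4 for the interaction, so the intercept and the main effect of V are written here as Ψ0 and Ψ2, which may not match the slide's indexing):

```latex
E[Y^a \mid V] = \psi_0 + \psi_1 a + \psi_2 V + \psi_4 a V
```

so that the contrast at a given value of V is

```latex
E[Y^1 \mid V] - E[Y^0 \mid V]
  = (\psi_0 + \psi_1 + \psi_2 V + \psi_4 V) - (\psi_0 + \psi_2 V)
  = \psi_1 + \psi_4 V
```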
So then you could also think about more general marginal structural models. So I'm going to write it just in a general sense here. So we have g() of the mean of the potential outcomes conditional on V. And g() here is a link function, so this is in terms of generalized linear models, except we're using potential outcomes instead of observed outcomes. So if you're familiar with generalized linear models, this should look similar in that we have a link function g(), and we have a mean of an outcome, here a potential outcome, conditional on some variables, here V. And then we'll say that that equals some function of a and V and the parameters psi. So this h() function is something we'll specify. So one example of this h() function is what we saw on the previous slide, where it was an intercept plus a main effect for a, plus a main effect for V, plus an interaction between the two. So that would be one example of this kind of an h() function. But you could also imagine, let's say V is a continuous variable, you might want to have a quadratic term or something. So this right-hand side should be familiar; it's the kind of thing you would specify in a regression model. So it's usually some kind of additive linear thing, but you could have a quadratic term in there, anything like that. But the key is that, here, we're modeling the potential outcomes. So therefore, if we're able to estimate these psis, we can end up having estimates of causal effects. So the key issue to think about is that these potential outcomes are not the same as observed data. So it's not quite as straightforward to estimate the parameters psi as it is in a regression model. In a regression model, the stuff on the left here and the stuff on the right here would all be observed data.
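The general form being described can be sketched as:

```latex
g\!\left( E[Y^a \mid V] \right) = h(a, V; \psi)
```

where g() is a link function (identity for the linear model, logit for the logistic one) and h() is a user-specified function of treatment, effect modifiers, and parameters, such as the intercept-plus-main-effects-plus-interaction form from the previous slide.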
So since it's all observed data, estimation is relatively straightforward. So we have to think about how we could actually estimate these parameters given that the thing on the left involves potential outcomes as opposed to observed data.
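To make the estimation point concrete, here is a minimal sketch of how inverse probability of treatment weighting recovers the parameters of a linear marginal structural model. Everything here is an illustrative assumption: the data are simulated with a known true effect, the propensity model is a hand-rolled Newton-Raphson logistic fit, and the MSM is fit by a hand-rolled weighted least squares (in practice you would use a statistics library for both steps).

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

# Simulate a confounder X, a treatment A whose probability depends on X,
# and an outcome Y whose mean depends on both A and X. Since E[X] = 0,
# the true marginal structural model is E[Y^a] = 1.0 + 2.0 * a.
X = rng.normal(size=n)
p_treat = 1 / (1 + np.exp(-(0.5 + 1.0 * X)))       # true propensity score
A = rng.binomial(1, p_treat)
Y = 1.0 + 2.0 * A + 1.5 * X + rng.normal(size=n)   # X confounds A and Y

# Step 1: estimate the propensity score P(A=1 | X) with a logistic
# regression, fit here by Newton-Raphson for self-containedness.
def fit_logistic(Z, a, iters=25):
    beta = np.zeros(Z.shape[1])
    for _ in range(iters):
        p = 1 / (1 + np.exp(-Z @ beta))
        W = p * (1 - p)
        # Newton step: beta += (Z' W Z)^{-1} Z' (a - p)
        beta += np.linalg.solve(Z.T @ (Z * W[:, None]), Z.T @ (a - p))
    return beta

Z = np.column_stack([np.ones(n), X])
ps = 1 / (1 + np.exp(-Z @ fit_logistic(Z, A)))

# Step 2: inverse probability of treatment weights.
w = A / ps + (1 - A) / (1 - ps)

# Step 3: weighted least squares of Y on (1, A) estimates psi0, psi1.
# Note there is no X on the right-hand side: the weights, not covariate
# adjustment, are what handle confounding here.
D = np.column_stack([np.ones(n), A])
psi_hat = np.linalg.solve(D.T @ (D * w[:, None]), D.T @ (w * Y))
print(psi_hat)  # psi_hat[1] should be close to the true effect 2.0
```

An ordinary unweighted regression of Y on A alone would be biased upward here, because X pushes both treatment probability and the outcome in the same direction; the weighting creates a pseudo-population in which treatment is unconfounded, so the simple regression of Y on A targets the marginal Ψ parameters.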