
This video is about inverse probability of treatment weighting, and in particular,

using that methodology to estimate parameters from a marginal structural model,

and we'll focus on how to do that on a conceptual level.

So before we get to estimation of parameters from a marginal structural model,

we'll first review how you would estimate parameters from a standard regression model.

So let's begin with a linear regression model.

This is a pretty standard-looking linear regression model,

and you'll notice here that what we have on

the left-hand side is an observed outcome as opposed to a potential outcome,

so that's observed data,

and we'll say that the observed outcome is related to, say,

covariates X in a linear fashion, plus an error term,

which is typically mean zero constant variance kind of assumption about that.

So the error term is just random noise.

X beta means the mean is linear in X,

and so if you want to estimate beta,

then typically you could use a least squares approach,

so basically you would minimize the sum of squared errors.

So you would have Y and you would subtract X beta;

you would square that, add it up, and minimize it.

In other words you want to sort of minimize the distance between Y and X beta.

What value of beta,

what beta hat would minimize that distance.

With that kind of least squares estimation,

after a couple of calculus and algebra steps

to minimize the sum of squares,

you end up having to solve something like this.

This is an estimating equation because

we have something on the left involving the data and the parameter,

and on the right it's equal to zero,

and you can solve that to get the parameter.

This is the standard kind of equation for estimating parameters from a linear regression.

And so this value of beta,

which we'll call beta hat,

minimizes the sum of squared deviations.

So it's really trying to minimize the distance between Y and X beta.
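That least squares estimating equation can be sketched in a few lines. This is a minimal illustration on simulated data (the covariates, sample size, and coefficient values are all hypothetical):

```python
import numpy as np

# Simulated illustrative data: n observations, intercept plus two covariates.
rng = np.random.default_rng(0)
n = 200
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])
beta_true = np.array([1.0, 2.0, -0.5])
Y = X @ beta_true + rng.normal(size=n)  # mean-zero, constant-variance noise

# Solving the estimating equation X'(Y - X beta) = 0 gives the usual
# least squares estimate beta_hat = (X'X)^{-1} X'Y.
beta_hat = np.linalg.solve(X.T @ X, X.T @ Y)

# The estimating equation evaluated at beta_hat should be (numerically) zero.
score = X.T @ (Y - X @ beta_hat)
```

Solving the normal equations directly like this is exactly the "solve something like this" step: the left-hand side involves the data and the parameter, and we find the beta that sets it to zero.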

Now we can also think about generalized linear models.

In a generalized linear model it would

look like this where you have the expected value of Y given X.

We'll call that mu, so that's the mean.

And then that would be equal to some linear part, X beta,

but wrapped up in this function,

this g inverse, where g is a link function.

So g inverse is the inverse of the link function.

So for example, if you had a logistic regression model,

g inverse would be the inverse of the logit function.

Examples of generalized linear models include linear regression,

logistic regression, Poisson regression.

You can end up estimating beta by solving an estimating equation,

and so that is what we're showing here.

So this is an estimating equation,

so you have the derivative of the mean with respect to beta,

some variance term, and then Y minus the mean.

And you can end up solving this for beta,

and that would give you the usual kind of estimate

of beta in this case from a generalized linear model.
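As a concrete sketch of solving that GLM estimating equation, here is a logistic regression fit by Newton's method (equivalently, iteratively reweighted least squares) on simulated data. With the canonical logit link, the equation with the derivative of the mean and the variance term reduces to X'(Y − mu) = 0; the data and coefficients below are hypothetical:

```python
import numpy as np

# Solve the GLM estimating equation for logistic regression by Newton steps.
def fit_logistic(X, y, n_iter=25):
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        mu = 1.0 / (1.0 + np.exp(-X @ beta))  # g^{-1}(X beta): the inverse logit
        v = mu * (1.0 - mu)                   # Bernoulli variance function
        # Newton step: solve (X' V X) delta = X'(y - mu)
        beta += np.linalg.solve(X.T @ (v[:, None] * X), X.T @ (y - mu))
    return beta

# Hypothetical toy data to check the score is driven to zero.
rng = np.random.default_rng(0)
n = 500
X = np.column_stack([np.ones(n), rng.normal(size=n)])
p = 1.0 / (1.0 + np.exp(-(0.5 - 1.0 * X[:, 1])))
y = rng.binomial(1, p)
beta_hat = fit_logistic(X, y)

# The estimating equation X'(y - mu) evaluated at beta_hat.
score = X.T @ (y - 1.0 / (1.0 + np.exp(-X @ beta_hat)))
```

The same iteration, with a different inverse link and variance function, handles the other GLM family members such as Poisson regression.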

So now we want to think about how to do that kind of an approach,

except for marginal structural models.

So remember from marginal structural models,

we're modeling the mean of potential outcomes as opposed to the mean of observed data.

But you also notice that

a marginal structural model looks a lot like a generalized linear model.

So for example, we would have the mean of the potential outcomes.

One example of a marginal structural model is this one, where we have

the inverse of the link function and then some linear term here

which just involves treatment,

so it's just a linear function of treatment.

It looks like a generalized linear model, except it's for a potential outcome.

This is not equivalent to a regression model;

I'm emphasizing that here.

You'll notice there are actually two things here to pay attention to.

One is that in a regression model we would be

conditioning on the observed random variable treatment,

capital A, and you'll see capital A over here.

We're not setting A to whatever we want.

We're conditioning on it.

So in other words, it's restricting to the subpopulation of people who have treatment

capital A, whereas up here we're setting A,

so there's little a here and little a there.

We can set that to be whatever we want.

This is a model for potential outcome,

so we're imagining some world where

we were able to set everybody's treatment to little a.

So we're not conditioning on capital A here;

we're setting treatment to little a.

In previous videos we talked about this,

the important distinction between conditioning and setting.

So for the marginal structural model we're setting,

whereas for the regression model you're conditioning.

And these two things are generally not going to be the same because of confounding,

so if you had a randomized trial,

then fitting this regression model should be fine because there's no confounding,

and parameters from this regression model should represent a causal effect.

But otherwise, they shouldn't because of confounding.

However, the situation isn't hopeless, because we have learned that you can create

a pseudo-population, using inverse probability of treatment weighting,

that's free from confounding.

So as long as we assume ignorability and positivity,

as long as those assumptions are met, we can create

a pseudo-population where there's no confounding.

And so the idea for estimating parameters from

a marginal structural model is that if we use these weights,

then we can apply those weights to the observed data.

We can create this pseudo-population;

now the pseudo-population doesn't have confounding,

and we can just use, then,

standard estimating equations that you would use for a generalized linear model.

So in fact we can estimate these causal parameters,

psi, using this estimating equation here,

and you'll notice the key difference between this and

the generalized linear model one from a couple of slides ago

is that we've stuck in this W—these are the weights.

So this is one over the probability of your observed treatment,

and all this is going to do is,

then, rather than analyze the original outcome data,

we're going to analyze the pseudo-population outcome data.

That's the one that's unconfounded,

so now we can actually use standard sort of

regression technology to estimate causal effect.

And note here that I'm writing the weights

in a slightly fancier way: because A is binary,

that's just an indicator function,

and we have the same kind of thing over here, so if A=1,

it's going to use this part of it because if A is equal to one,

this part over here would be zero and it would go away.

Whereas if A=0 then we'll use

this part because this part would go away which is exactly what we want.

Right: if A=1, we want to use the first part, which is the propensity score;

if A=0, we want the second part, which is the probability that you are in the control group.

So it's just a fancy way to write what we've talked about before.
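In code, that indicator form of the weight is a one-liner. The helper function and example numbers here are hypothetical, not from the video:

```python
import numpy as np

# Indicator form of the IPTW weight: w_i = A_i/e_i + (1 - A_i)/(1 - e_i),
# so a treated subject gets 1/e_i and a control gets 1/(1 - e_i).
def iptw_weights(A, e):
    A = np.asarray(A, dtype=float)
    e = np.asarray(e, dtype=float)
    return A / e + (1.0 - A) / (1.0 - e)

# A treated subject with propensity score 0.25 gets weight 1/0.25 = 4;
# a control with propensity score 0.25 gets weight 1/(1 - 0.25).
w = iptw_weights([1, 0], [0.25, 0.25])
```

For each subject, exactly one of the two terms survives, which is exactly what we want.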

So hopefully the intuition is clear here.

We can't just fit a standard regression model

and get causal effects out of it because of confounding.

But if we use weighting to create a pseudo-population that's

unconfounded, then we can use standard regression technology to estimate causal effects.

So we can now

write out the steps for estimating parameters from a marginal structural model.

So first thing you might want to do is estimate the propensity score itself.

So remember, that's

modeling the probability of treatment given confounders.

So as an example you might fit

a logistic regression, and from that

propensity score model you can then get predicted probabilities.

So for treated subjects you would want the probability of treatment;

for control subjects you would want the probability of not getting treatment.

So we can directly output that from the propensity score model that you've fitted.

Then we can create weights.

So for treated subjects we

would set their weight equal to one over the propensity score, and for

control subjects we would set their weight equal to one over one minus the propensity score,

which is the same as one over the probability of not getting treatment.

Okay, so now we have our weights;

next we would need to specify

the marginal structural model that we're interested in.

So are you interested in just a sort of standard kind

of marginal structural model with no effect modification?

Are you interested in effect modification?

You have to figure out what kind of model you're interested in.

And also, you would want to think about what your outcome is like.

Is it continuous? Is it binary? Is it a count?

So that would also help you decide what kind of model you want.

So for example, if it's continuous you would typically

want a linear kind of marginal structural model.

If you had count data, like number of hospitalizations,

you might want some kind of log-linear model.

But once you've specified a marginal structural model,

then you can use software and specify

a regular generalized linear model but then tell it to do a weighted version of it.

Most statistical software that can fit

generalized linear models will

have an option where you can set weights equal to something.

So you can do a weighted version.
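The whole recipe can be sketched end to end on simulated data. This is a minimal sketch, not production code: the data-generating model, sample size, and coefficients are all hypothetical, and a linear marginal structural model is used so the weighted estimating equation is just weighted least squares of the outcome on treatment:

```python
import numpy as np

# Simulated data: binary treatment A confounded by L, continuous outcome Y,
# and a linear MSM  E[Y^a] = psi0 + psi1 * a.  True causal effect psi1 = 2.
rng = np.random.default_rng(1)
n = 5000
L = rng.normal(size=n)                            # confounder
A = rng.binomial(1, 1 / (1 + np.exp(-0.5 * L)))   # treatment depends on L
Y = 1.0 + 2.0 * A + 1.5 * L + rng.normal(size=n)  # outcome depends on both

# Step 1: propensity score model, a logistic regression of A on L (Newton steps).
Xp = np.column_stack([np.ones(n), L])
gamma = np.zeros(2)
for _ in range(25):
    e = 1 / (1 + np.exp(-Xp @ gamma))
    v = e * (1 - e)
    gamma += np.linalg.solve(Xp.T @ (v[:, None] * Xp), Xp.T @ (A - e))
e = 1 / (1 + np.exp(-Xp @ gamma))                 # fitted propensity scores

# Step 2: weights, 1/e for the treated and 1/(1 - e) for the controls.
w = A / e + (1 - A) / (1 - e)

# Step 3: solve the weighted estimating equation for the linear MSM,
# sum_i w_i x_i (Y_i - x_i' psi) = 0, i.e. weighted least squares of Y on (1, A).
Xm = np.column_stack([np.ones(n), A])
psi_hat = np.linalg.solve(Xm.T @ (w[:, None] * Xm), Xm.T @ (w * Y))
```

A naive unweighted regression of Y on A alone would be biased upward here, because L raises both the probability of treatment and the outcome; the weighted fit recovers psi1 close to 2.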

So one final important point:

we've done all this weighting, and the weighting can

kind of make the sample size look bigger than it really is.

So we've seen examples where you might start with

one person in one group and

nine in the other, and after weighting there are 10 in each.

So it kind of appears to artificially inflate the population size.

So you wouldn't want to just report

the standard errors out from this without doing some kind of fix up for that.

So we have to account for the fact that we've weighted when we estimate variances.

A pretty straightforward way to do that is to use

asymptotic, or what are known as sandwich, variance estimators.

So these are really common in statistics software.

So for example if you use generalized estimating equations it

would typically automatically output these robust asymptotic variances.

But you could also use bootstrapping;

bootstrapping would

also properly account for the fact that you're weighting.
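A bootstrap sketch on simulated data (the helper function, data, and number of resamples are all hypothetical): each resample repeats the whole procedure, propensity model and all, so the resulting standard error properly reflects the estimated weights:

```python
import numpy as np

# Simulated data: treatment A confounded by L, continuous outcome Y.
rng = np.random.default_rng(2)
n = 2000
L = rng.normal(size=n)
A = rng.binomial(1, 1 / (1 + np.exp(-0.5 * L)))
Y = 1.0 + 2.0 * A + 1.5 * L + rng.normal(size=n)

def iptw_effect(Y, A, L, n_iter=25):
    # Propensity model: logistic regression of A on L via Newton steps.
    Xp = np.column_stack([np.ones(len(L)), L])
    g = np.zeros(2)
    for _ in range(n_iter):
        e = 1 / (1 + np.exp(-Xp @ g))
        v = e * (1 - e)
        g += np.linalg.solve(Xp.T @ (v[:, None] * Xp), Xp.T @ (A - e))
    e = 1 / (1 + np.exp(-Xp @ g))
    w = A / e + (1 - A) / (1 - e)
    # Weighted least squares of Y on (1, A): the linear MSM fit.
    Xm = np.column_stack([np.ones(len(A)), A])
    psi = np.linalg.solve(Xm.T @ (w[:, None] * Xm), Xm.T @ (w * Y))
    return psi[1]

# Resample subjects with replacement, redoing every step each time.
boot = np.array([
    iptw_effect(Y[idx], A[idx], L[idx])
    for idx in (rng.integers(0, n, size=n) for _ in range(200))
])
se_boot = boot.std(ddof=1)
```

The key point is that the propensity score is re-estimated inside each resample; bootstrapping only the final weighted regression would understate (or misstate) the variability.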

So it's not difficult to do, but it's something to keep in mind.