0:10

Hi, this video is on propensity score matching.

Some of the concepts that we will cover include calipers,

propensity score overlap and trimming the tails of a propensity score distribution.

So previously, we noted that a propensity score is a balancing score, so

matching on a propensity score should achieve balance.

So it should achieve balance on the covariate distribution between treated and

control subjects even though we did not directly match on the covariates.

So even though we match on the propensity score,

we should still end up with balance.

And it also should be noted that the propensity score is a scalar, so

each person will just have a single value of the propensity score.

So it will just be one number between zero and one for each person.

So this greatly simplifies the matching problem because we just have

to match on one variable as opposed to a whole set of variables.

So essentially, the propensity score is summarizing all the Xs,

and then we can just match on that summary.

So once we've estimated the propensity score, but

before we actually carry out matching, one of the things that people typically do

is look for overlap in the propensity score distribution.

What we mean by this is comparing the distribution of the propensity scores for

treated and control subjects.

And we basically want to look for overlap between these distributions.

So we're interested in whether all of the subjects had at least

some positive probability of receiving either treatment,

and this is typically done with a plot.

So here's an example showing the propensity score distributions for

the control and treated groups.

1:59

And you'll notice that the propensity score is between zero and one,

as it should be, since it's a probability of treatment.

And you'll notice that for the treated group here, which is in blue, the

propensity scores are shifted over a little to the right relative to the control.

So for the treated group,

the peak of the propensity score distribution is about there.

So on average, they have a higher probability of receiving treatment

compared to the control group, which has a peak that's a little lower.

So this is what you would expect, right?

Because the treated group you would think on average would have a higher probability

of receiving treatment.

But in this case, what we see is that there's overlap everywhere,

so this is actually the kind of plot you would like to see if you're going to do

propensity score matching.
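
This kind of overlap check can be sketched numerically as well as graphically. The propensity scores below are simulated, hypothetical values, not from any real study:

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated propensity scores: the treated group's distribution
# is shifted a little to the right, as we would expect.
ps_control = rng.beta(2, 4, size=500)
ps_treated = rng.beta(4, 2, size=500)

# A crude numerical check of overlap: the region of common support
# is where the two ranges intersect.
lo = max(ps_control.min(), ps_treated.min())
hi = min(ps_control.max(), ps_treated.max())
print(f"common support: [{lo:.3f}, {hi:.3f}]")

# In practice you would look at a plot, e.g. overlaid histograms
# of ps_treated and ps_control (matplotlib's hist with alpha < 1).
```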

2:51

So what we mean by overlap is that no matter where you look on

the propensity score scale, both groups are represented.

Let's say you look out here at one of the extremes:

even though there's more blue than red,

there are still some people in the control group who

have a probability of getting treatment that high.

So we're seeing overlap wherever you go.

Same thing on the other tail: there are some people in both the treated and

control groups who have a very, very small probability of getting treatment.

So there's overlap everywhere, and

what this means is that our positivity assumption is probably reasonable.

Remember positivity refers to the situation where all of the subjects

in the study have at least some chance of receiving either treatment.

And that appears to be the case here.

So this would be a situation where you would feel good about doing propensity score

matching.

4:02

So in this next example, the treatment group is in blue and the control group is in red.

And we would know that in part because the probability of treatment is much

higher for the treatment group.

But there's a major lack of overlap here in that at the high end

of the propensity score, there's hardly anybody in the control

group that had a propensity score like that.

So out here for example, it looks like there's possibly nobody

in the control group who had a propensity score quite that high.

So what we mean by the positivity assumption being violated is this:

if you have a set of covariates such that your

propensity score is basically in this range, close to 1.0,

then you essentially had no chance of getting the other treatment,

getting the control treatment in this case.

So really, all of the sort of interesting things that we could learn

about are taking place in this box here, in this range of the propensity score.

So I marked off this area where there is overlap.

So we really can't expect to learn about a treatment effect in the extremes.

So we can't expect to learn about a treatment effect out here because these

are people with covariates such that they were guaranteed to

get the control condition, not the treated condition.

So we can't learn anything about a treatment effect among people

who had no chance of getting treated.

And the same thing out here.

We can't expect to learn about a treatment effect among people who have covariates

such that they were guaranteed to get treatment.

But in this box, these are a subpopulation of people who have

covariates such that they really could have gotten either treatment,

and so treatment is effectively random within that range.

So one thing that in this case you might want to do is actually get rid of

individuals who have extreme propensity scores and focus on that box.

So we'll talk a bit more about that in a minute.

6:28

So this is what's known as trimming tails.

So if you have lack of overlap, trimming the tails is an option.

And this really just means removing subjects

from your data set that have extreme values of the propensity score.

And what extreme means is a little bit up to the analyst,

but one example would be the following.

So one thing you could do is you could first remove any control subject whose

propensity score is less than the minimum propensity score in the treatment group.

So remember that we expect treated subjects to have higher

propensity scores, in general, than the control group.

So here, if we think about a control subject whose

propensity score is less than the minimum of the treatment group,

those are people who would be out in this area,

and we would want to remove them.

That's one option.

Another example is on the other tail: treated

subjects whose propensity score is greater than the maximum of the control group

could potentially be removed from the study, or chopped off.

So let's say the maximum propensity score in the control group is right about here.

Then we could remove these people from consideration.

So that would be one way to do it.

You could think of other ways of trimming the tails, but that's the main idea.
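
This trimming rule can be sketched in a few lines. The propensity scores and treatment indicator below are simulated, hypothetical data:

```python
import numpy as np

rng = np.random.default_rng(1)

# Simulated propensity scores and a treatment indicator:
# 300 controls with lower scores, 300 treated with higher scores.
ps = np.concatenate([rng.beta(2, 5, size=300), rng.beta(5, 2, size=300)])
treated = np.repeat([False, True], 300)

# Trimming rule from the example:
#  - drop controls below the minimum treated propensity score,
#  - drop treated subjects above the maximum control propensity score.
lo = ps[treated].min()
hi = ps[~treated].max()
keep = ((~treated) & (ps >= lo)) | (treated & (ps <= hi))

ps_trim, treated_trim = ps[keep], treated[keep]
```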

So you could do this step first,

which makes the positivity assumption more plausible.

And then you could carry out matching after you trim the tails.

And one of the main points here is that it should prevent extrapolation.

And again, what we basically mean by that is, if you were to try to estimate a

causal effect, say, for people who have a propensity score that's very small,

you would have to be extrapolating, because we don't actually have any data on

treated subjects with a propensity score like that.

So there's no information in the data about that relationship,

the mean difference in outcomes for people who have a small propensity score.

So if we're going to use that data to estimate a causal effect,

we would have to extrapolate.

So we'd really prefer not to do that, so you can trim the tails.

8:56

So trimming the tails is an optional step, but it's certainly good to

look at the propensity score distributions, see what the overlap is like,

and then make a determination on whether you want to trim the tails or not.

After that, then we could proceed to match, and what we'll do now is we'll just

match on some distance measure based on the propensity score.

So now, we have a single number for

each person that we're going to match on this propensity score.

And so we could just calculate a distance between any two subjects on

the propensity score, and then try to minimize distance.

So again we could use greedy or nearest neighbor matching, or

we could use optimal matching.

We're basically taking the same steps as before,

except our distance measure is now a distance based on propensity scores

as opposed to a distance based on a collection of covariates.
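
A minimal sketch of greedy (nearest neighbor) 1:1 matching on the propensity score, using made-up scores; this is an illustration of the idea, not a production implementation:

```python
import numpy as np

rng = np.random.default_rng(2)
ps_treated = rng.uniform(0.3, 0.8, size=20)   # hypothetical treated scores
ps_control = rng.uniform(0.1, 0.7, size=100)  # hypothetical control scores

# Greedy 1:1 nearest-neighbor matching on the propensity score:
# for each treated subject in turn, take the closest unused control.
available = np.ones(len(ps_control), dtype=bool)
pairs = []
for i, p in enumerate(ps_treated):
    dist = np.abs(ps_control - p)
    dist[~available] = np.inf      # a control can be matched only once
    j = int(np.argmin(dist))
    pairs.append((i, j))
    available[j] = False
```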

9:50

And next, I'll mention something that might be helpful and that

people often do, which is that rather than use the untransformed propensity score,

a lot of times people will first transform it using a logit transformation.

So a logit is just a log-odds.

So before you actually match, you could take the log-odds of the propensity score.

And the reason you would do that is basically to kind of stretch it out

in a sense.

So the propensity score is sort of, it's between 0 and 1.

And a lot of times, let's say you had a rare treatment.

Well, the propensity scores would tend to be very small for everybody,

and they would be bunched up in a small range.

But if you transformed it by taking a logit transformation,

it will essentially stretch it out.

It's a one-to-one transformation, but it will basically spread it out and

make it easier to find matches.

So the logit of the propensity score is unbounded so

it could take a value anywhere on the real line.

But it still preserves the ranks of the propensity score itself.

So you could match on the logit of the propensity score

rather than the propensity score itself.

You could do either, but this is something that is often done, and

I think there are a lot of situations in which it ends up being helpful.
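
The logit (log-odds) transformation itself is one line. The scores below are hypothetical values for a rare treatment, bunched up near zero:

```python
import numpy as np

def logit(p):
    """Log-odds of a propensity score."""
    return np.log(p / (1 - p))

# Hypothetical scores for a rare treatment, bunched up near zero.
ps = np.array([0.01, 0.02, 0.03, 0.05])
lps = logit(ps)

# The transformation is one-to-one and monotone, so ranks are
# preserved, but the values are stretched out over the real line.
```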

Another thing that we could do is use a caliper, and

we would do this to make sure that we don't have any bad matches.

So a caliper would basically be our definition of what a bad match is.

So the caliper is just the maximum distance that we are willing to tolerate.

So it's sort of the threshold between an acceptable match and

an unacceptable match.

So in practice,

a lot of times people use a caliper based on standard deviation units.

And in particular, the most common thing I see used in practice is

the following, where your caliper is 0.2 times the standard

deviation of logit of the propensity score.

So that sounds kind of complicated but we can just look at it in steps.

So imagine first, you just estimate the propensity score.

Right, so we use logistic regression for example,

we get a propensity score for each person.

And that's just a value between 0 and 1 for every person.

Then we take a logit transformation of that propensity score.

So we just take the log-odds.

So that's a simple one step calculation, and

now we have logit of the propensity score for every person.

So that's just a variable in our data set, logit of propensity score.

Now, we could just take the standard deviation of that.

So because it's a variable in our data set, we could just ask our statistical

software for the standard deviation of that variable.

12:34

So that will be a number, so we have a standard deviation and

then we could set our caliper to 0.2 times that value.

So when I say 0.2 times the value from step 3, I mean the value from this step:

0.2 times the standard deviation itself.
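
The steps above can be sketched as follows, again with simulated propensity scores standing in for the output of a fitted logistic regression:

```python
import numpy as np

rng = np.random.default_rng(3)

# Step 1: an estimated propensity score for each person
# (simulated here; in practice, from logistic regression).
ps = rng.beta(2, 3, size=1000)

# Step 2: take the logit (log-odds) of the propensity score.
lps = np.log(ps / (1 - ps))

# Step 3: standard deviation of that new variable.
sd = lps.std()

# Step 4: the caliper is 0.2 times that standard deviation.
caliper = 0.2 * sd
```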

So what this is really getting at is,

we want to have some idea on how much the propensity score varies.

So that's what the standard deviation will tell us.

And then what we'll consider an acceptable match will be based largely on that.

So if there was large variability in the propensity score, we might be willing to

tolerate a bigger absolute difference in the propensity score in a match than if

the propensity score was in a very tight, very narrow range;

then we might have to be a little pickier. So that's the main idea.

0.2 times the standard deviation is a somewhat arbitrary value,

but it seems to work well in practice.

But this value 0.2 is something that you can change.

And in fact, you definitely would want to consider changing it, in the sense that you

could start, for example, with the caliper at 0.2 times the standard deviation.

And then carry out the matching, assess balance, and if you're unhappy

with the balance you could then make the caliper a little smaller.

So you could change it to 0.1, for example.

You could set the caliper to 0.1 times the standard deviation.

Then you'll end up with fewer matched pairs, but they should be better matches.
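
A rough illustration, on simulated data, of how a pickier caliper trades matched pairs for match quality; `greedy_match` is a hypothetical helper written for this sketch, not a standard library function:

```python
import numpy as np

def logit(p):
    return np.log(p / (1 - p))

def greedy_match(lps_treated, lps_control, caliper):
    """Greedy 1:1 matching on the logit propensity score; a pair is
    kept only if its distance is within the caliper."""
    available = np.ones(len(lps_control), dtype=bool)
    n_pairs = 0
    for p in lps_treated:
        dist = np.abs(lps_control - p)
        dist[~available] = np.inf
        j = int(np.argmin(dist))
        if dist[j] <= caliper:
            available[j] = False
            n_pairs += 1
    return n_pairs

rng = np.random.default_rng(4)
lps_t = logit(rng.beta(4, 2, size=200))  # simulated treated scores
lps_c = logit(rng.beta(2, 4, size=400))  # simulated control scores
sd = np.concatenate([lps_t, lps_c]).std()

wide = greedy_match(lps_t, lps_c, 0.2 * sd)    # caliper 0.2 * SD
narrow = greedy_match(lps_t, lps_c, 0.1 * sd)  # pickier caliper
```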

So again, there's going to be a bias-variance tradeoff, where if you make

the caliper small, you're going to have less bias;

you're going to have better balance.

But you're also going to end up having fewer matched pairs, which means that,

ultimately, your treatment effect estimate will have more variability.

So we have to think about these bias-variance tradeoffs.

Then once we've carried out propensity score matching,

everything proceeds exactly as if we had matched directly on covariates.

So if we want to carry out an outcome analysis, we could carry out randomization

tests, or we could use conditional logistic regression,

stratified Cox models, and so on.

So once we've matched on the propensity score,

we treat it like any other matched analysis.