Hi, in this video, we will discuss propensity scores and also talk about the balancing property of propensity scores. So the propensity score is simply the probability of receiving treatment, given covariates. In particular, we are thinking about the probability of receiving treatment as opposed to the control condition. So, we'll define treatment as A=1 and control as A=0, as we've done in the past. Formally, then, the propensity score, which we'll denote by π_i, is just the probability that A=1 given X_i, that is, π_i = P(A=1 | X_i). Here, π_i is notation for the propensity score for person i. It's really a function of X, but we are indexing it by i because person i has their own set of covariates X_i. So this is the probability of treatment given that person's particular set of covariates. That's what we mean by a propensity score. As an example, just imagine that age is the only X variable and that older people were more likely to receive treatment. In that case, the propensity score would be larger for older subjects. For example, the probability of treatment among people who are 60 years old would have to be greater than the probability of treatment among people who are 30 years old. Another way you could look at it is that π_i would have to be greater than π_j if person i is older than person j. Another thing we should think about is how we interpret the propensity score. As an example, if a person had a propensity score of 0.3, that would mean that, given their particular covariates, there is a 30% chance that they will receive the treatment.

Next, we're going to think about the balancing score property of the propensity score. To motivate this, we'll think about two subjects that have the same value of the propensity score, but who might have different covariate values X. So remember, the propensity score's a function of X.
It's a function of the covariates. But typically, a given value of the propensity score is not going to be associated with just one set of Xs. In other words, there could be multiple sets of Xs that lead to the same propensity score, so the same probability of treatment. So what we're imagining now is that there are two subjects who have different Xs, but their π(X), the propensity score, is the same. In that case, they're equally likely to be treated. Because they have the same propensity score, and the propensity score is just the probability of treatment, they're equally likely to be treated despite having different covariate values X. So a given person's set of Xs is just as likely to be found in the treatment group as the other person's set of Xs, because they both have the same probability of getting treatment, so we would expect to find those types of Xs at about the same rate. So in this example, there are two sets of Xs, two distinct values of X, that we're thinking about, and we'd expect them to appear in the treatment group at about the same rate. What this means is that if we were to restrict to a subpopulation of people who had the same value of the propensity score, then we should have balance in the two treatment groups. And this is what it means for the propensity score to be a balancing score. A balancing score is something where, if you condition on it, you'll have balance. So the propensity score is an example of a balancing score. If we were to consider only people who have the same value of the propensity score, and then stratify on actual treatment received, we should see the same distribution of covariates in the two treatment groups.

To make this a little more formal, we could state it as follows. So here, we have the probability of X.
So, the distribution of the covariates themselves, conditional on the propensity score, which we're just writing as a function of X. Remember, the propensity score depends on X. This is conditional on that propensity score being equal to some value p, which is just some fixed value, and conditional on A=1. Conditioning on A=1 means we only look at treated subjects. And in fact, we're only going to look at treated subjects who have a particular propensity score equal to little p. So what we want to know is: what is the distribution of the covariates among treated people who have this particular propensity score? Well, it turns out that it's the same as the distribution of covariates among controls who have a propensity score equal to p. That is, P(X=x | π(X)=p, A=1) = P(X=x | π(X)=p, A=0). By conditioning on the propensity score equaling p, what we're saying is: let's think about all possible combinations of X that would lead to this one propensity score. So p could be 0.3, for example. It's just a fixed value. Imagine p is 0.3. We'll ask, what are the sets of Xs that lead to a propensity score of 0.3? Now, let's restrict to people who have those Xs. If we then look at treated versus control subjects, we'll see that the distribution of those Xs is the same in the two groups. You can prove this with an application of Bayes' theorem, and it doesn't take very many steps. But hopefully, the intuition is clear as to why that would be the case. The main idea is really from the previous slide: the propensity score being the same for different sets of Xs means that you would expect to see either type of X about as often in the treatment group as in the control group.

The implication, then, is that matching on the propensity score should achieve balance. Previously, we had talked about matching on the full set of covariates by taking a distance between them. That would achieve balance if we do it well, but the same thing would work here if we simply match on the propensity score.
If we do that well, we should have balance. And this makes sense because we assumed ignorability. Remember, ignorability essentially means that treatment assignment is random given X. So what we're really doing by conditioning on the propensity score is conditioning on an allocation probability, that is, on the rate at which treatment is assigned. If we condition on a propensity score equal to 0.3, we're really in a randomized trial world where the allocation probability is 0.3, where you're going to randomly assign treatment with probability 0.3 and control with probability 0.7. And because at that point we're in randomized trial world, we expect to have covariate balance.

Next, we'll talk about the propensity score itself and how we actually need to estimate it. In a randomized trial, the propensity score is generally known. People who are planning a randomized trial will typically decide what the allocation probability is. In the simplest case, the allocation probability, the probability of treatment given covariates, would actually not depend on covariates. It would just equal 0.5. That's the standard kind of randomized trial, where each person has a 50% chance of getting treatment, so the allocation probability would just be 0.5 for everyone. You could, of course, have some kind of stratified randomization where the allocation probability depends on X, and so on. But regardless, the allocation probability would be known ahead of time, which means the propensity score is known ahead of time in a randomized trial. It's known by design. But in an observational study, it's unknown. We don't know the probability of treatment given X, because we haven't designed the study. We haven't been determining who gets treatment and who doesn't. However, it's important to note that the propensity score just involves observed data.
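As a brief aside, the balancing property described earlier can be checked with a small simulation. The sketch below is illustrative only: the three covariate profiles, their population frequencies, and their true propensity scores are made-up assumptions, not values from the lecture. Two of the profiles share a propensity score of 0.3, and the code compares how often one of them shows up among treated and control subjects, first overall and then within the 0.3 stratum.

```python
import random

random.seed(1)

# Hypothetical covariate profiles with their true propensity scores.
# profile_u and profile_v are different sets of Xs with the same score.
propensity = {"profile_u": 0.3, "profile_v": 0.3, "profile_w": 0.7}
# Hypothetical population frequencies of each profile.
frequency = {"profile_u": 0.5, "profile_v": 0.2, "profile_w": 0.3}

n = 100_000
profiles = random.choices(list(frequency), weights=list(frequency.values()), k=n)
# Assign treatment at random, using each subject's own propensity score.
treated = [1 if random.random() < propensity[p] else 0 for p in profiles]


def share_of_u(arm, only_ps=None):
    """Fraction of subjects in treatment arm `arm` with profile_u,
    optionally restricted to subjects whose propensity score equals only_ps."""
    rows = [
        p
        for p, a in zip(profiles, treated)
        if a == arm and (only_ps is None or propensity[p] == only_ps)
    ]
    return sum(p == "profile_u" for p in rows) / len(rows)


# Without conditioning, the covariate distribution differs by treatment arm.
overall_treated = share_of_u(1)
overall_control = share_of_u(0)

# Conditioning on propensity score = 0.3, the two arms look alike.
within_treated = share_of_u(1, only_ps=0.3)
within_control = share_of_u(0, only_ps=0.3)

print(f"overall:       treated={overall_treated:.3f}  control={overall_control:.3f}")
print(f"within ps=0.3: treated={within_treated:.3f}  control={within_control:.3f}")
```

Under these assumptions, the overall shares should differ noticeably between the two arms, while the shares within the 0.3 stratum should agree up to sampling noise, which is the randomized-trial intuition from above.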
So the propensity score, the probability of treatment given covariates, only involves observed data. It just involves A and X, both of which we observe on all of our subjects. What that means is that we should be able to estimate it. We observe which subjects are treated and which are not. We observe the values of X for each of those subjects, so we should be able to estimate the relationship between them. So most of the time, when people talk about a propensity score, they're really referring to an estimated propensity score.

So, how do we actually go about estimating the propensity score? Here, what we'll do is treat the treatment itself, A, as if it were the outcome. Just for the purpose of the propensity score model, we're going to treat the treatment itself as the outcome. So you'll see it appears on the left-hand side of the conditioning. And treatment is just binary, so you can use any kind of model that you would use to predict a binary outcome given a set of variables. The most popular approach for doing that is probably logistic regression, but I do note that you could use whatever you wanted. This is really just a classic kind of machine learning problem as well, so you could use any sort of machine learning method. What's important at this point isn't how you go about it. It's just that we need to estimate a probability of treatment given the Xs for every subject. So let's imagine we're going to use logistic regression. We have outcome A and covariates X, and we'll just fit that model using standard logistic regression methods. But the difference between the usual kind of logistic regression approach and what we're doing here is that here, our interest is not in the coefficients of the Xs, but in actually getting a predicted probability. So our second step, after we fit the model, is to get predicted probabilities, or fitted values, for each subject.
And usually, that's one line of code in statistical software. We can get a predicted probability for each person. So for every person, we'll have a predicted probability, a number between zero and one, and that will be the propensity score. In fact, it's an estimated propensity score, but from here on out, we'll just call it the propensity score. So it will be a value between zero and one for every subject.
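The two steps just described, fitting a model with A as the outcome and then pulling out a predicted probability for each subject, can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the lecture's own code: the data are simulated, with age as the only covariate and made-up true coefficients, and the logistic regression is fit with a small hand-written gradient ascent loop (`fit_logistic` is a hypothetical helper) so that the example needs nothing beyond the standard library. In practice you would use your statistical software's built-in logistic regression routine instead.

```python
import math
import random


def sigmoid(z):
    """Logistic function, written to avoid overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def fit_logistic(x, a, steps=2000, lr=0.5):
    """Fit P(A=1 | x) = sigmoid(b0 + b1 * x) by gradient ascent
    on the mean log-likelihood (illustrative helper, one covariate only)."""
    b0, b1 = 0.0, 0.0
    n = len(x)
    for _ in range(steps):
        g0 = g1 = 0.0
        for xi, ai in zip(x, a):
            resid = ai - sigmoid(b0 + b1 * xi)  # observed minus fitted
            g0 += resid
            g1 += resid * xi
        b0 += lr * g0 / n
        b1 += lr * g1 / n
    return b0, b1


# Simulated data (assumption): older subjects are more likely to be treated.
random.seed(0)
n = 400
ages = [random.uniform(30, 70) for _ in range(n)]
x = [(age - 50.0) / 10.0 for age in ages]  # centered and scaled age
treated = [1 if random.random() < sigmoid(-0.5 + 0.8 * xi) else 0 for xi in x]

# Step 1: fit the model with treatment A as the outcome.
b0, b1 = fit_logistic(x, treated)

# Step 2: the fitted probabilities are the estimated propensity scores.
scores = [sigmoid(b0 + b1 * xi) for xi in x]

for age, ai, ps in list(zip(ages, treated, scores))[:5]:
    print(f"age={age:5.1f}  A={ai}  estimated propensity score={ps:.3f}")
```

Note that the coefficients `b0` and `b1` are only a means to an end here. What gets carried forward is `scores`, one value between zero and one per subject.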