This video is about the distribution of weights. So we're using inverse probability of treatment weighting estimation, but it turns out that large weights can create a problem. In fact, large weights can lead to large standard errors. So we're going to look at why that is the case, and also at how to check the distribution of weights. To begin, we'll think about why weights matter at all. What I mean by that is the actual size of the weights, the actual values of them. Why does that matter? Well, larger weights lead to noisier estimates of causal effects. And why is that the case? Well, let's consider an extreme case where one person has a weight of 10,000. Sometimes you will find this in your data; if you carry out these kinds of analyses, you might find somebody has a very large weight, and it could be as large as 10,000. So imagine one person has a weight of 10,000. Well, that means they basically represent 10,000 people. Imagine that our outcome is binary, just for simplicity, so it's just a yes-or-no kind of variable. If this one person's outcome is yes, that's basically counting as if 10,000 people had the outcome equal to yes. Whereas if they had the outcome equal to no, that would be like 10,000 people having the outcome equal to no. So much is riding on this one person that it creates noise, in a sense. If one person's data can greatly affect the estimate, then you have a highly variable estimate. So in general, we would prefer for people to not have very large weights relative to everybody else. Now let's think about some intuition for why you get large standard errors if you have large weights. And I think one really great way to think about it is with the idea of bootstrapping. So one way to estimate standard errors is with bootstrapping, and in case you're not familiar with it, the idea is as follows.
So you randomly sample, with replacement, from the original sample. So originally your data has, let's say, n rows; you have n observations. Well, you can now randomly sample from that original sample, with replacement. So some people will appear in your new data set, possibly multiple times, and some people won't appear at all. You're creating a new data set of the same size, bootstrapped from the original. Some people will appear in both, some people will appear multiple times in the second one, and some people won't appear at all in the second one. But this should behave like a random sample from your original population. You can then estimate the parameter in this bootstrap data set, so now I have a parameter estimate from a bootstrap sample. I can repeat this process: from my original sample, I can bootstrap again, estimate the parameter, record it, and keep doing that. Each bootstrap sample is like a random sample from the original population, so if I keep recording parameter estimates, I'll be capturing the variability of the parameter estimate. I'll record the parameter estimate a whole bunch of times, and if I take the variance of that, that's like the variance of the estimator. If I take the standard deviation of that, you could think of it as an estimate of the standard error. So now, imagine we're doing bootstrapping but there's one person in our sample that has a very large weight, say 10,000 or something. In some of our bootstrap data sets they'll appear, and they might appear more than once, but in some of them, just by chance, they won't be in there at all.
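The bootstrap idea above can be sketched in a few lines of code. The lecture's own analysis is in R, but here is a minimal Python sketch with made-up data: a couple hundred people with modest weights, plus one hypothetical person carrying an extreme weight, so you can see how much that one person inflates the bootstrap standard error.

```python
import random
import statistics

random.seed(0)

# Hypothetical data: 200 people with a binary outcome and modest weights,
# plus one person carrying an extreme weight (the "10,000" case).
outcomes = [random.choice([0, 1]) for _ in range(200)]
weights = [random.uniform(1, 3) for _ in range(200)]
data = list(zip(outcomes, weights))
extreme = data + [(1, 10_000.0)]  # one very influential person, outcome = yes

def weighted_mean(sample):
    # IPTW-style estimate: weighted average of the outcome.
    return sum(y * w for y, w in sample) / sum(w for _, w in sample)

def bootstrap_se(sample, n_boot=2000):
    # Resample rows with replacement, re-estimate each time, and take the
    # standard deviation of the estimates across bootstrap samples.
    estimates = []
    for _ in range(n_boot):
        boot = random.choices(sample, k=len(sample))
        estimates.append(weighted_mean(boot))
    return statistics.stdev(estimates)

print("Bootstrap SE without the extreme weight:", bootstrap_se(data))
print("Bootstrap SE with the extreme weight:   ", bootstrap_se(extreme))
```

In bootstrap samples where the heavily weighted person is absent, the estimate sits near the bulk of the data; in samples where they appear, the estimate is pulled sharply toward their outcome. That jumping around is exactly what shows up as a much larger standard error.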
And so you can imagine that the value of our estimate in each of these bootstrap samples is going to depend a great deal on this one person, whether they are in or out. So if somebody has a lot of weight, it's going to greatly increase the variability of your estimator. I just want you to picture this one person who carries an extreme amount of weight. They represent, say, 10,000 people. And sometimes they're in your data set, and sometimes they're not. You can imagine that when they're in it, they're very influential; they're going to pull the parameter estimate towards them. And when they're out, that's not going to happen. So from sample to sample, the parameter estimate is going to vary a lot, in large part due to this one person. Hopefully that captures the intuition behind why having large weights for some individuals can lead to a really noisy estimator of, let's say, a causal effect. I also want to point out the relationship with the positivity assumption. Remember, the positivity assumption says that everybody should have some non-zero probability of receiving each treatment. And remember what a weight is: it's one over the probability of the observed treatment. So if the weight is very large, it means the denominator is very small, so the probability of that particular treatment was very close to zero. It's not exactly zero, so it's not necessarily a violation of the positivity assumption. But it might be close to zero, which means you're close to having a violation of that assumption. It means that there are certain values of your covariates for which it's very unlikely that people will get a particular treatment. The probability might not be zero, but it might be very close to zero. So we're sort of flirting with violations of the positivity assumption when you have extremely large weights.
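The link between small propensity scores and large weights can be made concrete with a short sketch. These propensity scores are made up for illustration; the point is just that the weight, one over the probability of the observed treatment, blows up as that probability approaches zero.

```python
# Hypothetical propensity scores P(A=1 | X) for five people,
# along with the treatment A each person actually received.
propensity = [0.50, 0.30, 0.90, 0.01, 0.0001]
treatment = [1, 0, 1, 1, 1]

# IPTW weight = 1 / probability of the treatment actually received:
# 1/p for treated people, 1/(1 - p) for untreated people.
weights = [
    1.0 / p if a == 1 else 1.0 / (1.0 - p)
    for p, a in zip(propensity, treatment)
]

for p, a, w in zip(propensity, treatment, weights):
    print(f"P(A=1|X)={p}  A={a}  weight={w:,.1f}")
```

The last person was treated despite a propensity score of 0.0001, so their weight is about 10,000: exactly the kind of near-violation of positivity the lecture is describing.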
So that's another reason why you would want to be concerned about people with very large weights: it would make you start to question your positivity assumption. As far as actually checking the weights, one of the easiest things to do is just to plot the weights, so I showed two examples. These are both trying to get at the same kind of information; they're both plots of the weights. On the left-hand side, it's just a density plot. I have my weights, one weight per person, and I can just ask for a density plot, and that's what this is. I also have these tick marks on the bottom, which represent the actual values. So, for example, one tick means that there was somebody who had a weight of roughly 17; that represents one person's actual weight. And you'll see that most of the weights are occurring over here. You can see that the curve goes very high over there, and you can also see that the tick marks are very dense over here. So there are a lot of people with weights between 0 and 5, and especially between 0 and 10; most of the weights are between 0 and 10. But then we do see some larger ones, with the largest just over 20. It's not so extreme that I'm particularly alarmed, but we have some people who carry quite a bit more weight than others. Another way you could display this is with the plot on the right, where all I did was sort the weights from smallest to largest and then ask for a plot. What it then naturally does is plot them in order. So the index here really just means an order: one is the smallest, two is the second smallest, and so on, and out here, these are the largest. And I kind of like this plot because it's pretty easy to see when the values get large and how many of them there are.
So you can see that the large majority are under 5. Here's a line at 5, and the large majority are below it. And then if you get out to, say, 15, there are only a few people above 15; in fact, four people above 15. So this kind of plot makes it pretty easy to look at your weights, but you could use either one of these. You could also just ask for some summary statistics of the weights. So this is some R code where I first use the summary command, which gives you the quartiles, the median, and the min and max. And you'll see the third quartile, the 75th percentile, is 2.28. So 75% of people have a weight below that, and the maximum value is 21. I also use the tail and head commands, which is just saying, show me the smallest, in this case, six values, and also show me the six largest values. So here, these are the six largest values. And it's also telling you the observation number, or the ID of the person, in a sense. The 795th observation of my data set had the largest weight. So if I was interested in looking closer at that individual, I would certainly know where to find them, and I could look at their data a little more carefully. So these are just a couple of ways that you could go about investigating whether you have weights that are so large that you should be worried.
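The same checks are easy to do in any language. As a rough sketch of what the R summary, head, and tail calls are doing, here is a Python version on hypothetical weights: quartiles plus min and max, and the six smallest and six largest weights with their observation numbers attached, so you know which rows to go inspect.

```python
import random
import statistics

random.seed(1)

# Hypothetical weights for 1,000 people: mostly small, with a long right tail.
weights = [random.expovariate(0.7) + 1 for _ in range(1000)]

# Quartiles, median, min, and max, like R's summary().
q1, med, q3 = statistics.quantiles(weights, n=4)
print(f"Min: {min(weights):.2f}  1st Qu.: {q1:.2f}  Median: {med:.2f}  "
      f"3rd Qu.: {q3:.2f}  Max: {max(weights):.2f}")

# Smallest and largest six weights, keeping the observation number,
# like head() and tail() on the sorted weights in R.
by_size = sorted(enumerate(weights), key=lambda pair: pair[1])
print("Six smallest:", [(i, round(w, 2)) for i, w in by_size[:6]])
print("Six largest: ", [(i, round(w, 2)) for i, w in by_size[-6:]])
```

Keeping the index alongside each weight is what lets you do the last step the lecture mentions: once you see that a particular observation has the largest weight, you can pull up that row and look at the person's covariates and treatment more carefully.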