In this video, we're going to talk about stratification, which is one way that causal effects can be identified and estimated. Essentially, you stratify on important variables and then average over the distribution of those variables, which is also known as standardization. So we're going to investigate that and see how it relates to causal effects. Then we're also going to talk about the limitations of this standardization approach: why it's something we normally can't do, and why we need additional causal inference methods. We're now going to talk about conditioning and marginalizing; the combination of the two is also known as standardization. Previously, we saw that if you make certain causal assumptions, you're able to write the expected value of the observed Y, given A equal to little a and X, as the expected value of the potential outcome Y superscript a given X. But recall that we're typically interested in a marginal causal effect, meaning one that does not involve conditioning on X. So previously, we had defined, for example, the average causal effect as the expected value of Y1 minus Y0. And you'll notice that in what I just said, I didn't say given X. We needed to condition on X to be able to link the observed outcome to the potential outcome. But now we want to get rid of the X, and in order to do that, we'll just average over the distribution of X. To make things simple, we'll imagine for now that there's a single categorical X variable. So just one X variable, and it can only take on a finite number of possible values. It could be binary, 0 or 1, that sort of thing. Then we'll be able to get the thing that we want, which is the expected value of the potential outcome, the expected value of Y superscript a. Because remember, if we have that for every value of a, then we can contrast them and get causal effects, in particular marginal causal effects.
So that's what we want. In order to get it, we'll take the expected value of Y given a and X, and average over the distribution of X. To see what that is, first let's note that this notation here, the capital sigma, means summation. And you'll notice there's a subscript x, so that just means sum over all levels of X, okay? Then we have this part here, which is the expected value of Y among people who were treated at little a and who have their covariates equal to little x. So this is a subpopulation, and it's something we can of course observe, in the sense that this is all observed data. We can restrict to the subpopulation of people who have treatment little a and covariate value little x, and take the expected value of Y for them. But that's a conditional expectation, right? It's the expected value of Y among people treated at little a who have X equal to little x, and we can get that for every value of X. So we would have a whole bunch of those expectations of Y, and then we would need to average them. In particular, we'll average them over the marginal distribution of X. That's what this probability of X equal to little x means: it's the distribution of the covariate in our population of interest. So the thing that we want, the expected value of Y superscript a, the expected value of the potential outcome, is just the expected value of the observed outcome in these subpopulations, averaged over the distribution of the covariate. This is known as standardization, and all we're doing is conditioning and marginalizing. Conditioning means stratifying, and marginalizing means averaging over. What we end up with is a standardized mean, and under the assumptions we made, that happens to be the same as the average potential outcome. So as I mentioned, standardization involves stratifying and then averaging. You could essentially think of it as obtaining a treatment effect within each stratum.
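Written out, the standardization formula described above looks like the following. This is a reconstruction from the spoken description, assuming a single categorical X and assuming the causal assumptions mentioned earlier hold; the sum runs over all levels x of X.

```latex
E\left(Y^{a}\right) \;=\; \sum_{x} E\left(Y \mid A = a,\, X = x\right)\, P\left(X = x\right)
```

The conditional expectation on the right is the stratification step, and multiplying by P(X = x) and summing is the marginalization step.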
And then you pool across strata, weighting by the probability of each stratum. If you actually had data, you could estimate a treatment effect by computing the mean outcome under each treatment within each stratum, and then pooling across the strata, weighting by the size of each stratum. To illustrate that, we'll look at a hypothetical example. This will be kind of a simple example, just to illustrate the main ideas. What we'll focus on is a population of diabetics, people with diabetes. And we'll look at two different treatments. The first is saxagliptin, which is an oral antidiabetic drug. It's a newer treatment for diabetes. The second is sitagliptin. So these are two alternative treatments for diabetes, and in fact we'll focus on new initiators of either. So we'll look at people who are going to begin one treatment or the other, and we would like to know which of these treatments is better. Our outcome is major adverse cardiac event, which we will call MACE. So we would like to know whether saxagliptin is better than sitagliptin when it comes to MACE, or vice versa. One challenge here is that saxagliptin is a newer drug, and users of saxagliptin are more likely to have had some past use of other oral antidiabetic drugs, which we'll call OADs. It's also the case that patients who've had past use of other OADs are at higher risk for MACE in general. These could be, for example, patients who have tried a number of different oral antidiabetic drugs, and maybe treatment hasn't been very effective for them. So they might be sicker patients, or they might be patients who are just harder to help with medication. So you could imagine that if these patients are preferentially getting saxagliptin, this is something we're going to need to take into account if we want to decide how effective saxagliptin is relative to sitagliptin. So what can we do about that?
Well, the key idea here is that we can compute the rate of MACE for saxagliptin versus sitagliptin in two populations. One is patients who have had no prior OAD use. So they are new initiators of oral antidiabetic drugs: they're going to initiate saxagliptin or sitagliptin, but they don't have any past use of oral antidiabetics. And then we'll also look at patients who have had prior OAD use. So we'll stratify on those two populations; that's our X variable, prior OAD use, yes or no. And then we can compute rates of MACE among saxagliptin and sitagliptin initiators. If we calculate those rates of MACE in these subpopulations, we can then average across the subpopulations based on the size of each one, and this will end up being a causal effect if it's true that, within levels of prior OAD use, treatment can be thought of as randomized. In other words, the treatment assignment is ignorable given prior OAD use. This would be the case if the main thing that influences clinicians' treatment decisions is prior OAD use; they might be, for example, more likely to give saxagliptin to people who had prior OAD use. And if there are no other key variables determining this treatment decision, then this is the variable we would need, and we could ignore treatment assignment given that variable. Realistically, in practice we would need to collect more variables than just this one, but we're simplifying to illustrate the main ideas. So let's look at an actual example with some data. What we're looking at here is raw data, what we might observe in practice. In reality, some people received saxagliptin and some people received sitagliptin. When we say saxagliptin equal no, that means sitagliptin. And then some people have the outcome, MACE, and some people don't. So MACE yes or no.
So we have this nice 2 x 2 table. In this particular example we have 11,000 patients. We can calculate the probability of the outcome given saxagliptin. Whenever we say given, that just means restrict to that subpopulation. So in this case, we say given saxagliptin equal yes, and that means just look at that row. Then if we want to know the probability of MACE, we take how many had MACE, which is 350, and divide by the total number of saxagliptin users, which is 4,000. We carry out the division and we get a probability of 0.088, so about 8.8% of saxagliptin users ended up having the outcome. We can now also consider sitagliptin users. Now we're conditioning on sitagliptin, so we restrict to this row. For the probability of MACE among sitagliptin users, well, 500 of the sitagliptin users had the outcome MACE, and 7,000 is the population size, so we get a probability of 0.071. So what we see in our raw data is that 7.1% of sitagliptin users had the outcome, versus 8.8% of saxagliptin users. Just based on this raw data, it looks like saxagliptin users are doing worse. The problem is, we don't know if that's due to saxagliptin being less effective than sitagliptin, or if it's because saxagliptin was preferentially assigned to people who were worse off to begin with. So maybe sicker patients were given saxagliptin. In general, saxagliptin was observed to have higher risk, but we're not quite sure why at this point. What we want to do, then, is stratify on our X variable, which is prior OAD use. The table on the right is the prior OAD use equal yes group, and the table on the left is the prior OAD use equal no group. That's what we mean by stratifying: we're now just creating two 2 x 2 tables instead of one. We're stratifying on X, and X here is prior OAD use.
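The crude, unstratified risks computed from the single 2 x 2 table can be reproduced with a few lines of Python. This is only a sketch: the counts are the ones stated in the lecture, while the dictionary layout and the names `crude`, `risk`, and `crude_risk` are illustrative choices, not something from the slides.

```python
# Counts from the unstratified 2 x 2 table described in the lecture.
# The data layout and variable names are illustrative assumptions.
crude = {
    "saxagliptin": {"mace": 350, "total": 4000},
    "sitagliptin": {"mace": 500, "total": 7000},
}


def risk(cell):
    """Proportion of patients in a treatment group who had MACE."""
    return cell["mace"] / cell["total"]


# "Given saxagliptin" just means: restrict to that row, then divide.
crude_risk = {drug: risk(cell) for drug, cell in crude.items()}

for drug, r in crude_risk.items():
    print(f"{drug}: crude risk of MACE = {r:.4f}")
```

These are conditional probabilities of MACE given treatment only. They are not yet adjusted for prior OAD use, so on their own they say nothing about which drug is more effective.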
And it turns out that saxagliptin users are more likely to have prior OAD use. We can see that because the total number of saxagliptin users who had prior OAD use is 3,000, and the total number of saxagliptin users who did not have prior OAD use is 1,000, whereas for sitagliptin it's 3,000 versus 4,000. So if you look at those ratios, what we see is that the majority of saxagliptin users had prior OAD use, whereas the majority of sitagliptin users did not. Also, people with prior OAD use are at higher risk for MACE, regardless of treatment. Here we're comparing the risk of MACE by prior OAD use. We see that, in general, if you have prior OAD use equal no, your risk of MACE was 250 out of 5,000, whereas if prior OAD use was equal to yes, it was 600 out of 6,000. So we see in general a higher rate of MACE if you were a prior OAD user. Next we're going to carry out the first step that we need to do if we're going to standardize, and that's to compute the probability of MACE within each group. We can begin with our table on the left here, and we're interested in the probability of MACE if saxagliptin equals yes, so we're going to be focused here. This left-hand table is the group of people who had no prior OAD use. We see that among saxagliptin users, there were 50 out of 1,000 that had the outcome, which is 5%. Similarly, if we restrict to the sitagliptin group, which is here, we get 200 out of 4,000, which is also 5%. So what we see in this left-hand table is that the risk of MACE is 5% regardless of whether you use saxagliptin or sitagliptin. Once we stratify, on the left-hand table we're actually seeing no difference in the rate of the outcome. Well, we have a similar story on the right-hand table. Now if we look at saxagliptin users, we have 300 out of 3,000 that have MACE, that have the outcome.
And for sitagliptin users, we restrict to this row and it's also 300 out of 3,000, and that's 10%. So what we see is that among prior OAD users, which is the right-hand table, the risk of MACE is 10% regardless of treatment. If we think about these two tables collectively, what we see is that in either group, in either subpopulation based on X, based on prior OAD use, the risk of MACE is the same regardless of whether you get saxagliptin or sitagliptin. So now, in contrast to the earlier table that didn't stratify, it looks like there's no difference in terms of treatment effectiveness, whereas if you naively didn't stratify on prior OAD use, it looked like saxagliptin was a less effective medication. Next we're going to look at the mean of the potential outcome under saxagliptin. On the previous slide we were just looking at the rates of the outcome in these different subpopulations, but those were always conditional on X, and remember, we want to marginalize. We want to have an expected value of a potential outcome that's not conditional on X, so we're going to have to marginalize. Next, we'll go through how to do that. Our goal here is the mean of this potential outcome, which is the expected value of Y if everyone in our population had, hypothetically, been assigned saxagliptin. That's what we want to know. The way we're going to do that is by calculating the expected value of Y among saxagliptin users at each level of X, and then taking a weighted average of those based on the sizes of the corresponding populations. So let's walk through each component here. First we're going to focus on saxagliptin users in the prior OAD equal yes group. What we do first is calculate the expected value here, which is this component. That's just 300 out of 3,000, which we've seen before; that's just the risk of the outcome.
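The stratum-specific risks read off the two stratified tables can be reproduced as follows. Again this is just a sketch: the counts are those stated in the lecture, and the names `strata` and `stratum_risk` and the nested-dictionary layout are assumptions made for illustration.

```python
# (MACE events, number of patients) by prior OAD use and treatment,
# using the counts stated in the lecture. Layout and names are illustrative.
strata = {
    "no prior OAD": {"saxagliptin": (50, 1000), "sitagliptin": (200, 4000)},
    "prior OAD": {"saxagliptin": (300, 3000), "sitagliptin": (300, 3000)},
}

# Conditioning on X and on treatment: restrict to one row of one table.
stratum_risk = {
    stratum: {drug: events / n for drug, (events, n) in cells.items()}
    for stratum, cells in strata.items()
}

for stratum, risks in stratum_risk.items():
    for drug, r in risks.items():
        print(f"{stratum:>12} | {drug}: risk of MACE = {r:.2f}")
```

Within each stratum the two drugs have the same risk, which is the pattern the lecture points out before moving on to the weighted average.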
That is the risk if you're in the saxagliptin group and you have prior OAD use equal to yes. I should mention that this whole equation here is exactly the one we saw a few slides ago, where we're averaging over the marginal distribution of X. It's an expected value times the probability of that value of X, plus the expected value times the probability of the other value of X, so it's just a weighted average. That's what we're trying to calculate, so we're filling in these pieces. For the next piece, we want to know the probability of prior OAD use equal to yes. In other words, what proportion of our whole population has prior OAD use? Our whole population is 11,000 people. The number of them that have prior OAD use is here, 6,000, so we have 6,000 out of a total of 11,000. That's the proportion that have prior OAD use, so that's the number we're going to fill in there. Next, we'll step through the same thing for the prior OAD use equal no group, so now we're focused on this subpopulation. Our first calculation is just what we saw before, 50 out of 1,000; that's the expected value of Y given saxagliptin and given no prior OAD use. Next we need to know what proportion of the population is in the prior OAD use equal no group, and that's just 5,000, the total number who had no prior OAD use, divided by our total sample size, which is 11,000, just the sum of these two. So you'll notice we've now filled in all four of those components, and we just read them right off of these tables. Then we can calculate that and actually get an expected value of the outcome, which you could essentially think of as the probability of MACE if everybody had been assigned saxagliptin.
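The weighted average just assembled can be sketched in Python. The counts are the ones from the lecture; the function name `standardized_risk` and the data layout are hypothetical, chosen only for this illustration. The same function is applied to the sitagliptin rows so the two standardized risks can be contrasted.

```python
# (MACE events, number of patients) by prior OAD use and treatment,
# using the counts stated in the lecture. Layout and names are illustrative.
strata = {
    "no prior OAD": {"saxagliptin": (50, 1000), "sitagliptin": (200, 4000)},
    "prior OAD": {"saxagliptin": (300, 3000), "sitagliptin": (300, 3000)},
}


def standardized_risk(strata, drug):
    """Sum over x of E(Y | A = drug, X = x) * P(X = x)."""
    # Total population size across all strata and both treatments (11,000 here).
    total = sum(n for cells in strata.values() for (_, n) in cells.values())
    result = 0.0
    for cells in strata.values():
        # P(X = x): stratum size over total, counting BOTH treatment groups.
        p_x = sum(n for (_, n) in cells.values()) / total
        # E(Y | A = drug, X = x): risk among users of this drug in this stratum.
        events, n = cells[drug]
        result += (events / n) * p_x
    return result


standardized = {d: standardized_risk(strata, d) for d in ("saxagliptin", "sitagliptin")}
risk_difference = standardized["saxagliptin"] - standardized["sitagliptin"]

for drug, r in standardized.items():
    print(f"{drug}: standardized risk of MACE = {r:.4f}")
print(f"risk difference = {risk_difference:.4f}")
```

Note that the weights use the covariate distribution of the whole population, not of the users of one drug. That is what makes the result an estimate of the risk if everybody had been assigned that drug, under the ignorability assumption discussed earlier.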
And if you do the calculation, that comes out to about 7.7%. Now you can carry out the same calculation for the sitagliptin group. So now we're going to talk about the expected value of the outcome, which is MACE here, if everybody had, possibly contrary to fact, been assigned sitagliptin. We can basically walk through the exact same steps, but now restricting to the second rows of these tables, where we're dealing with sitagliptin. We've already walked through these calculations for the other group, so it's probably not necessary to do all the details here, but I'll just mention one of them. For example, take the probability of prior OAD use equal to yes. This is just the same as what we saw in the previous example, right? This hasn't changed. This is our population probability of being in the prior OAD use group, and that's 6,000 divided by 6,000 plus 5,000. So that's that. And we had previously, on other slides, calculated this one. So we've actually already done these calculations; we just have to put them together, and we end up with 7.7%. What we can see here is that once we marginalize, the mean potential outcome for saxagliptin and for sitagliptin is exactly the same: it's 7.7% for each treatment. In other words, the mean potential outcome is exactly the same if you gave everybody saxagliptin versus everybody sitagliptin. In principle, that's a very effective way to get a causal effect. We find the important X variables that we need to make the ignorability assumption hold, we stratify, we average, and then we get a causal effect. However, this becomes problematic very quickly, because you can imagine having many X variables. You might need many X variables to achieve ignorability. In practice, a clinician might not just look at your history of medication use; they might look at your history on many variables, and even your own preferences, your general health, and your age.
So there might be a big collection of variables that we need to control for. And if we tried to stratify on all of them, we would basically have many empty cells. What we mean by many empty cells is that there will be combinations of X variables for which we just have no data. There are no people who have that combination, so there is no way for us to calculate a mean and then average. For example, if you stratify on age and blood pressure, there will probably be many combinations of age and blood pressure for which you just have no data, for which there are just no people. So it becomes problematic very quickly, and we're going to need alternatives to standardization. The concept of standardization is extremely important, because we're going to keep that concept throughout the course. We're always going to be doing something that's trying to get at standardization, but we're going to have to do it in a slightly different way. So in much of the remainder of the course, we're going to explore different options. With standardization as sort of the ideal situation, what you would do if hypothetically you could have that much data, we're going to think of alternatives. In particular, we're going to focus on matching, inverse probability of treatment weighting, and propensity score methods when it comes to observational studies. And we'll also look at instrumental variable methods, which get at what to do if you have natural experiments where there might be some variable that could be thought of as a randomizer.
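To make the empty-cell problem described earlier in this segment a bit more concrete, here is a small counting sketch in Python. It does not use any real data: it only counts how many covariate combinations exist and compares that with the 11,000 patients from the example. The function name `min_empty_cells` and the choice of 20 binary covariates are hypothetical, picked purely for illustration.

```python
from math import prod


def min_empty_cells(n_patients, levels_per_covariate):
    """Return (number of strata, lower bound on the number of empty strata).

    Each patient falls into exactly one covariate combination, so at most
    n_patients strata can be non-empty, however the patients are spread out.
    """
    n_cells = prod(levels_per_covariate)
    return n_cells, max(0, n_cells - n_patients)


# One binary covariate (prior OAD use), as in the example: two strata.
print(min_empty_cells(11_000, [2]))

# Hypothetical scenario: 20 binary covariates and the same 11,000 patients.
n_cells, n_empty = min_empty_cells(11_000, [2] * 20)
print(f"{n_cells:,} strata, at least {n_empty:,} of them empty")
```

This bound is optimistic. Standardization needs people on both treatments in every stratum, so in practice even more strata would be unusable.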