This video is on causal assumptions. The primary learning objectives here are to understand some of the causal assumptions that we need to make to link potential outcomes to observed data. In particular, we aim to understand the following four assumptions, known as SUTVA, consistency, ignorability, and positivity. So identifiability of causal effects is going to require making some untestable assumptions. Statistical identifiability in general has to do with identifying some parameter from actual data; a parameter is considered identifiable if you're able to estimate it from data. And in the causal inference area, there's the fundamental problem of causal inference, where we don't see both potential outcomes, and therefore we're going to have to make some assumptions if we want to identify causal effects. In particular, in the causal inference world, some of the assumptions we have to make are untestable, and these untestable kinds of assumptions are called causal assumptions. The most common assumptions are the following: there's the Stable Unit Treatment Value Assumption, which is also known as SUTVA; there's consistency, ignorability, and positivity; and these assumptions will have to do with the observed data. And we're going to assume, as we talk about these assumptions, that our observed data consist of an outcome Y, a treatment variable A, and then some set of pre-treatment covariates X. So X you could think of as the kinds of information you might want to collect for your particular study. If you are in a medical setting, it could be demographics, such as age and race, or clinical variables, such as diagnoses and laboratory values. But just think of these as a collection of variables that you might want to control for, for example.
So our data consist of Y, A, and X. The first assumption, known as SUTVA, you could actually think of as being two assumptions, the first one being no interference. What we mean by that is that units do not interfere with each other, and units here would typically refer to people, whatever population your study is targeting. Typically in biomedical research we're talking about patients. But what do we mean by interfere? That would have to do with whether the treatment assignment of one person affects the outcome, or the treatment effectiveness, of another person. Another word for this is spillover or contagion, and you could imagine a couple of scenarios where there would be interference. Maybe your treatment is some kind of behavioral intervention, but the people in your study interact with each other; how effective the intervention is on one person might then depend on what intervention the other people they interact with got, so that would be an example where there is interference. In other words, how effective a treatment is on one person might depend on what treatment other people got. You could also think of this in vaccine studies, where how effective a vaccine is for one person might depend on what proportion of the population received the vaccine. Typically we're going to assume no interference, that this isn't happening: when we assign somebody a treatment, how effective it is doesn't depend on what is happening with other people. There are causal inference methods that can handle interference, but we're not covering those in this course. The other part of the SUTVA assumption is that there is one version of treatment, and this is important because it has to do with having your potential outcomes effectively linked to your observed data.
If there are multiple versions of treatment, it becomes difficult to understand even what a causal effect means, and it causes a number of other problems. So we think of one version of treatment, where there's one variable that we can hypothetically intervene on, and it's very well defined what we mean by treatment. If you make the SUTVA assumption, the advantage is that you can write potential outcomes for the ith person in terms of only their own treatment. When we defined potential outcomes, we talked about the potential outcome as the outcome if the person hypothetically received treatment equal to little a, for example. We didn't define it in terms of the outcome that would be observed given what everybody in the whole population received as treatment. So it's the SUTVA assumption that allows us to do that: we don't need to write potential outcomes in terms of the treatments of everybody else, we only need to write the potential outcomes in terms of this particular person's treatment. This really simplifies the problem quite a bit, and that's the reason this assumption is usually made; in many situations, it will be a reasonable assumption. The consistency assumption is the next one we'll talk about, and in principle it's a pretty obvious or simple assumption, where we're directly linking potential outcomes and observed data. We're saying that the potential outcome under treatment A equal little a, which we define as Y superscript little a, is just equal to the observed outcome if the actual treatment received was A equal little a. So when treatment is actually equal to little a, our observed outcome directly corresponds to the potential outcome Y superscript little a. If you remember, for potential outcomes, we imagine Y superscript little a as the outcome that would be observed if treatment actually took value little a.
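As a concrete illustration, the consistency link can be sketched in a short simulation. The data-generating numbers here are entirely made up for illustration; the point is only the bookkeeping between potential and observed outcomes.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1000

# Hypothetical potential outcomes for n units (made-up data):
y0 = rng.normal(size=n)           # Y^0, outcome if untreated
y1 = y0 + 2.0                     # Y^1, outcome if treated
a = rng.integers(0, 2, size=n)    # treatment actually received

# Consistency: the observed outcome Y equals the potential outcome
# corresponding to the treatment the unit actually received.
y_obs = np.where(a == 1, y1, y0)

# Among units with A = a, the observed Y is exactly Y^a.
assert np.all(y_obs[a == 1] == y1[a == 1])
assert np.all(y_obs[a == 0] == y0[a == 0])
```

Notice that Y superscript little a is never observed for everyone; we observe it exactly for those units whose actual treatment was little a, which is the fundamental problem of causal inference again.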
So if treatment does actually take value little a, then we're saying that the observed outcome is equal to that potential outcome; this is just directly linking potential outcomes and observed outcomes. In other words, our observed outcome Y is equal to the potential outcome Y superscript little a if treatment is equal to little a, and that's true for all a, for any possible treatment. Next we'll get into the ignorability assumption, which is probably the most important assumption that we'll discuss, and the one that people usually give the most attention to. This is also sometimes referred to as the no unmeasured confounders assumption. Now we're going to have to involve those other kinds of variables, the pre-treatment covariates X. The basic idea is that treatment assignment is assumed to be independent of the potential outcomes, conditional on these pre-treatment variables. These pre-treatment variables are a set of variables, and if we have the right ones and enough of them, then we are assuming that treatment is effectively randomly assigned. The notation here, with the independence symbol in the middle, says that the potential outcomes Y zero comma Y one are independent of the treatment variable A conditional on X, these sort of baseline pre-treatment variables. You could think of these variables X as what people typically think of as confounding variables: treatment might in practice be assigned to people who are older, or sicker, for example. So it's not randomly assigned, but once you control for things like age and health, then we might be able to think of treatment as being randomly assigned. X is the collection of variables that are going to create this kind of independence. Among people with the same values of X, we can essentially think of treatment as being randomly assigned.
And what we mean by random here is strictly that it's independent of the potential outcomes; it might not be random in some other sense. For example, suppose your outcome is systolic blood pressure. If treatment was assigned completely independently of the expected response to treatment, that is, if the clinician's determination of who should get treated was independent of who would benefit from treatment, then treatment would be independent of the potential outcomes. Of course, that's unrealistic, so now imagine the clinicians are basing the treatment decision on some variables that we've observed. They might be more likely to give treatment to people who are older, or to people who have a history of higher blood pressure, things that we can capture in our dataset; those are the collection of X's. And if we have enough of those, the idea is that treatment is now effectively randomized, so that's the important assumption known as ignorability. It's called ignorability because treatment assignment itself becomes ignorable, a non-factor: as long as we have enough of these X's, if we have the right covariates, we don't have to worry about treatment assignment anymore, it's effectively randomized. So this is, again, the notation, and we'll consider a simple example. Suppose X is just a single variable, say age, and to really simplify things, let's just say it's either older or younger. So older versus younger is the X variable that matters, and let's say older people are more likely to get treatment A equal one, but older people might also be more likely to have the outcome, say hip fracture, regardless of treatment. In this case, age is related to who gets treatment, and age is also related to the risk of the outcome, regardless of treatment.
So in that case treatment is not randomly assigned, because people who are older are more likely to get treated. This is what we mean by marginally: Y zero and Y one are not independent of A marginally, meaning not conditional on X, so treatment is not random in general, but within levels of X we might have random treatment assignment. Imagine that X is the only variable like this, so that people who are older are more likely to get treated than people who are younger, and that's the only variable taken into account. In that case we could say that within levels of X, in other words among people who are younger and among people who are older, treatment is effectively randomly assigned; treatment assignment is ignorable given age. That's clearly an assumption, and one of the things we will try to do is figure out what X variables we need to collect to make the ignorability assumption hold. Next we'll move on to the positivity assumption. Positivity refers to the idea that everybody had some chance of getting either treatment, conditional on these X's. At every level of X, and for every treatment, people had a nonzero probability, a greater than zero chance, of getting that treatment; in other words, treatment is not deterministic as a function of X. In the previous example with older versus younger, it would be a violation of the positivity assumption if everybody who was older got treated, but it's not a violation if older people are just more likely to get treated while everybody still could get either treatment. And hopefully the reason that we need this assumption is clear, because, remember, we're going to need data from which we can learn what would happen under either treatment scenario.
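The older/younger hip-fracture example can be simulated to show why the marginal comparison is confounded while the within-level comparison is not. All the risk numbers below are invented purely for illustration; the true within-level treatment effect is built in as a 5 percentage point risk reduction.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200_000

# Single binary covariate: 1 = older, 0 = younger (illustrative).
x = rng.integers(0, 2, size=n)

# Older people are at higher risk of the outcome regardless of treatment,
# and treatment lowers risk by 0.05 in both age groups (made-up numbers).
y0 = rng.binomial(1, 0.10 + 0.30 * x)   # risk if untreated
y1 = rng.binomial(1, 0.05 + 0.30 * x)   # risk if treated

# Treatment depends only on X, so ignorability holds given X:
# older people are much more likely to be treated.
a = rng.binomial(1, 0.2 + 0.6 * x)

y = np.where(a == 1, y1, y0)            # observed outcome (consistency)

# The naive marginal comparison mixes the treatment effect with the
# age effect and comes out positive (treatment looks harmful)...
naive = y[a == 1].mean() - y[a == 0].mean()

# ...but within each level of X, treated and untreated are comparable,
# and each stratified difference is close to the true -0.05.
strat = [y[(a == 1) & (x == k)].mean() - y[(a == 0) & (x == k)].mean()
         for k in (0, 1)]
```

The sign flip in `naive` is exactly the confounding story from the lecture: older people are both more likely to be treated and more likely to fracture, so marginally Y zero and Y one are not independent of A, even though they are independent of A given X.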
So if, for a given value of X, everybody is treated, then there's really no way for us to learn what would've happened if they weren't treated. But as long as we have some people who are treated and some who aren't within every level of X, then there's some hope of learning about the causal effects of treatment within levels of X. We need this positivity assumption so we can have some data at every level of X for people who are treated and not treated. So this is just reiterating: we need treatment assignment not to be deterministic. There are cases where people with certain diseases might be ineligible, in a sense, for a particular treatment. In that case, we don't want to make inference about that population, so we would probably exclude them from the study and make sure that our causal effect results don't treat them as part of this population. So the positivity assumption is also helping us define who our population of interest is: if there are people who could never get the treatment, then typically we would want to exclude them and only make inference about the population of people who have some chance of getting the treatment. In general, we need variability in treatment assignment if we're going to have identification; if everything is deterministic, then we're just not going to have the data that we need to identify causal effects. So that's the positivity assumption: at every level of X, everybody has some chance of getting either treatment. Next, we're going to move from defining assumptions to using them to link observed data and potential outcomes. Here's one expected value that only involves observed data: the expected value of Y, given A equal little a and X equal little x.
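One simple way to probe positivity in a dataset is to check that, within every level of X, the empirical proportion treated is strictly between 0 and 1. This is a minimal sketch with a made-up binary covariate; in practice you would look at estimated propensity scores, but the idea is the same.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 10_000
x = rng.integers(0, 2, size=n)        # 0 = younger, 1 = older (illustrative)
a = rng.binomial(1, 0.2 + 0.6 * x)    # older more likely treated, but not certainly

def positivity_holds(x, a):
    """Check that every level of X contains both treated and untreated units."""
    for k in np.unique(x):
        p_treated = a[x == k].mean()  # crude within-level propensity estimate
        if p_treated == 0.0 or p_treated == 1.0:
            return False
    return True

# Both strata have a mix of treated and untreated, so positivity holds here.
assert positivity_holds(x, a)

# A deterministic rule violates positivity: everyone older is treated,
# nobody younger is.
a_det = x.copy()
assert not positivity_holds(x, a_det)
```

In the deterministic case there are no untreated older people at all, so there is simply no data from which to learn what would have happened to older people without treatment.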
This is the expected value of Y among the subpopulation of people whose treatment is equal to little a and whose covariates are equal to little x. We observe capital Y, capital A, and capital X; those are all observed data, there are no potential outcomes there, so we can use some of these causal assumptions to link the observed data to potential outcomes. We started with this expected value that only involves observed data, on the left, but it is actually equal to the expected value of Y superscript little a given A equal little a and X equal little x, by the consistency assumption. If you remember, the consistency assumption said that the outcome we observe when treatment is equal to little a is the same as the potential outcome Y superscript little a; as long as the consistency assumption holds, we can link these two. You'll notice we already went from something only involving observed data to something that involves potential outcomes, and we did it just from the consistency assumption. Next we can think about the ignorability assumption, and what the ignorability assumption allows us to do is drop the conditioning on treatment. From the previous line to this line, all we did was drop the conditioning on A equal little a, and what allows us to do that is the ignorability assumption. If you remember, the ignorability assumption said that conditional on X, conditional on these covariates, the treatment assignment mechanism doesn't matter, it's effectively random. In other words, conditioning on A isn't providing any additional information about the mean of the potential outcome here, because as long as you condition on X, treatment is randomly assigned. So we're able to drop the conditioning on A, and that's by the ignorability assumption.
So now we've gone from our original statement, which was the expected value of Y given A and X, to something involving a potential outcome where we're only conditioning on X, and that's strictly from consistency and ignorability. And now if we want what's known as a marginal causal effect, the kind of causal effect we've talked about previously, something involving, say, a difference in potential outcomes where we don't condition on X, what we have to do is average over the distribution of X.
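That averaging-over-X step (often called standardization) can be sketched as follows, using the same sort of binary covariate as before. The data-generating numbers are made up, with a true marginal treatment effect of minus 0.05 built in.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 200_000

x = rng.integers(0, 2, size=n)          # binary covariate (illustrative)
a = rng.binomial(1, 0.2 + 0.6 * x)      # confounded treatment assignment
# Outcome depends on X and on treatment; treatment shifts the mean by -0.05.
y = rng.normal(0.10 + 0.30 * x - 0.05 * a, 0.1)

def standardized_mean(y, a, x, treat):
    """Estimate E[Y^a] by averaging E[Y | A=a, X=x] over the
    marginal distribution of X (valid under consistency,
    ignorability given X, and positivity)."""
    levels = np.unique(x)
    cond_means = {k: y[(a == treat) & (x == k)].mean() for k in levels}
    weights = {k: np.mean(x == k) for k in levels}
    return sum(cond_means[k] * weights[k] for k in levels)

# Marginal causal effect: E[Y^1] - E[Y^0], each obtained by
# averaging the within-level means over the distribution of X.
ate = standardized_mean(y, a, x, 1) - standardized_mean(y, a, x, 0)
# ate is close to the true marginal effect of -0.05.
```

Note that the weights here are the marginal probabilities of each level of X in the whole population, not the distribution of X among the treated or the untreated; that is exactly what "averaging over the distribution of X" means.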