0:09

In this lecture we are going to look at the gold standard for obtaining causal effects, that is, the research design that, in theory, is guaranteed to yield the causal effect of an independent variable. This gold standard is the randomized controlled trial, and it works through what is known as randomization. This, and how it relates to the linear regression model, is the topic of this lecture.

First, we have to go through some tedious notation. The independent variable is going to be very simple: it takes only two values, treated and non-treated. Hence, in the case of randomized controlled trials, x is often denoted T for treatment. T is one if the person is treated and zero if not treated. Note that treatment can be all kinds of things: medical treatment, a training program for the unemployed, or getting a college degree.

For each person in the dataset we observe an outcome Y. This can be any relevant outcome: whether the person has been cured of an illness, has found a job, or labor market earnings. The outcome is either the outcome if treated, when T is equal to one, or the outcome if not treated, when T is equal to zero.

The treatment effect at the individual level is then the difference between the outcome if treated and the outcome if not treated. Both are never observed at the same time for the same person, and therefore this notation is known as the counterfactual approach. Either the outcome when treated is observed or the outcome when not treated is observed, never both. Hence, one of the outcomes is counterfactual, so we will never know the treatment effect for a particular individual.

But we may hope to learn the average treatment effect: the average outcome if treated minus the average outcome if not treated. If we knew the counterfactual outcomes, this would be straightforward to calculate. However, as we do not, we need the treated and the non-treated to be comparable on average. We return to this issue. For now, we define the average treatment effect, the ATE, as the average over all persons of the difference between the outcome if treated and the outcome if not treated, one of which is observed and the other counterfactual.
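The notation described so far can be written compactly as follows. This is only a sketch: the slides are not part of the transcript, so the symbols Y_i(1), Y_i(0) and \delta_i are assumptions chosen here for illustration, with Y_i(1) the outcome of person i if treated and Y_i(0) the outcome if not treated.

```latex
\begin{aligned}
Y_i &= T_i\,Y_i(1) + (1 - T_i)\,Y_i(0)
  && \text{(only one of the two outcomes is observed)} \\
\delta_i &= Y_i(1) - Y_i(0)
  && \text{(individual treatment effect, never observed)} \\
\text{ATE} &= E[\delta_i] = E[Y_i(1)] - E[Y_i(0)]
  && \text{(average treatment effect)}
\end{aligned}
```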

We further define two other types of outcomes. The first is the average outcome when treated for those who actually got the treatment. This would correspond to the average earnings with a college degree for those who actually have a degree, as opposed to the counterfactual earnings, that is, the earnings with a degree for those who did not get one. Finally, we also need the average outcome without treatment for those who are not treated, that is, earnings without a degree for those who actually do not have one. Note that the average outcome with treatment for those treated and the average outcome without treatment for those not treated are both observed in the data.

With this new notation, we can now decompose the observed difference in outcome between those treated and those not treated. First, we write the average outcome with treatment for those treated minus the average outcome without treatment for those not treated. This is observable from data. Next, we introduce counterfactuals. We add and subtract the average outcome if not treated for those who were actually treated. This is not observed, so it is an unknown number. Nevertheless, as we both add it and subtract it, this does not change the calculation.

Then we rearrange, so that we first write the difference between the average outcome with and without treatment for those actually treated. This is the average treatment effect for those who were treated, the ATT. Next, we add the difference between the average outcome for the treated and the untreated when neither receives treatment. This would be the average difference in earnings between those with and without a college degree if neither group had a degree: a baseline difference that occurs if the two groups are not comparable on background characteristics. Hence, the difference in outcome between the treated and the not treated can be decomposed into a true treatment effect and a bias term that arises because the treated and not treated are not comparable in the absence of treatment.
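The add-and-subtract step just described can be sketched as an equation. The symbols are assumptions made here, since the slide itself is not in the transcript: Y_i(1) and Y_i(0) denote the outcome of person i with and without treatment, and T_i is the treatment indicator.

```latex
\begin{aligned}
\underbrace{E[Y_i(1)\mid T_i=1] - E[Y_i(0)\mid T_i=0]}_{\text{observed difference}}
 &= E[Y_i(1)\mid T_i=1] - E[Y_i(0)\mid T_i=1] \\
 &\quad + E[Y_i(0)\mid T_i=1] - E[Y_i(0)\mid T_i=0] \\
 &= \underbrace{E[Y_i(1) - Y_i(0)\mid T_i=1]}_{\text{ATT}}
  + \underbrace{E[Y_i(0)\mid T_i=1] - E[Y_i(0)\mid T_i=0]}_{\text{baseline bias}}
\end{aligned}
```

The term E[Y_i(0) | T_i = 1] is the counterfactual that is added and subtracted; it cancels in the sum, so the first and last lines are equal by construction.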

4:32

How does the new notation relate to the notation of our simple linear regression model? It turns out that there is a simple correspondence. We now write our independent variable T for treatment. It is now a so-called dummy variable that takes the value one when treated and zero when not treated.

Then we can write the average outcome for the treated with treatment as the constant term plus the coefficient of the treatment variable, and we can write the average outcome for the non-treated as the constant term. The average difference between the two groups is then just the regression coefficient. In addition, we know from the previous lecture that unobserved confounders may bias it. Therefore, bias from confounders is the same as the baseline bias we just described.

This slide tries to elaborate on the duality of baseline bias and bias from confounders. The average outcome for the treated when they receive treatment is, in the regression terminology, the constant term plus the regression coefficient plus the average value of the effect of unobserved variables (confounders) for the treated. Likewise for the untreated, except that they do not get the effect from the regression coefficient (the treatment effect). The average counterfactual outcome for the treated is the same as the observed outcome for the treated, except that now they do not get the effect from the regression coefficient. Combining terms, we can once again write the difference between the average outcome for the treated and the untreated, that is, the average treatment effect of the treated plus bias, in terms of the regression coefficients. Eventually this becomes the causal effect, b, plus differences in the effects of the confounders, i.e. the difference in the average error term between the treated and the non-treated. This last difference is zero if the effect from unobserved variables is the same among the treated and the non-treated or, put differently, if the error term is independent of the treatment status. In most cases, it would be unwarranted to assume that the error term is independent of the treatment status.
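The claim that the observed difference equals b plus the difference in average error terms can be checked on simulated data. The following is a minimal sketch under made-up assumptions: the constant a = 10, the effect b = 2, the confounder u and the selection rule are all hypothetical and not taken from the lecture. It uses only the Python standard library.

```python
import random
from statistics import mean

rng = random.Random(0)   # fixed seed so the sketch is reproducible
a = 10.0                 # assumed constant term
b = 2.0                  # assumed true causal effect of treatment

rows = []
for _ in range(20000):
    u = rng.gauss(0, 1)                      # unobserved confounder
    t = 1 if u + rng.gauss(0, 1) > 0 else 0  # people with high u select into treatment
    e = 3.0 * u + rng.gauss(0, 1)            # error term carries the confounder's effect
    y = a + b * t + e
    rows.append((t, y, e))

# Average outcomes and average error terms by treatment status
y1 = mean(y for t, y, e in rows if t == 1)
y0 = mean(y for t, y, e in rows if t == 0)
e1 = mean(e for t, y, e in rows if t == 1)
e0 = mean(e for t, y, e in rows if t == 0)

naive = y1 - y0   # difference in means; with a single dummy this is the OLS slope
bias = e1 - e0    # difference in average error term between the groups

print("naive difference:", round(naive, 3))
print("b + bias:        ", round(b + bias, 3))
```

In this setup the error term is not independent of treatment status, so the naive difference overstates b by exactly the difference in average errors.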

It is easy to think of variables that affect both the event of getting a college degree and earnings. For instance, IQ and conscientiousness are both important for educational achievement and for getting a good job with a high wage. Therefore, if both these variables are unobserved, it is very likely that the observed differences between those with and without a college degree reflect not only a potential causal effect of a college degree but also confounding, or baseline differences, in IQ and conscientiousness.

A convenient way out of this pickle is randomization. That is, we make a lottery that determines whether people are allowed to get a college degree. It would not be considered ethical to run such a lottery, but for the sake of our case we nevertheless imagine that we can.

Thanks to the lottery, the average outcome when treated is now the same for those treated and for those not treated, had they been treated. The first is observed, the latter is counterfactual, but we know they are the same thanks to the lottery. The same goes for the outcome when not treated: the average outcome is the same for both those treated and those not treated, because on average the two groups are comparable when they are allocated into treatment by the lottery. Therefore, when we calculate the difference between the average outcome for those treated and those not treated, we get the average treatment effect for both the treated and the untreated, with no baseline bias.
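The effect of the lottery can be illustrated with a small simulation. This is a sketch under assumed values (b = 2, a constant of 10, and a confounder u with a strong effect on the outcome are all hypothetical); the only point is that treatment is assigned by a coin flip that ignores u.

```python
import random
from statistics import mean

rng = random.Random(1)   # fixed seed so the sketch is reproducible
b = 2.0                  # assumed true causal effect of treatment

treated, control = [], []
for _ in range(20000):
    u = rng.gauss(0, 1)       # unobserved confounder, still affects the outcome
    t = rng.randint(0, 1)     # the lottery: assignment ignores u entirely
    y = 10.0 + b * t + 3.0 * u + rng.gauss(0, 1)
    (treated if t else control).append(y)

# Simple difference in means between the randomized groups
estimate = mean(treated) - mean(control)
print("difference in means under randomization:", round(estimate, 3))
```

Because the coin flip is unrelated to u, the two groups have the same average u up to sampling noise, and the difference in means lands close to b.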

Until now we have assumed that the treatment effect is uniform: everybody gets the same causal effect of the treatment, b. This is not realistic. However, it turns out that as long as we are only concerned with the average treatment effect, the assumption of a homogeneous treatment effect has limited importance. Nevertheless, we will make a slight digression to show you how heterogeneous treatment effects work. Consider again the simple linear regression model with the treatment dummy as the independent variable.

Now the regression coefficient has a subscript i, indicating potentially individual treatment effects. That is, both the size and the sign of the treatment effect may vary across individuals. We also allow for the possibility that the effect and the error are correlated, that is, those who are treated might experience different treatment effects compared to those who are not treated. In the health sciences, this makes a lot of sense. You do not treat healthy people because you do not expect them to be (positively) affected by the treatment. You do not take aspirin if you do not have a headache. In addition, as before, we allow for confounders in the sense that the error term might be correlated with treatment status.

In order to proceed, we decompose the individual treatment effect into a common effect and an individual part with mean zero. This is innocuous, as the decomposition is tautological. We now study what happens when we calculate the difference between the treated and the non-treated using data from a randomized trial when the data are generated from a model with heterogeneous treatment effects.

Again, treated and controls are on average equal in terms of the effect of the unobserved confounders due to randomization. Therefore, the difference in average outcomes between the treated and the controls is the common treatment effect plus the average individual part of the treatment effect plus the difference in baseline. The latter is zero by randomization, and the average individual part is zero by assumption.
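Written out, the argument looks roughly as follows. The notation is an assumption made here for illustration, since the slide is not in the transcript: a is the constant term, b the common effect, \tilde b_i the individual part of the effect, and e_i the error term.

```latex
\begin{aligned}
Y_i &= a + b_i T_i + e_i, \qquad b_i = b + \tilde b_i, \qquad E[\tilde b_i] = 0 \\
E[Y_i \mid T_i = 1] - E[Y_i \mid T_i = 0]
  &= b + E[\tilde b_i \mid T_i = 1]
     + \bigl(E[e_i \mid T_i = 1] - E[e_i \mid T_i = 0]\bigr)
\end{aligned}
```

Under randomization T_i is independent of both \tilde b_i and e_i, so E[\tilde b_i | T_i = 1] = E[\tilde b_i] = 0 and the term in parentheses is zero, which leaves b.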

Therefore, the difference between the average outcome of the treated and the non-treated from a randomized controlled trial is the average treatment effect. Randomization is therefore guaranteed to yield an estimate of the average causal treatment effect even if treatment effects are heterogeneous and even if individuals would, in the absence of randomization, select themselves into treatment on the basis of the size of their treatment effect. Therefore, randomization is a very powerful tool when trying to estimate causal effects.
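A short simulation can contrast self-selection on the size of the effect with a lottery. This is a sketch with hypothetical numbers (a common effect of 2, individual deviations with mean zero, a constant of 10); the helper `diff_in_means` is introduced here for illustration only.

```python
import random
from statistics import mean

rng = random.Random(2)   # fixed seed so the sketch is reproducible
b_common = 2.0           # assumed common part of the treatment effect

# Each person has an individual effect b_i = b_common + deviation (mean zero)
people = []
for _ in range(20000):
    deviation = rng.gauss(0, 1)
    e = rng.gauss(0, 1)
    people.append((b_common + deviation, e))

def diff_in_means(assign):
    """Difference in average outcome between treated and non-treated
    under the assignment rule assign(b_i) -> 0 or 1."""
    treated, control = [], []
    for b_i, e in people:
        t = assign(b_i)
        y = 10.0 + b_i * t + e
        (treated if t else control).append(y)
    return mean(treated) - mean(control)

# Those who gain more than average opt in (selection on the size of the effect)
self_selected = diff_in_means(lambda b_i: 1 if b_i > b_common else 0)
# A lottery assigns treatment regardless of b_i
randomized = diff_in_means(lambda b_i: rng.randint(0, 1))

print("self-selected:", round(self_selected, 3))
print("randomized:   ", round(randomized, 3))
```

Under self-selection the difference reflects the effect among those who chose treatment, which is larger than the average effect in this setup; under the lottery it is close to the common effect, which is the ATE here.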

Even though randomized trials are such a powerful tool when trying to establish causal relations empirically, they are not always easy or feasible to carry out in practice. Therefore, there is not an abundance of randomized controlled trials in the social sciences. One notable example, though, is Project STAR. This project randomized more than 11,000 students into two treatments and one control group. When entering school in kindergarten, students were randomized into either a small class (fewer than 17 students), an ordinary class (more than 22 students), or an ordinary class with a teacher's aide. Students were followed throughout school and into high school, and achievement scores were recorded at the end of each school year.

Here we show the math achievement score at the end of kindergarten for all students. In addition, we show the distribution of students across treatment arms. Approximately a third of the students are in each of the three groups.

The reason that it is not exactly a third in each group is that, even though students were randomized into the three groups, some students subsequently decided to leave the class to which they were allocated by the randomization procedure. Such non-random dropout threatens the complete randomization of the final data set.

We return to this issue later. We now run a regression on the math achievement score using dummy variables for whether the student is in an ordinary class (control group) or an ordinary class with a teacher's aide. The reference group is then students in a small class. The top regression uses math achievement after kindergarten as the outcome. Here we find that students in the control group and students with a teacher's aide score approximately 10 points lower on the math scale compared to students in small classes. Assuming that randomization was successful, this is an average causal effect. Although students may experience very different outcomes when taught in a small class compared to an ordinary class, ON AVERAGE students benefit from small classes. Running the same regression using math achievement after 1st grade as the outcome yields a similar result.
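How a regression with two dummy variables and a reference group works can be sketched on synthetic data. Nothing below is Project STAR data: the group means, the spread and the sample sizes are made-up assumptions, and the small `ols` helper is a hypothetical illustration that solves the normal equations with the standard library only.

```python
import random
from statistics import mean

rng = random.Random(3)   # fixed seed so the sketch is reproducible

# Synthetic scores only; these group means are assumptions, not STAR results.
true_mean = {"small": 500.0, "regular": 490.0, "aide": 490.0}
data = [(g, rng.gauss(true_mean[g], 40.0))
        for g in true_mean for _ in range(2000)]

# Design matrix: constant, dummy for regular class, dummy for regular class with aide.
# Small class is the reference group, so it gets no dummy.
X = [[1.0, float(g == "regular"), float(g == "aide")] for g, _ in data]
y = [score for _, score in data]

def ols(X, y):
    """Solve the normal equations (X'X) beta = X'y by Gauss-Jordan elimination."""
    k = len(X[0])
    A = []
    for i in range(k):
        row = [sum(r[i] * r[j] for r in X) for j in range(k)]
        row.append(sum(r[i] * yi for r, yi in zip(X, y)))
        A.append(row)                      # augmented matrix [X'X | X'y]
    for i in range(k):
        pivot = A[i][i]
        A[i] = [v / pivot for v in A[i]]   # normalize the pivot row
        for j in range(k):
            if j != i:
                factor = A[j][i]
                A[j] = [vj - factor * vi for vj, vi in zip(A[j], A[i])]
    return [A[i][k] for i in range(k)]

const, b_regular, b_aide = ols(X, y)

group_mean = {g: mean(s for gg, s in data if gg == g) for g in true_mean}
print("constant:", round(const, 2), "small-class mean:", round(group_mean["small"], 2))
print("regular dummy:", round(b_regular, 2))
print("aide dummy:   ", round(b_aide, 2))
```

The constant reproduces the mean of the reference group, and each dummy coefficient is the difference between that group's mean and the reference group's mean, which is why the coefficients can be read as gaps relative to small classes.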

To interpret this result some bookkeeping is needed, because students can now potentially switch between treatment groups and thus may be only 'partially' treated. To avoid this, the regression using math in 1st grade as the outcome uses only students who have attended the same class type in both kindergarten and 1st grade.

Obviously, if randomization is to neutralize any baseline differences, individuals must not circumvent the randomization procedure and 'self-select' into different treatment groups, reinstating a correlation between the error term and the treatment indicator. In practice it is unavoidable that a few individuals for some reason escape the randomization procedure. To assess how important this is in practice, one can compare the treatment groups on observable characteristics. If the data are balanced on observables, it is credible that they are also balanced on unobservables, indicating that randomization is successful despite some reallocation of individuals after randomization.

To assess this in our example, we show the distribution of whether the student is eligible for free lunch (due to low-income parents), ethnicity, and gender across treatment groups. In no case can we reject the null hypothesis of independence between background characteristics and allocation to treatment groups. Thus, it seems that randomization in Project STAR was indeed successful.

Another way to see that randomization was successful is to look at the estimated effect of class type with and without using background characteristics as controls. Background characteristics are potential confounders, so if randomization had failed we should find a different effect of class type when we control for them. However, as we see from the two regressions reported in the tables, there are practically no differences between the estimated effects of class type across the two regression models, again indicating that randomization was successful.
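A balance check of this kind can be sketched as a chi-square test of independence between a background characteristic and the treatment arm. The data below are synthetic and the 45 percent free-lunch share is a made-up assumption; the lecture does not say which test was used, so the chi-square test is a choice made here for illustration. The statistic is computed by hand with the standard library and compared with 5.991, the 5 percent critical value of the chi-square distribution with 2 degrees of freedom.

```python
import random

rng = random.Random(4)   # fixed seed so the sketch is reproducible
arms = ["small", "regular", "aide"]

# Synthetic students: free-lunch status is drawn independently of the lottery.
students = [(rng.choice(arms), rng.random() < 0.45) for _ in range(6000)]

# Observed counts in the 2 x 3 table (free lunch yes/no by class type)
obs = {(lunch, arm): 0 for lunch in (True, False) for arm in arms}
for arm, lunch in students:
    obs[(lunch, arm)] += 1

n = len(students)
row_tot = {l: sum(obs[(l, a)] for a in arms) for l in (True, False)}
col_tot = {a: sum(obs[(l, a)] for l in (True, False)) for a in arms}

# Pearson chi-square: sum over cells of (observed - expected)^2 / expected,
# where expected = row total * column total / n under independence
chi2 = 0.0
for l in (True, False):
    for a in arms:
        expected = row_tot[l] * col_tot[a] / n
        chi2 += (obs[(l, a)] - expected) ** 2 / expected

CRITICAL_5PCT_DF2 = 5.991   # (2 - 1) * (3 - 1) = 2 degrees of freedom
balanced = chi2 < CRITICAL_5PCT_DF2

print("chi-square statistic:", round(chi2, 3))
print("cannot reject independence at 5%:", balanced)
```

A statistic below the critical value means independence cannot be rejected, which is the pattern the lecture reports for free lunch, ethnicity and gender.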
