0:09

In this lecture we are going to look at the gold standard of obtaining causal effects

- that is, the research design that - in theory - is guaranteed to yield the causal effect

of an independent variable. This gold standard is the randomized controlled

trial. And it works through what is known as randomization.

This, and how it relates to the linear regression model, is the topic of this lecture.

First, we have to go through some tedious notation.

The independent variable is going to be very simple - it takes only two values - treated

and non-treated. Hence, in the case of randomized controlled

trials x is often denoted T for treatment. Therefore, T is either one, if the person

is treated, or zero if not treated. Note that treatment can be all kinds of things:

medical treatment, a training program for the unemployed or getting a college degree.

For each person in the dataset we observe an outcome Y.

This can be any kind of relevant outcome: whether the person has been cured of an

illness, has found a job, or what the person earns in the labor market. The outcome is either the outcome if treated,

when T is equal to one; or outcome if not treated, when T is equal to zero.

The treatment effect at the individual level is then the difference between the outcome

if treated and the outcome if not treated. Both are never observed at the same time for

the same person. And therefore, this notation is known as the

counterfactual approach. Either the outcome when treated is observed

or the outcome when not treated is observed, never both.

Hence, one of the outcomes is counterfactual. So we will never know the treatment effect

for a particular individual. But we may hope to learn the average treatment

effect - the average outcome for the treated minus

the average outcome for the non-treated. If we knew the counterfactual outcome this

would be straightforward to calculate. However, as we do not, we need the treated

and the non-treated to be comparable on average. We return to this issue.

But we define the average treatment effect - the ATE - as the average, across all persons, of the difference

between the outcome if treated and the outcome if not treated, one of which is always counterfactual.
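The notation so far can be summarized in symbols. This is a sketch using a common convention that is assumed here and not taken from the slides: Y_i(1) is the outcome of person i if treated, Y_i(0) the outcome if not treated, and tau_i the individual treatment effect.

```latex
\begin{align*}
Y_i &= T_i\,Y_i(1) + (1 - T_i)\,Y_i(0) && \text{observed outcome} \\
\tau_i &= Y_i(1) - Y_i(0) && \text{individual treatment effect, never observed} \\
\text{ATE} &= E\bigl[\,Y_i(1) - Y_i(0)\,\bigr] && \text{average treatment effect}
\end{align*}
```

For every person only one of Y_i(1) and Y_i(0) shows up in the data, which is why tau_i cannot be computed directly.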

We further define two other types of outcomes. The average outcome when treated for those

who actually got the treatment. This would correspond to the average earnings

with a college degree for those who actually have a degree - as opposed to the counterfactual

earnings - the earnings with a degree for those who did not get one.

Finally, we also need the average outcome without treatment for those who are not treated.

That is, earnings without a degree for those who actually do not have one.

Note that the average outcome with treatment for those treated and the average outcome without

treatment for those not treated are both observed in the data.

With this new notation, we can now decompose the observed difference in outcome for those

treated and those not treated. First, we write the average outcome with a

treatment for those treated minus the average outcome without treatment for those not treated.

This is observable from the data. Next, we introduce counterfactuals.

We add and subtract the average outcome if not-treated for those who were actually treated.

This is not observed so it is an unknown number. Nevertheless, as we both add it and subtract

it, this does not change the calculation. Then we rearrange, so that we first write

the difference between the average outcome with and without treatment for those actually

treated. This is the average treatment effect for those

who were treated - ATT. Next, we add the difference between the average

outcome for the treated and the untreated when neither receives treatment.

This would be the average difference in earnings for those with and without a college degree

if neither group had a degree. This is a baseline difference that occurs if the two

groups are not comparable on background characteristics. Hence, the difference in outcome between the

treated and the not treated can be decomposed into a true treatment effect and a bias term

that arises because the treated and not treated are not comparable in the absence of treatment.
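The add-and-subtract step just described can be written out as follows. The symbols are an assumed shorthand, not the slide's own: Y(1) is the outcome if treated, Y(0) the outcome if not treated, and E[ . | T=1] is an average over those actually treated.

```latex
\begin{align*}
&E[Y(1)\mid T=1] - E[Y(0)\mid T=0] \\
&\quad= E[Y(1)\mid T=1] - E[Y(0)\mid T=1] + E[Y(0)\mid T=1] - E[Y(0)\mid T=0] \\
&\quad= \underbrace{E[Y(1) - Y(0)\mid T=1]}_{\text{ATT}}
      + \underbrace{E[Y(0)\mid T=1] - E[Y(0)\mid T=0]}_{\text{baseline bias}}
\end{align*}
```

The term E[Y(0) | T=1] is the unobserved counterfactual that is added and subtracted in the second line.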

4:32

How does the new notation relate to the notation of our simple linear regression model?

It turns out that there is a simple correspondence. We now write our independent variable T for

treatment. It is now a so-called dummy variable that

takes the value one when treated and zero when not treated.

Then we can write the average outcome for the treated with treatment as the constant

term plus the coefficient of the treatment variable and we can write the average outcome

for the non-treated as the constant term. The average difference between the two groups

is then just the regression coefficient. In addition, we know from the previous lecture

that unobserved confounders may bias it. Therefore, bias from confounders is the same

as the baseline bias we just described. This slide tries to elaborate on the duality

of baseline bias and bias from confounders. The average outcome for the treated when they

receive treatment is in the regression terminology the constant term plus the regression coefficient

plus the average value of the effect of unobserved variables (confounders) for the treated.

Likewise for the untreated, except that they do not get the effect from the regression

coefficient (the treatment effect). The average counterfactual outcome for the

treated is the same as the observed outcome for the treated, except that now they do not get

the effect from the regression coefficient. Combining terms, we can once again write the

difference between the average outcome for the treated and the untreated, that is, the average

treatment effect on the treated plus bias, in terms of the regression coefficients.

Eventually this becomes the causal effect, b, plus differences in the effects of the

confounders - i.e. the differences in the error terms for the treated and the non-treated.

This last difference is zero if the effect from unobserved variables is the same among

the treated and the non-treated or, put differently, if the error term is independent

of the treatment status. In most cases, it would be unwarranted to

assume that the error term is independent of the treatment status.
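The correspondence between the two notations can be sketched in a few lines. The symbols a for the constant term and e for the error term are assumptions made here; b is the causal effect as named in the lecture, and Y(0) denotes the outcome if not treated.

```latex
\begin{align*}
Y &= a + bT + e \\
E[Y \mid T=1] &= a + b + E[e \mid T=1] \\
E[Y \mid T=0] &= a + E[e \mid T=0] \\
E[Y(0) \mid T=1] &= a + E[e \mid T=1] && \text{(counterfactual)} \\
E[Y \mid T=1] - E[Y \mid T=0] &= b + \bigl(E[e \mid T=1] - E[e \mid T=0]\bigr)
\end{align*}
```

The bracket in the last line is the baseline bias. It vanishes when the error term has the same average among the treated and the non-treated.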

It is easy to think of variables that affect both the event of getting a college degree

and earnings. For instance, IQ and conscientiousness are both important in terms of educational

achievement and in getting a good job with a high wage.

Therefore, if both these variables are unobserved, it is very likely that the observed differences

between those with and without a college degree reflect both a potential causal effect of

a college degree but also confounding or baseline differences in IQ and conscientiousness.

A convenient way out of this pickle is randomization. That is, we make a lottery that determines

whether people are allowed to get a college degree.

It is not considered ethical to make such a lottery but for the sake of our case, we

nevertheless imagine that we can make this lottery.

Thanks to the lottery, the average outcome for those treated and the average outcome that those not treated

would have had, had they been treated, are now the same. The first is observed, the latter is counterfactual,

but we know they are the same thanks to the lottery.

The same goes for the outcome when not treated - the average outcome is the same for both

those treated and those not treated, because the lottery that allocated them to treatment makes

the two groups comparable on average. Therefore, when we calculate the difference

between the average outcome for those treated and those not treated we get the average

treatment effect for both the treated and the untreated, with no baseline bias.
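A small simulation may make the role of the lottery concrete. This is only a sketch on made-up data: the variable `ability`, the true effect of 2.0 and the rule that people with above-average ability opt in are all assumptions chosen for illustration.

```python
import random
import statistics

def simulate(n=20000, effect=2.0, seed=0):
    """Return (difference under self-selection, difference under a lottery)."""
    rng = random.Random(seed)
    ability = [rng.gauss(0.0, 1.0) for _ in range(n)]        # unobserved confounder
    y0 = [10.0 + a + rng.gauss(0.0, 1.0) for a in ability]   # outcome if not treated
    y1 = [y + effect for y in y0]                            # outcome if treated

    def diff_in_means(treated):
        # Only one potential outcome per person is used, as in real data.
        t = [y1[i] for i in range(n) if treated[i]]
        c = [y0[i] for i in range(n) if not treated[i]]
        return statistics.fmean(t) - statistics.fmean(c)

    self_selected = [a > 0 for a in ability]           # high-ability people opt in
    lottery = [rng.random() < 0.5 for _ in range(n)]   # a coin flip decides
    return diff_in_means(self_selected), diff_in_means(lottery)

naive, randomized = simulate()
print(f"self-selection: {naive:.2f}  lottery: {randomized:.2f}  true effect: 2.00")
```

Under self-selection the difference in means mixes the effect with the baseline difference in ability, while under the lottery it should land close to the true effect.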

Until now we have assumed that the treatment effect is uniform - everybody gets the same

causal effect of the treatment, b. This is not realistic.

However, it turns out that as long as we are only concerned with the average treatment

effect, the assumption of a homogeneous treatment effect has limited importance.

Nevertheless, we will make a slight digression to show you how heterogeneous treatment effects

work. Consider again the simple linear regression

model with the independent treatment dummy variable.

Now the regression coefficient has a subscript i, indicating potentially individual treatment

effects. That is, both the size and the sign of the

treatment effect vary across individuals. We also allow for the possibility that the

effect and the treatment status are correlated, that is, those who are treated might experience different

treatment effects compared to those who are not treated.

In the health sciences, this makes a lot of sense.

You do not treat healthy people because you do not expect them to be (positively) affected

by the treatment. You do not take aspirin if you do not have

a headache. In addition, as before, we allow for confounders

in the sense that the error term might be correlated with treatment status.

In order to proceed we decompose the individual treatment effect into a common effect and

an individual part with mean zero. This is innocuous as the decomposition is

tautological. We now study what happens when we calculate

the difference between the treated and the non-treated using data from a randomized trial

and when the data is generated from a model with heterogeneous treatment effects.

Again, treated and controls are on average equal in terms of the effect of the unobserved

confounders due to randomization. Therefore, the average treatment effect is

the common treatment effect plus the average of the individual part of the treatment effect plus the difference

in baseline. The latter is zero by randomization and the

average of the individual part is zero by assumption.
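Written out, the argument might look like this. The symbols a, e and u_i are assumptions made for this sketch: a is the constant term, e_i the error term, and u_i the individual part of the treatment effect.

```latex
\begin{align*}
Y_i &= a + b_i T_i + e_i, \qquad b_i = b + u_i, \qquad E[u_i] = 0 \\
E[Y \mid T=1] - E[Y \mid T=0]
  &= b + E[u \mid T=1] + \bigl(E[e \mid T=1] - E[e \mid T=0]\bigr) \\
  &= b \qquad \text{under randomization}
\end{align*}
```

The last step uses two things. Randomization makes T independent of both u and e, so E[u | T=1] = E[u] and the bracket is zero, and E[u] = 0 holds by construction of the decomposition.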

Therefore, the difference between the average outcome of the treated and the non-treated

from a randomized controlled trial is the average treatment effect.

Randomization is therefore guaranteed to yield an estimate of the average causal treatment

effect even if treatment effects are heterogeneous and even if individuals would select themselves

into treatment on the basis of the size of their treatment effect in the absence of randomization.

Therefore, randomization is a very powerful tool when trying to estimate causal effects.
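The same point can be checked on simulated data with individual treatment effects. Everything in this sketch is an assumption for illustration: the common effect of 2.0, the normally distributed individual part `u`, and the rule that people with an above-average effect select themselves into treatment.

```python
import random
import statistics

rng = random.Random(1)
n = 20000
common = 2.0                                   # common effect b (assumed value)
u = [rng.gauss(0.0, 1.0) for _ in range(n)]    # individual part, mean zero
e = [rng.gauss(0.0, 1.0) for _ in range(n)]    # error term, unrelated to u here
effect = [common + ui for ui in u]             # individual effect b_i = b + u_i

def diff_in_means(treated):
    """Average observed outcome of the treated minus that of the non-treated."""
    y = [10.0 + effect[i] * treated[i] + e[i] for i in range(n)]
    y_t = [y[i] for i in range(n) if treated[i] == 1]
    y_c = [y[i] for i in range(n) if treated[i] == 0]
    return statistics.fmean(y_t) - statistics.fmean(y_c)

gain_seekers = [1 if ui > 0 else 0 for ui in u]               # select on own effect
lottery = [1 if rng.random() < 0.5 else 0 for _ in range(n)]  # a coin flip decides

true_ate = statistics.fmean(effect)
selected_diff = diff_in_means(gain_seekers)
lottery_diff = diff_in_means(lottery)
print(f"ATE {true_ate:.2f}  self-selected {selected_diff:.2f}  lottery {lottery_diff:.2f}")
```

When people select on their own effect, the treated have larger effects than average, so the difference in means overstates the ATE. Under the lottery it should be close to the ATE even though effects differ across individuals.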

Even though randomized trials are such a powerful tool when trying to establish causal relations

empirically, they are not always easy or feasible to carry out in practice.

Therefore, there is not an abundance of randomized controlled trials in the social sciences.

One notable example, though, is Project STAR.

This project randomized more than 11,000 students into two treatments and one control group.

When entering school in kindergarten, students were randomized into either a small class

(fewer than 17 students), an ordinary class (more than 22 students) or an ordinary class

with a teacher's aide. Students were followed throughout school and

into high school and achievement scores were recorded at the end of each school year.

Here we show the math achievement score at the end of kindergarten for all students.

In addition, we show the distribution of students across treatment arms.

Approximately a third of the students are in each of the three groups.

The reason that it is not exactly a third in each group is that even though students were

randomized into the three groups, some students subsequently decided to leave the class to

which they were allocated by the randomization procedure.

Such non-random dropout threatens the claim that the final data set is completely randomized.

We return to this issue later. We now run a regression on the math achievement

using dummy variables for whether the student is in an ordinary class (control group) or

an ordinary class with a teacher's aide. The reference group is then students in a

small class. The top regression uses math achievement

after kindergarten as outcome. Here we find that students in the control group

and students with a teacher's aide score approximately 10 points lower on the math scale compared

to students in a small class. Assuming that randomization was successful,

this is an average causal effect. Even though students may experience very different

outcomes when taught in a small class compared to an ordinary class, ON AVERAGE students benefit

from small classes. Running the same regression using math achievement

after 1st grade as outcome yields a similar result.
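To see what a regression on class-type dummies with the small class as reference group computes, here is a sketch on simulated scores. The data are NOT the Project STAR data: the score level of 490, the spread of 40 and the 10-point gap are made-up values, and `ols` is a hypothetical helper written for this example.

```python
import random
import statistics

def ols(X, y):
    """Least-squares coefficients via the normal equations (X'X) b = X'y."""
    k = len(X[0])
    # Augmented matrix [X'X | X'y].
    A = [[sum(row[i] * row[j] for row in X) for j in range(k)]
         + [sum(row[i] * yi for row, yi in zip(X, y))] for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        A[col] = [v / A[col][col] for v in A[col]]
        for r in range(k):
            if r != col:
                factor = A[r][col]
                A[r] = [vr - factor * vc for vr, vc in zip(A[r], A[col])]
    return [A[i][k] for i in range(k)]

rng = random.Random(2)
groups = [rng.choice(["small", "regular", "aide"]) for _ in range(3000)]
shift = {"small": 0.0, "regular": -10.0, "aide": -10.0}   # assumed gaps
score = [490.0 + shift[g] + rng.gauss(0.0, 40.0) for g in groups]

# Constant plus two dummies; "small" is the omitted reference group.
X = [[1.0, float(g == "regular"), float(g == "aide")] for g in groups]
const, b_regular, b_aide = ols(X, score)

mean = {g: statistics.fmean(s for s, gg in zip(score, groups) if gg == g)
        for g in shift}
print(f"constant {const:.2f} vs mean of small classes {mean['small']:.2f}")
print(f"regular  {b_regular:.2f} vs mean gap {mean['regular'] - mean['small']:.2f}")
print(f"aide     {b_aide:.2f} vs mean gap {mean['aide'] - mean['small']:.2f}")
```

The constant reproduces the average score in the reference group, and each dummy coefficient reproduces the difference between that group's average and the reference group's average.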

To interpret this result some bookkeeping is needed, because students can now potentially

switch between treatment groups, and thus may only be 'partially' treated.

To avoid this, the regression using math in 1st grade as the outcome uses only students

who have attended the same class type in both kindergarten and 1st grade.

Obviously, if randomization is to neutralize any baseline differences, individuals must not

circumvent the randomization procedure and 'self-select' into different treatment groups,

which would reinstate a correlation between the error term and the treatment indicator.

In practice it is unavoidable that a few individuals for some reason escape the randomization procedure.

To assess how important this is in practice one can compare the treatment groups based

on observable characteristics. If the data is balanced on observables, it

is credible that it is also balanced on unobservables, indicating that randomization is

successful despite some reallocation of individuals after randomization.
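One common way to compare treatment groups on an observable characteristic is a chi-square test of independence on a table of counts. The sketch below assumes that test and runs it on simulated data, not the Project STAR data: the 50 percent free-lunch rate and the sample size are made up, and `chi_square` is a hypothetical helper.

```python
import random

def chi_square(table):
    """Pearson chi-square statistic for a table of counts (rows x columns)."""
    row_tot = [sum(r) for r in table]
    col_tot = [sum(c) for c in zip(*table)]
    n = sum(row_tot)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_tot[i] * col_tot[j] / n   # count expected under independence
            stat += (observed - expected) ** 2 / expected
    return stat

rng = random.Random(3)
counts = [[0, 0, 0], [0, 0, 0]]   # rows: free lunch no/yes; columns: three class types
for _ in range(6000):
    arm = rng.randrange(3)             # the lottery assigns the class type
    free_lunch = rng.random() < 0.5    # background trait, unrelated to the arm
    counts[int(free_lunch)][arm] += 1

stat = chi_square(counts)
CRITICAL_5PCT_DF2 = 5.991   # chi-square critical value, 2 degrees of freedom, 5% level
print(f"chi-square = {stat:.2f}; reject independence at 5%: {stat > CRITICAL_5PCT_DF2}")
```

A statistic below the critical value means that independence between the background characteristic and the treatment group cannot be rejected, which is what balance on that observable looks like.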

To assess this in our example we show the distribution of whether the student is eligible for free

lunch (due to low-income parents), ethnicity and gender.

In no case can we reject the null hypothesis of independence between background characteristics

and allocation to treatment groups. Thus, it seems that randomization in project

STAR was indeed successful. Another way to see that randomization was

probably successful is to look at the estimated effect of class type with and without using

background characteristics as controls. We hypothesize that background characteristics

are potential confounders and that we therefore should find a different effect of class type

when we control for confounders. However, as we see from the two regressions

reported in the tables, there are practically no differences between the estimated effects

of class type across the two regression models, again indicating that randomization was successful.
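Why the estimate should not move when controls are added can be sketched with the standard omitted-variable formula. This is a simplification with a single treatment dummy T and a single background variable X; the symbols a, g, v and the superscript "short" are assumptions made here.

```latex
\begin{align*}
\text{long regression:}\quad & Y = a + bT + gX + v \\
\text{short regression:}\quad & Y = a^{\text{short}} + b^{\text{short}}\,T + e \\
& b^{\text{short}} = b + g\,\frac{\operatorname{Cov}(T, X)}{\operatorname{Var}(T)}
\end{align*}
```

If the lottery worked, T is unrelated to X, so Cov(T, X) is zero up to sampling noise and the two coefficients should be practically the same, even when X matters a lot for the outcome.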