And the difference between the two groups is just the interaction term.

Therefore, the interaction term captures the difference in the change over time across the two groups, or the difference-in-differences.
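To see that the interaction coefficient and the "difference in differences" of group means are the same number, here is a minimal sketch using NumPy least squares on a small, made-up dataset. The function names `did_via_regression` and `did_via_means` and the toy numbers are hypothetical, chosen only for illustration. With one dummy for the group, one for the period and their product, the regression is saturated, so its fitted values equal the four cell means and the interaction coefficient reproduces the difference-in-differences exactly.

```python
import numpy as np

def did_via_regression(y, treated, post):
    """OLS fit of y = a + b*treated + c*post + d*treated*post; returns d."""
    y = np.asarray(y, dtype=float)
    treated = np.asarray(treated, dtype=float)
    post = np.asarray(post, dtype=float)
    # Design matrix: intercept, group dummy, period dummy, interaction term
    X = np.column_stack([np.ones_like(y), treated, post, treated * post])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef[3]

def did_via_means(y, treated, post):
    """Difference-in-differences computed directly from the four cell means."""
    y = np.asarray(y, dtype=float)
    treated = np.asarray(treated, dtype=bool)
    post = np.asarray(post, dtype=bool)

    def cell_mean(t, p):
        return y[(treated == t) & (post == p)].mean()

    change_treated = cell_mean(True, True) - cell_mean(True, False)
    change_control = cell_mean(False, True) - cell_mean(False, False)
    return change_treated - change_control

# Hypothetical toy data: two observations per group-period cell
y       = [10, 12, 13, 15, 20, 22, 27, 29]
treated = [0, 0, 0, 0, 1, 1, 1, 1]
post    = [0, 0, 1, 1, 0, 0, 1, 1]

print(round(did_via_regression(y, treated, post), 6))  # 4.0
print(round(did_via_means(y, treated, post), 6))       # 4.0
```

Here the treated group changes by 28 − 21 = 7 and the control group by 14 − 11 = 3, so both routes give 7 − 3 = 4.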

It seems as if the difference-in-differences model is a magic tool.

It removes all selection effects and therefore allows interpretation of the interaction term as the average causal effect of the treatment, without the need for randomized data or instrumental variables.
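Why the selection effect drops out can be sketched in a few lines of algebra. The notation below is assumed for illustration: $D_i$ is a treatment-group dummy, $P_t$ an after-period dummy, $\gamma$ a fixed level gap between the groups (the selection effect) and $\lambda$ a time effect that, in this model, is the same for both groups.

```latex
\begin{aligned}
Y_{it} &= \alpha + \gamma D_i + \lambda P_t + \delta\, D_i P_t + \varepsilon_{it},
  \qquad E[\varepsilon_{it} \mid D_i, P_t] = 0 \\
\Delta_T &= E[Y \mid D=1, P=1] - E[Y \mid D=1, P=0]
  = (\alpha + \gamma + \lambda + \delta) - (\alpha + \gamma) = \lambda + \delta \\
\Delta_C &= E[Y \mid D=0, P=1] - E[Y \mid D=0, P=0]
  = (\alpha + \lambda) - \alpha = \lambda \\
\Delta_T - \Delta_C &= \delta
\end{aligned}
```

The level gap $\gamma$ cancels in the first difference, and the time effect $\lambda$ cancels in the second one only because the model writes it as common to both groups.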

All we need are the average outcomes for individuals in the treatment and control groups, before and after the treatment.

We do not even need data for the same persons across time.

In other words, the individuals observed in the treatment and control groups do not need to be the same across periods.

This is neat. If, for instance, you want to study the effect of a smoking policy in one state or region, all you need are health outcomes in that region and in a control group, say a neighbouring region, before and after the implementation of the policy.

Any difference-in-differences across the two regions is then interpreted as the causal effect of this smoking policy.
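As a worked illustration with purely hypothetical numbers, suppose the outcome is the percentage of adults reporting a smoking-related respiratory complaint. In the policy region it falls from 20 to 16, and in the neighbouring region from 21 to 19:

```latex
\begin{aligned}
\Delta_{\text{policy region}} &= 16 - 20 = -4 \\
\Delta_{\text{neighbouring region}} &= 19 - 21 = -2 \\
\hat{\delta} &= (-4) - (-2) = -2 \ \text{percentage points}
\end{aligned}
```

The neighbouring region's decline of 2 points is treated as what would have happened in the policy region anyway, so only the remaining 2 points are attributed to the policy.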