Methods: Econometrics 1
Lecture 1: Selection on observables
What is econometrics about?
Econometrics is about building alternate universes (i.e., counterfactuals) in order to
uncover the effect of, for example, a certain policy.
Econometrics 1 focuses on the total effect of a specific treatment on the outcome, not on
the channels through which that effect arises. Those channels are the topic of
Econometrics 2, which disentangles the direct effect from the indirect effect
(mediators, etc.).
The course also focuses on research designs, something you must think about before you
start your analysis.
The research designs that we will discuss are the following:
- Selection on observables
- Randomized controlled trials
- Instrumental variable approach
- Regression discontinuity design
- Difference-in-differences
Selection on observables
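A regression gives an unbiased estimate of the treatment effect only when treatment
assignment is independent of potential outcomes. Under selection on observables, this
independence holds conditional on the covariates, so all confounders must be included in
the regression to ensure unbiasedness.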
The problem with estimating a treatment effect is that there is no counterfactual observed.
There are 2 naïve ways of assessing this counterfactual (the example is the growth of a
plant treated with Pokon):
- Just comparing the plant at time t and time t+1, and then concluding that the plant
would have looked worse without the treatment.
o Problem: selection bias/omitted variable bias: other things could have been
the reason why the plant grew the way it did. It does not have to be Pokon.
This bias can lead to over- or underestimation of the real treatment effect.
The bias arises from non-random selection into treatment (it can be the case
that you take more care of the plant after using Pokon, or that you use it only
if your plant is already almost dead). To put it differently, there is a
variable that is a common cause of both the treatment and the outcome: a
“confounder”.
o Non-random selection into treatment can come from the subjects themselves or
from a policymaker/us.
o The selection bias can be resolved if all confounders are controlled for.
However, sometimes, there might be confounders that are unobservable.
- Cross-sectional comparison: compare 2 plants, one which gets Pokon, and the other
which doesn’t get Pokon. The problem with this is that the plants can differ in
other respects as well, and, therefore, the observed difference may be caused by
factors other than Pokon.
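The cross-sectional problem can be illustrated with a small simulation. This is only a sketch under made-up assumptions: the confounder (`sunny`), the selection probabilities, the effect sizes and the function names are all hypothetical and not part of the lecture. Plants in sunny spots grow better anyway and are also treated more often, so the naive comparison mixes the treatment effect with the effect of the sun. Comparing treated and untreated plants within each level of the confounder removes that bias.

```python
import random
from statistics import mean

def simulate_plants(n=20000, true_effect=2.0, seed=0):
    """Simulate (sunny, treated, growth) rows; all numbers are illustrative."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        sunny = rng.random() < 0.5            # hypothetical confounder
        # Non-random selection: sunny plants are treated more often.
        p_treat = 0.8 if sunny else 0.2
        treated = rng.random() < p_treat
        # The confounder also raises growth directly.
        growth = 1.0 + 3.0 * sunny + true_effect * treated + rng.gauss(0, 1)
        rows.append((sunny, treated, growth))
    return rows

def diff_in_means(rows):
    """Mean growth of treated plants minus mean growth of untreated plants."""
    treated = [g for _, t, g in rows if t]
    control = [g for _, t, g in rows if not t]
    return mean(treated) - mean(control)

rows = simulate_plants()

# Naive cross-sectional comparison: ignores the confounder.
naive = diff_in_means(rows)

# Controlling for the confounder: compare within sunny and within
# non-sunny plants, then average the two differences (the two groups
# are equally large by construction).
adjusted = mean(
    diff_in_means([r for r in rows if r[0] == s]) for s in (True, False)
)

print(f"naive: {naive:.2f}, adjusted: {adjusted:.2f}, true effect: 2.00")
```

With these assumed numbers the naive difference should land well above the true effect of 2 (around 3.8), while the adjusted difference should be close to 2. If `sunny` were unobservable, the adjustment step would not be possible.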
Potential outcomes
An individual has 2 potential outcomes. The first one is the road that you take and the second
one is the road that you did not take. Basically, this means the outcome if you were assigned
to treatment vs. the outcome if you were assigned to control. The unit causal effect is the
difference between those 2 potential outcomes. The problem is that we only observe one of
them.
The average causal effect is the difference between the average potential outcomes if all units
were assigned to treatment and the average of potential outcomes if all units were assigned to
control.
To estimate the average causal effect, we need to randomly assign some units to treatment
and some units to control. Thanks to randomization, each group is a random sample from the
population, so the mean observed outcome in each group is an unbiased estimator of the
corresponding mean potential outcome in the population.
A research design should always explain why selection into treatment was indeed random. If
this is the case, the exchangeability assumption holds (we can fairly compare the treatment
and control groups).
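A minimal sketch of why randomization works, using simulated data. In a simulation both potential outcomes of every unit are known, which is never the case in practice; the distributions and variable names below are assumptions chosen for illustration only. Treatment is assigned by a coin flip that is independent of the potential outcomes, and the difference in observed group means is compared with the true average causal effect.

```python
import random
from statistics import mean

rng = random.Random(1)
n = 20000

# Both potential outcomes per unit (only visible in a simulation).
y0 = [rng.gauss(10, 2) for _ in range(n)]       # outcome under control
y1 = [a + rng.gauss(1.5, 1) for a in y0]        # outcome under treatment

# Average causal effect: mean of Y(1) minus mean of Y(0) over all units.
true_ace = mean(y1) - mean(y0)

# Random assignment: a coin flip, independent of y0 and y1.
d = [rng.random() < 0.5 for _ in range(n)]

# We observe only one potential outcome per unit.
observed = [y1[i] if d[i] else y0[i] for i in range(n)]

treated_mean = mean(y for y, t in zip(observed, d) if t)
control_mean = mean(y for y, t in zip(observed, d) if not t)
estimate = treated_mean - control_mean

print(f"true ACE: {true_ace:.2f}, estimate: {estimate:.2f}")
```

The estimate uses only the observed outcomes, yet it should be close to the true average causal effect, because the two groups are exchangeable by construction.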
Selection bias
Y(0,i) = potential outcome of subject i if assigned to control
Y(1,i) = potential outcome of subject i if assigned to treatment
Unit causal effect = Y(1,i) – Y(0,i)
Difference in group means = average causal effect + selection bias
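The decomposition can be written out by adding and subtracting the same term. The notation here is an assumption, not taken from the lecture: Y(1,i) and Y(0,i) are written as Y_{1i} and Y_{0i}, D_i is a treatment indicator (1 = treated, 0 = control), and Y_i is the observed outcome.

```latex
\begin{align*}
\underbrace{E[Y_i \mid D_i = 1] - E[Y_i \mid D_i = 0]}_{\text{difference in group means}}
  &= E[Y_{1i} \mid D_i = 1] - E[Y_{0i} \mid D_i = 0] \\
  &= \underbrace{E[Y_{1i} - Y_{0i} \mid D_i = 1]}_{\text{average causal effect (among the treated)}}
   + \underbrace{E[Y_{0i} \mid D_i = 1] - E[Y_{0i} \mid D_i = 0]}_{\text{selection bias}}
\end{align*}
```

The second line adds and subtracts E[Y_{0i} | D_i = 1], the unobserved counterfactual of the treated. Strictly, the first term is the average effect among the treated; it equals the overall average causal effect when the effect is the same for everyone or when assignment is random. The selection bias term is zero when treated and control units would have had the same average outcome without treatment.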
Causal diagrams
Causal diagrams visualize how units were selected into treatment (i.e., your identification
problem). A causal diagram shows the causal relationships between the variables in a causal
model.
Some rules:
- Nodes are the variables in your model (nodes earlier in time should be on the left of
the diagram).
- Arrows are the causal connections
- A dashed arrow signifies an unobserved cause
- A causal relationship goes only in one direction
- A path is a sequence of arrows connecting two nodes: arrows can go in either
direction
o A directed path has only arrows in the same direction
- Graphs are acyclic: no directed paths from a variable to itself
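The rules above can be sketched in code. This is an illustrative sketch, not a standard library: the graph is a plain dictionary mapping each node to the nodes its arrows point to, and the node names (`care`, `pokon`, `growth`) and function names are hypothetical, loosely following the plant example.

```python
def has_directed_path(graph, start, goal):
    """True if following arrows forward from start reaches goal."""
    stack, seen = list(graph.get(start, [])), set()
    while stack:
        node = stack.pop()
        if node == goal:
            return True
        if node not in seen:
            seen.add(node)
            stack.extend(graph.get(node, []))
    return False

def has_path(graph, start, goal):
    """True if start and goal are connected when arrow direction is ignored."""
    undirected = {}
    for parent, children in graph.items():
        for child in children:
            undirected.setdefault(parent, []).append(child)
            undirected.setdefault(child, []).append(parent)
    return has_directed_path(undirected, start, goal)

def is_acyclic(graph):
    """True if no node has a directed path back to itself."""
    nodes = set(graph) | {c for children in graph.values() for c in children}
    return not any(has_directed_path(graph, v, v) for v in nodes)

# Hypothetical diagram: care is a common cause of treatment and outcome.
dag = {"care": ["pokon", "growth"], "pokon": ["growth"]}

print(has_directed_path(dag, "pokon", "growth"))  # arrows run pokon -> growth
print(has_directed_path(dag, "growth", "pokon"))  # no arrows run backwards
print(has_path(dag, "growth", "care"))            # connected, ignoring direction
print(is_acyclic(dag))
```

In this diagram, the connection between `pokon` and `growth` that runs through `care` is a path but not a directed path, which is exactly how a confounder shows up in a causal diagram.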