SUMMARY
DEVELOPMENT ECONOMICS
MODULE 1: EMPIRICAL DEVELOPMENT ECONOMICS
INTRODUCTION
Central questions in development economics are:
- Why are some countries rich and others poor?
- What is the return on investment in human capital?
- Do smaller class sizes increase educational attainment?
To answer these questions, causal inference is essential. This
includes the distinction between correlation and causation.
→ An example to explain this difference is measuring the effect of a master's degree on future income. Simple
correlations cannot establish causal relationships.
Regression analysis is often used but is subject to potential biases, such as:
- Simultaneity or reverse causality: 𝑥 affects 𝑦, but 𝑦 also affects 𝑥 → 𝑦 = 𝛼 + 𝛽𝑥 + 𝜖
- Omitted variable bias: an important variable 𝑧 is left out, which distorts the estimate of 𝛽 → 𝑦 = 𝛼 + 𝛽𝑥 + 𝛿𝑧 + 𝜖
- Selection bias: the sample is not representative of the population of interest
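Omitted variable bias can be illustrated with a small simulation. This is a minimal sketch on made-up data: the coefficients, the sample size and the helper names (ols_slope, residualise) are assumptions for illustration, not part of the course material. It compares the slope on 𝑥 when 𝑧 is left out with the slope when 𝑧 is controlled for (done here by first removing the part of 𝑥 that is explained by 𝑧).

```python
import random
from statistics import fmean

def ols_slope(x, y):
    """Slope of a simple OLS regression of y on x (with intercept)."""
    mx, my = fmean(x), fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx

def residualise(v, z):
    """Residuals of v after regressing it on z (with intercept)."""
    b = ols_slope(z, v)
    mv, mz = fmean(v), fmean(z)
    return [vi - mv - b * (zi - mz) for vi, zi in zip(v, z)]

random.seed(0)
n = 20000
beta, delta = 1.0, 2.0  # hypothetical true coefficients

z = [random.gauss(0, 1) for _ in range(n)]
x = [0.5 * zi + random.gauss(0, 1) for zi in z]  # x is correlated with z
y = [beta * xi + delta * zi + random.gauss(0, 1) for xi, zi in zip(x, z)]

beta_short = ols_slope(x, y)                 # z omitted: biased
beta_long = ols_slope(residualise(x, z), y)  # z controlled for

print(f"slope with z omitted:   {beta_short:.2f}")
print(f"slope with z controlled: {beta_long:.2f} (true value {beta})")
```

In this setup the short regression overstates 𝛽 because 𝑥 picks up part of the effect of 𝑧. The size of the bias depends on 𝛿 and on how strongly 𝑥 and 𝑧 are correlated.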
RANDOMIZED CONTROLLED TRIALS (RCTs)
RCTs are considered the "gold standard" for causal inference. In these, participants are randomly assigned to a
treatment group or a control group.
Important concepts in RCTs:
- Sample size: a sufficiently large sample is essential for reliable estimates
- Potential problems: dropout/attrition, contamination and compliance
𝑌 = 𝛽1 + 𝛽2𝑋 + 𝛽3𝑇𝑅𝐸𝐴𝑇 + 𝜀
RCTs focus on the average effect, but they have limitations, such as:
- They reveal little about underlying mechanisms
- Scalability can be a challenge
- They are not always ethically or practically feasible
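The logic of random assignment can be sketched with simulated data. The numbers below (sample size, effect size, the unobserved "ability" variable) are made up for illustration. Because treatment is assigned by a coin flip, the two groups are similar on average even in characteristics the researcher cannot observe, so a simple difference in mean outcomes estimates the average effect.

```python
import random
from statistics import fmean

random.seed(1)
n = 10000
true_effect = 2.0  # hypothetical treatment effect

ability = [random.gauss(0, 1) for _ in range(n)]        # unobserved by the researcher
treat = [int(random.random() < 0.5) for _ in range(n)]  # coin-flip assignment
y = [1.0 + a + true_effect * t + random.gauss(0, 1)
     for a, t in zip(ability, treat)]

def group_mean(values, group):
    """Mean of `values` among units whose treatment status equals `group`."""
    return fmean(v for v, t in zip(values, treat) if t == group)

# Difference in mean outcomes: the estimated average treatment effect
effect_hat = group_mean(y, 1) - group_mean(y, 0)

# Balance check: randomization should make the groups alike on average
balance_gap = group_mean(ability, 1) - group_mean(ability, 0)

print(f"estimated effect: {effect_hat:.2f}")
print(f"difference in mean ability between groups: {balance_gap:.2f}")
```

The difference in means corresponds to the coefficient on 𝑇𝑅𝐸𝐴𝑇 in a regression without further controls. Attrition or non-compliance would break this comparison, which the sketch does not model.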
NATURAL EXPERIMENTS
Natural experiments (such as Regression Discontinuity Design, RDD) offer an alternative when an RCT is not possible.
Here, an existing random or arbitrary allocation process is exploited, such as program eligibility based on date of birth.
Important concepts in RDDs:
- RDD requires a clear dividing line in allocation and a
sufficiently large sample.
- Potential problems: dropout/attrition, contamination and
compliance
𝑌𝑖 = 𝛽1 + 𝛽2𝑋𝑖 + 𝛽3𝐴𝑖 + 𝛽4𝑇𝑅𝐸𝐴𝑇𝑖 + 𝜀𝑖
- 𝑇𝑅𝐸𝐴𝑇𝑖 = 1 if 𝐴𝑖 ≥ 𝐴0 and 0 otherwise, where 𝐴𝑖 is the allocation variable and 𝐴0 the cutoff
RDDs focus on the average effect, but they have limitations, such as:
- They reveal little about underlying mechanisms
- They do not tell us whether the same program would be effective in another context
- Whether equilibrium effects can be measured depends on the scale of the program
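A regression discontinuity estimate can be sketched as follows. This is an illustration on simulated data. The cutoff, the bandwidth and the size of the jump are assumptions, and fitting a separate straight line on each side of the cutoff is one common way to implement the idea, not necessarily the one used in the course. The estimate is the gap between the two fitted lines at the cutoff.

```python
import random
from statistics import fmean

def fit_line(x, y):
    """Intercept and slope of a simple OLS regression of y on x."""
    mx, my = fmean(x), fmean(y)
    slope = (sum((a - mx) * (b - my) for a, b in zip(x, y))
             / sum((a - mx) ** 2 for a in x))
    return my - slope * mx, slope

random.seed(2)
cutoff, bandwidth, jump = 50.0, 10.0, 3.0  # hypothetical values

# a is the allocation variable; units with a >= cutoff are treated
a = [random.uniform(0, 100) for _ in range(20000)]
y = [1.0 + 0.05 * ai + jump * (ai >= cutoff) + random.gauss(0, 1) for ai in a]

def prediction_at_cutoff(points):
    """Fit a line through the points and evaluate it at the cutoff."""
    intercept, slope = fit_line([p[0] for p in points], [p[1] for p in points])
    return intercept + slope * cutoff

# Only observations close to the cutoff are used
below = [(ai, yi) for ai, yi in zip(a, y) if cutoff - bandwidth <= ai < cutoff]
above = [(ai, yi) for ai, yi in zip(a, y) if cutoff <= ai <= cutoff + bandwidth]

rdd_hat = prediction_at_cutoff(above) - prediction_at_cutoff(below)
print(f"estimated jump at the cutoff: {rdd_hat:.2f} (true value {jump})")
```

A narrower bandwidth makes units on both sides more comparable but leaves fewer observations, which is why a sufficiently large sample is needed.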
DIFFERENCE-IN-DIFFERENCES (DIF-IN-DIF)
This method compares the change in outcome for the treated group with the change for a
control group, before and after an intervention.
Important concepts in Dif-in-Dif:
- For both groups to be comparable on average you need a big enough
population and sample size
- Potential problems: dropout/attrition, contamination and compliance
𝑌 = 𝛽1 + 𝛽2𝑋 + 𝛽3𝑇 + 𝛽4𝑆 + 𝛽5(𝑇 ∗ 𝑆) + 𝜀
- 𝑇= a dummy for the time period (0 before the treatment; 1 after the treatment)
- 𝑆= a dummy for the group (1 if treated; 0 if not treated)
Dif-in-Difs focus on the average effect, but they have limitations, such as:
- They reveal little about underlying mechanisms
- They do not tell us whether the same program would be effective in another context
- Whether equilibrium effects can be measured depends on the scale of the program
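The difference-in-differences estimate can be computed directly from four group means. The sketch below uses simulated data with made-up values for the group gap, the time trend and the effect. Without additional controls 𝑋, the result equals the coefficient on the interaction 𝑇 ∗ 𝑆 in the regression above.

```python
import random
from collections import defaultdict
from statistics import fmean

random.seed(3)
effect = 1.5  # hypothetical treatment effect

# cells[(s, t)] collects outcomes for group s (1 = treated) in period t (1 = after)
cells = defaultdict(list)
for _ in range(20000):
    s = int(random.random() < 0.5)
    t = int(random.random() < 0.5)
    # Group gap of 1.0, common time trend of 0.5, effect only for treated-after
    y = 2.0 + 1.0 * s + 0.5 * t + effect * s * t + random.gauss(0, 1)
    cells[(s, t)].append(y)

m = {key: fmean(values) for key, values in cells.items()}

change_treated = m[(1, 1)] - m[(1, 0)]  # before/after change, treated group
change_control = m[(0, 1)] - m[(0, 0)]  # before/after change, control group
did_hat = change_treated - change_control

print(f"change in treated group: {change_treated:.2f}")
print(f"change in control group: {change_control:.2f}")
print(f"dif-in-dif estimate:     {did_hat:.2f} (true value {effect})")
```

The control group's change stands in for what would have happened to the treated group without the intervention. The sketch builds in a common trend for both groups. If the trends differed, the estimate would be biased.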
INSTRUMENTAL VARIABLE (IV)
When there is endogeneity (a variable is correlated with the error term), instrumental variables can be used to obtain
appropriate estimates.
The two-stage least squares (2SLS) method is applied here, using an instrument to predict the endogenous variable.
- Imagine you want to estimate the following model:
o 𝑌 = 𝛽1 + 𝛽2𝑋 + 𝛽3𝐸 + 𝜀
o 𝐸 is endogenous → 𝐸 and 𝜀 are correlated → OLS estimation is biased
- First stage: estimate the effect of the instrument(s) 𝑍 and other variables on the endogenous variable
o 𝐸 = 𝛾1 + 𝛾2𝑋 + 𝛾3𝑍 + 𝑢
o Obtain the fitted values Ê
- Second stage: estimate the model using the fitted values Ê from the first stage in place of 𝐸
o 𝑌 = 𝛽1 + 𝛽2𝑋 + 𝛽3Ê + 𝜀
Finding a suitable instrumental variable is often difficult in practice.
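The two stages can be sketched by hand on simulated data. This is an illustration with several simplifying assumptions: there are no other control variables 𝑋, there is a single instrument 𝑍, and all coefficients are made up. The variable u plays the role of an unobserved factor that makes 𝐸 endogenous.

```python
import random
from statistics import fmean

def ols_slope(x, y):
    """Slope of a simple OLS regression of y on x (with intercept)."""
    mx, my = fmean(x), fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx

random.seed(4)
n = 20000
beta3 = 2.0  # hypothetical true effect of E on Y

z = [random.gauss(0, 1) for _ in range(n)]  # instrument
u = [random.gauss(0, 1) for _ in range(n)]  # unobserved, drives both E and Y
e = [zi + ui + random.gauss(0, 1) for zi, ui in zip(z, u)]
y = [1.0 + beta3 * ei + 3.0 * ui + random.gauss(0, 1) for ei, ui in zip(e, u)]

# Plain OLS of Y on E is biased because E is correlated with the error term
ols_hat = ols_slope(e, y)

# First stage: regress E on Z and keep the fitted values
fs_slope = ols_slope(z, e)
fs_intercept = fmean(e) - fs_slope * fmean(z)
e_fitted = [fs_intercept + fs_slope * zi for zi in z]

# Second stage: regress Y on the fitted values from the first stage
iv_hat = ols_slope(e_fitted, y)

print(f"OLS estimate:  {ols_hat:.2f}")
print(f"2SLS estimate: {iv_hat:.2f} (true value {beta3})")
```

Running the second stage by hand gives the correct point estimate but not the correct standard errors, so in practice a dedicated 2SLS routine is preferable. The sketch also relies on 𝑍 affecting 𝑌 only through 𝐸, which is built into the simulation but cannot be taken for granted with real data.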
MATCHING
Sometimes no direct control group or instrument is available. In that case, a similar
control group is constructed via matching on observables.
The most commonly used method is propensity score matching:
- For each observation in the dataset, you calculate the probability of being treated based on the observable
characteristics
- Instead of looking for a match with the same characteristics for each treated individual, you look for a match
with the same likelihood to be treated (propensity score)
However, this process also has its limitations:
- Many control variables are needed to match people on (or to calculate the propensity score)
- When unobservables influence both the propensity to be treated and the outcome → omitted variable bias
- Practical: you need to choose a tolerance limit. How close must the matched pairs be?
Important concepts in matching:
- For both groups to be comparable on average you need a big enough population and sample size
- Potential problems: dropout/attrition, contamination and compliance
𝑌 = 𝛽1 + 𝛽2𝑋 + 𝛽3𝑇𝑅𝐸𝐴𝑇 + 𝜀
Matching focuses on the average effect, but it has limitations, such as:
- It reveals little about underlying mechanisms
- It does not tell us whether the same program would be effective in another context
- Whether equilibrium effects can be measured depends on the scale of the program
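Propensity score matching can be sketched on simulated data. Several simplifications are assumed here. There is a single discrete observable x, so the propensity score is simply the share of treated units for each value of x (with many observables it would typically come from a logit or probit model). Each treated unit is compared with the average of all controls whose score lies within the tolerance limit, which is a radius variant of matching rather than one-to-one pairing. All numbers are made up.

```python
import bisect
import random
from statistics import fmean

random.seed(5)
effect, caliper = 1.0, 0.05  # hypothetical effect and tolerance limit

# Higher x raises both the chance of treatment and the outcome,
# so a naive treated-vs-control comparison is biased.
data = []
for _ in range(10000):
    x = random.randrange(5)
    t = int(random.random() < 0.2 + 0.15 * x)
    y = 1.0 + 2.0 * x + effect * t + random.gauss(0, 1)
    data.append((x, t, y))

# Step 1: propensity score = estimated probability of treatment given x
score = {k: fmean(t for x, t, _ in data if x == k) for k in range(5)}

# Step 2: sort controls by score; prefix sums give fast range averages
controls = sorted((score[x], y) for x, t, y in data if t == 0)
control_scores = [s for s, _ in controls]
prefix = [0.0]
for _, y_c in controls:
    prefix.append(prefix[-1] + y_c)

# Step 3: compare each treated unit with controls within the caliper
gaps = []
for x, t, y in data:
    if t == 1:
        lo = bisect.bisect_left(control_scores, score[x] - caliper)
        hi = bisect.bisect_right(control_scores, score[x] + caliper)
        if hi > lo:  # treated units without a close enough match are dropped
            gaps.append(y - (prefix[hi] - prefix[lo]) / (hi - lo))

matched_hat = fmean(gaps)
naive_hat = (fmean(y for _, t, y in data if t == 1)
             - fmean(y for _, t, y in data if t == 0))

print(f"naive difference in means: {naive_hat:.2f}")
print(f"matched estimate:          {matched_hat:.2f} (true value {effect})")
```

Matching works in this sketch only because x is the sole driver of treatment and is observed. If an unobserved variable also influenced treatment and the outcome, the matched estimate would remain biased.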
PANEL DATA: FIXED EFFECTS ESTIMATOR
This model uses data collected repeatedly on the same units (e.g. people or countries) over a period of time.
𝑌𝑖,𝑡 = 𝛽1𝑋𝑖,𝑡 + 𝛽2𝑇𝑅𝐸𝐴𝑇𝑖,𝑡 + 𝛼𝑖 + 𝜀𝑖,𝑡
The unit fixed effect 𝛼𝑖 absorbs all time-invariant characteristics of unit 𝑖. This model cannot resolve reverse causality and offers limited opportunities to address time-varying omitted variables.
Panel data focus on the average effect, but they have limitations, such as:
- They reveal little about underlying mechanisms
- They do not tell us whether the same program would be effective in another context
- Whether equilibrium effects can be measured depends on the scale of the program
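The fixed effects estimator can be sketched with the "within" transformation: subtract each unit's own average from its observations, then run OLS on the demeaned data. The example below uses a simulated panel with made-up values and, for simplicity, no other regressors 𝑋. The term 𝛼𝑖 is constructed to be correlated with treatment, so pooled OLS is biased.

```python
import random
from collections import defaultdict
from statistics import fmean

def ols_slope(x, y):
    """Slope of a simple OLS regression of y on x (with intercept)."""
    mx, my = fmean(x), fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx

random.seed(6)
beta2 = 1.0  # hypothetical true treatment effect

rows = []  # (unit, treat, y)
for i in range(500):
    alpha_i = random.gauss(0, 2)             # time-invariant unit effect
    p_treat = 0.7 if alpha_i > 0 else 0.3    # correlated with treatment
    for _ in range(6):                       # six periods per unit
        treat = int(random.random() < p_treat)
        y = beta2 * treat + alpha_i + random.gauss(0, 1)
        rows.append((i, treat, y))

# Pooled OLS ignores alpha_i and is therefore biased
pooled_hat = ols_slope([r[1] for r in rows], [r[2] for r in rows])

# Within transformation: subtract each unit's own mean
by_unit = defaultdict(list)
for i, treat, y in rows:
    by_unit[i].append((treat, y))

treat_dm, y_dm = [], []
for obs in by_unit.values():
    mean_t = fmean(t for t, _ in obs)
    mean_y = fmean(v for _, v in obs)
    for t, v in obs:
        treat_dm.append(t - mean_t)
        y_dm.append(v - mean_y)

within_hat = ols_slope(treat_dm, y_dm)

print(f"pooled OLS estimate:    {pooled_hat:.2f}")
print(f"fixed effects estimate: {within_hat:.2f} (true value {beta2})")
```

Demeaning removes 𝛼𝑖 because it is constant within a unit. Anything that changes over time and is correlated with treatment would not be removed.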
HIERARCHY OF EVIDENCE
The evidence hierarchy refers to an arrangement of different types of evidence, from least to most reliable, that is
used to establish causal relationships in empirical research. In development economics, it is important to understand
how robust the evidence behind a given finding is:
- Anecdotal evidence: the lowest level of evidence. It is often used to generate hypotheses, but it is not
sufficient to draw strong conclusions.
- Pre- and post-test studies: offer better insights than anecdotal evidence, but are still limited because other
influences cannot be ruled out.
- Single case designs: offer depth, but lack the ability to draw general conclusions applicable to broader
populations.
- Quasi-experimental studies (Dif-in-Dif, RDD): provide more robust evidence than pre- and post-test studies
because they have a control group, but without random assignment there remains a risk of bias.
- Randomized controlled trials: RCTs are the gold standard for establishing causal relationships, but they
are not applicable in all situations.
- Systematic reviews: the highest level of evidence. They combine multiple RCTs or other studies, leading to more
generalisable conclusions.
The higher you get in this hierarchy, the stronger and more reliable the evidence becomes for demonstrating causal
links.