ASSIGNMENT 2: Solutions
Theoretical Exercises
Solution Theoretical Exercise 1 (each 10 points)
a) Since $y_i$ can take a value of either 0 or 1, $\varepsilon_i = y_i - x_i'\beta$ can take a value of either $0 - x_i'\beta$
or $1 - x_i'\beta$. Because $E[\varepsilon_i \mid x_i] = 0$, we have
$\operatorname{Var}[\varepsilon_i \mid x_i] = E[\varepsilon_i^2 \mid x_i] = E[(y_i - x_i'\beta)^2 \mid x_i] = (1 - x_i'\beta)^2 P(y_i = 1 \mid x_i) + (-x_i'\beta)^2 P(y_i = 0 \mid x_i) = (1 - x_i'\beta)^2 (x_i'\beta) + (x_i'\beta)^2 (1 - x_i'\beta) = x_i'\beta (1 - x_i'\beta)$,
using the fact that $P[y_i = 1 \mid x_i] = x_i'\beta$. This shows that in the LPM the unobserved error is heteroskedastic by construction.
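As a quick numerical illustration of the result in a), the sketch below evaluates the conditional variance for a few values of $x_i'\beta$. The values in `p` are assumptions chosen purely for illustration; they are not taken from the assignment data.

```r
# Hypothetical values of x_i'beta = P(y_i = 1 | x_i), chosen for illustration
p <- c(0.1, 0.3, 0.5, 0.9)

# Conditional error variance in the LPM: x_i'beta * (1 - x_i'beta)
lpm_var <- p * (1 - p)

# Cross-check via the two-point distribution of the error term:
# eps = 1 - p with probability p, and eps = -p with probability 1 - p
lpm_var_check <- (1 - p)^2 * p + (-p)^2 * (1 - p)

print(cbind(p, lpm_var, lpm_var_check))
```

The variance changes with $x_i'\beta$ and is largest at $x_i'\beta = 0.5$, which is the heteroskedasticity described in a).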
b) The transformed model is $Py = PX\beta + P\varepsilon$, where $P$ is a diagonal matrix with elements
$1/\sqrt{x_i'\beta (1 - x_i'\beta)}$ on the diagonal.
c) The GLS estimator is the OLS estimator on the transformed model, and it is given by
$\hat{\beta}_{GLS} = [(PX)'PX]^{-1}(PX)'Py$. Since $P$ depends on the unknown $\beta$, the GLS estimator cannot be used.
d) $x_i'\beta$ can be estimated by $x_i'\hat{\beta}$, which is just the prediction for observation $i$ in the regression
$P[y_i = 1 \mid x_i] = x_i'\beta$. The transformed model is then $\hat{P}y = \hat{P}X\beta + \hat{P}\varepsilon$, where $\hat{P}$ is a diagonal
matrix with elements $1/\sqrt{x_i'\hat{\beta}(1 - x_i'\hat{\beta})}$ on the diagonal. The FGLS estimator is then given by
$b_{FGLS} = [(\hat{P}X)'\hat{P}X]^{-1}(\hat{P}X)'\hat{P}y$.
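The FGLS steps in d) can be sketched in R as follows. This is a minimal illustration on simulated data, not part of the assignment: the data-generating process, the sample size, and the clipping of fitted values to [0.01, 0.99] are all assumptions. The clipping is there only because LPM fitted values can fall outside (0, 1), in which case the weights would be undefined.

```r
set.seed(1)

# Simulated data (assumption): one regressor, true P(y = 1 | x) = 0.2 + 0.6 * x
n <- 1000
x <- runif(n)
X <- cbind(const = 1, x)
y <- rbinom(n, size = 1, prob = 0.2 + 0.6 * x)

# Step 1: OLS on the untransformed model to obtain fitted probabilities
b_ols <- solve(t(X) %*% X, t(X) %*% y)
p_hat <- as.vector(X %*% b_ols)

# Assumption: clip fitted values so that p_hat * (1 - p_hat) stays positive
p_hat <- pmin(pmax(p_hat, 0.01), 0.99)

# Step 2: weights 1 / sqrt(p_hat * (1 - p_hat)), the diagonal of P_hat
w <- 1 / sqrt(p_hat * (1 - p_hat))

# Step 3: OLS on the transformed model P_hat y = P_hat X beta + P_hat eps
Xw <- X * w   # multiplies row i of X by w[i]
yw <- y * w
b_fgls <- solve(t(Xw) %*% Xw, t(Xw) %*% yw)

print(cbind(b_ols, b_fgls))
```

The same estimate should result from weighted least squares with weights `w^2`, for example `lm(y ~ x, weights = w^2)`, which is a convenient cross-check.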
Solution Theoretical Exercise 2 (3, 3 and 4 points)
• The OLS estimator is still consistent under heteroskedasticity (see the proof of consistency, which
did not rely on homoskedasticity).
• The usual F statistic no longer has an F distribution. This follows since for inference in small samples
(i.e. t or F tests), we required the assumption that $\varepsilon \sim N(0, \sigma^2 I)$ to derive the distributions of the test
statistics, and this assumption fails under heteroskedasticity.
• The OLS estimator is still unbiased, but no longer the best linear unbiased estimator. This follows because under
heteroskedasticity, the GLS estimator is more efficient.
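The three statements above can be summarised in a short derivation. It is a sketch under the assumption that $E[\varepsilon \mid X] = 0$ and $\operatorname{Var}[\varepsilon \mid X] = \sigma^2 \Omega$ with $\Omega \neq I$; the symbol $\Omega$ is introduced here for illustration and does not appear in the exercise text.

```latex
% Assumption for illustration: Var(\varepsilon \mid X) = \sigma^2 \Omega with \Omega \neq I
\begin{align*}
b &= (X'X)^{-1}X'y = \beta + (X'X)^{-1}X'\varepsilon, \\
E[b \mid X] &= \beta + (X'X)^{-1}X'E[\varepsilon \mid X] = \beta, \\
\operatorname{Var}[b \mid X] &= \sigma^2 (X'X)^{-1} X'\Omega X (X'X)^{-1}
  \neq \sigma^2 (X'X)^{-1} \text{ in general}.
\end{align*}
```

The second line shows that unbiasedness does not use the form of the error variance, while the third line shows that the variance formula $\sigma^2 (X'X)^{-1}$ underlying the usual F statistic no longer applies.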
Solution Empirical Exercise (each 10 points)
Solution a)
rm(list=ls())
# Load data, define dependent variable and systematic component
load("data_Assignment2.RData")
y = lwage
X = cbind(const=rep(1,length(y)),age,age2,black,educ)
# OLS estimates
source("FunctionLSS_robust.R")
OLSres = FunctionLSS_robust(y,X)
OLSres$B_hat
## [,1]
## const 3.315457284
## age 0.132650710
## age2 -0.001552282
## black -0.212722143
## educ 0.038511816
Interpretation: One additional year of education is associated with a wage rate that is approximately 3.85
percent higher on average, controlling for the other factors considered in the regression.
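The 3.85 percent figure uses the approximation that 100 times the coefficient gives the percentage change. The short sketch below compares it with the exact percentage change, assuming that `lwage` is the natural logarithm of the wage (the variable name suggests this, but the data description is not shown here).

```r
# Coefficient on educ from the OLS output above
b_educ <- 0.038511816

# Approximate effect of one more year of education, in percent
approx_pct <- 100 * b_educ

# Exact percentage change implied by a log-wage specification
exact_pct <- 100 * (exp(b_educ) - 1)

print(c(approx = approx_pct, exact = exact_pct))
```

For coefficients of this size the two numbers are close, so the approximate interpretation is adequate.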
Solution b)
# Define instrument matrix
Z = cbind(const=rep(1,length(y)),age,age2,black,fatheduc)
# Load IV function (calculations are done in function file)
source("FunctionLSS_IV.R")
# Get IV estimates
IVres = FunctionLSS_IV(y,X,Z)
IVres$B_hat_st
## [,1]
## const 3.348734773
## age 0.112647699
## age2 -0.001212481
## black -0.187411397
## educ 0.057084292
Interpretation: One additional year of education causes the wage rate to increase by approximately 5.71
percent on average, controlling for the factors considered in the regression, when education is instrumented by
father's education. Compared to the OLS estimation in part a), the IV estimation leads to a coefficient estimate
that is substantially larger (0.057 versus 0.039). This suggests that the OLS estimator produces a biased estimate of the effect of
education, assuming that father's education is a valid instrument.
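The file `FunctionLSS_IV.R` is not reproduced here, so the following is only a sketch of how a two-stage least squares estimate can be computed by hand. The helper `iv_sketch` is hypothetical and is not the course function, and the simulated data (with a true coefficient of 0.5 on the endogenous regressor) are an assumption made for illustration.

```r
set.seed(42)

# Simulated data (assumption): x is endogenous because it shares u with y
n <- 5000
z <- rnorm(n)                     # instrument: affects x, unrelated to u
u <- rnorm(n)                     # unobserved factor driving both x and y
x <- 0.8 * z + u + rnorm(n)
y <- 1 + 0.5 * x + u + rnorm(n)   # true coefficient on x is 0.5

X <- cbind(const = 1, x)          # regressors
Z <- cbind(const = 1, z)          # instruments (the constant instruments itself)

# Hypothetical helper, not the course's FunctionLSS_IV
iv_sketch <- function(y, X, Z) {
  # First stage: project X on Z to get fitted regressors
  X_hat <- Z %*% solve(t(Z) %*% Z, t(Z) %*% X)
  # Second stage: regress y on the fitted regressors
  solve(t(X_hat) %*% X, t(X_hat) %*% y)
}

b_ols <- solve(t(X) %*% X, t(X) %*% y)
b_iv  <- iv_sketch(y, X, Z)

print(cbind(b_ols, b_iv))
```

In this simulation the OLS slope is biased away from 0.5 while the IV slope is close to it. The direction of the OLS bias here is a feature of the simulated design and need not match the direction found in the wage data.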
Solution c)
# Test relevance (OLS of educ on instruments)
y = educ
X = cbind(const=rep(1,length(y)),age,age2,black,fatheduc)