Term 1
Week 1
Simple linear regression
Conditional distributions
• E[ay+b]=aE[y]+b
• E[a(x)y+b(x)|x]= a(x)E[y|x] + b(x)
• Var (y|x)= E[y²|x]- E[y|x]²
• LIE (Law of Iterated Expectations): E[y]=E[E[y|x]]
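These rules are easy to sanity-check numerically. A minimal Python sketch (the discrete x and the DGP E[y|x]=2x are my own illustration) verifying the LIE:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1_000_000
x = rng.integers(1, 4, size=n)           # x takes values 1, 2, 3
y = 2 * x + rng.normal(0, 1, size=n)     # E[y|x] = 2x

lhs = y.mean()                           # E[y]
# E[E[y|x]]: weight each conditional mean by P(x = k)
rhs = sum((x == k).mean() * y[x == k].mean() for k in (1, 2, 3))
print(lhs, rhs)                          # both ≈ 4 = 2·E[x]
```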
Causality, ceteris paribus, and counterfactual reasoning
• If we hold all factors other than x that affect y constant, i.e. ceteris paribus, then we can conclude
that x has a causal effect on y
• Counterfactual reasoning means comparing counterfactual outcomes, the outcome that would
have occurred in an alternative state, with the actual outcome
Definitions
y= β₀ + β₁x + u
• So, if ceteris paribus holds, Δu=0 and Δy=β₁Δx
• u contains all factors affecting y other than x
• SLR.1 Linear in parameters, y= β₀ + β₁x + u
• SLR.2 Random sampling, {xᵢ,yᵢ}, i=1,…,n are independently and identically distributed
• SLR.3 Sample variation: the xᵢ, i=1,…,n, exhibit variation such that SSTₓ= ∑(xᵢ-x̄)²>0
• SLR.4 Zero conditional mean, E[u|x]=0
• No value of x conveys any information about u on average; this is mean independence, E[u|x]=E[u]
• Implies E[u]=0
• E[u]= E[E[u|x]] (by LIE) = E[0] (by SLR.4) = 0
• Implies E[ux]=0
• E[ux]= E[E[ux|x]] (by LIE) = E[xE[u|x]] = E[x·0] (by SLR.4) = 0
• SLR.5 Homoscedasticity, Var[u|x]=σ² for all values x
• This means that the variance of the unobserved factors u does not depend on the value of x
Deriving OLS
yᵢ=β₀+β₁xᵢ+uᵢ
Mechanically, these methods only require SLR.3 to hold, so that the denominator in β̂₁ is non-zero
Method of moments:
Population moments:
(1a) E[u]= E[y-β₀-β₁x]= 0 (by SLR.4)
(2a) E[xu]= E[x(y-β₀-β₁x)]= 0 (by SLR.4)
Sample moments:
(1) n⁻¹∑[yᵢ-β̂₀-β̂₁xᵢ]= 0, the sample analogue of (1a)
• This gives n⁻¹[∑yᵢ-∑β̂₀-∑β̂₁xᵢ]= ȳ-β̂₀-β̂₁x̄ = 0, so β̂₀= ȳ-β̂₁x̄
(2) n⁻¹∑[xᵢ(yᵢ-β̂₀-β̂₁xᵢ)]= 0, the sample analogue of (2a)
• Substituting in β̂₀ gives β̂₁= n⁻¹∑(yᵢ-ȳ)xᵢ/n⁻¹∑(xᵢ-x̄)xᵢ = n⁻¹∑(yᵢ-ȳ)(xᵢ-x̄)/n⁻¹∑(xᵢ-x̄)²
• This is because n⁻¹∑(yᵢ-ȳ)x̄=0 and n⁻¹∑(xᵢ-x̄)x̄=0
β̂₁= n⁻¹∑(yᵢ-ȳ)(xᵢ-x̄)/n⁻¹∑(xᵢ-x̄)² = ρ̂xy(σ̂y/σ̂x)
• ρ̂xy= ĉov(x,y)/(σ̂x σ̂y) = the sample correlation between xᵢ and yᵢ, where ĉov(x,y)= (n-1)⁻¹∑(yᵢ-ȳ)(xᵢ-x̄) is the sample covariance
• σ̂x= √[(n-1)⁻¹∑(xᵢ-x̄)²], and similarly σ̂y = the sample standard deviations
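The two expressions for β̂₁ can be checked numerically. A sketch (simulated data and names are my own) computing β̂₁ as ĉov(x,y)/v̂ar(x) and as ρ̂xy(σ̂y/σ̂x):

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.normal(5, 2, size=500)
y = 1.0 + 0.5 * x + rng.normal(0, 1, size=500)

beta1_mom = np.cov(x, y, ddof=1)[0, 1] / np.var(x, ddof=1)
rho_xy = np.corrcoef(x, y)[0, 1]
beta1_corr = rho_xy * np.std(y, ddof=1) / np.std(x, ddof=1)
print(beta1_mom, beta1_corr)             # identical up to rounding
```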
Outcome:
Predicted/fitted values: ŷᵢ=β̂₀+β̂₁xᵢ, i=1,…,n
OLS regression line: ŷ=β̂₀+β̂₁x
Residuals: ûᵢ=yᵢ-ŷᵢ, i.e. the difference between the actual yᵢ and its fitted value
Sum of squared residuals:
Under this method we seek to minimise ∑ûᵢ² = ∑(yᵢ-β̂₀-β̂₁xᵢ)²
This yields FOCs with respect to β̂₀ and β̂₁:
β̂₀: 0=-2 ∑(yᵢ-β̂₀-β̂₁xᵢ)
β̂₁: 0=-2 ∑(yᵢ-β̂₀-β̂₁xᵢ)xᵢ
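A sketch (illustrative data, my own names) confirming that the closed-form solution satisfies both FOCs, i.e. the residuals sum to zero and are orthogonal to x:

```python
import numpy as np

rng = np.random.default_rng(2)
x = rng.normal(0, 1, size=200)
y = 3.0 - 1.5 * x + rng.normal(0, 1, size=200)

beta1 = ((x - x.mean()) * (y - y.mean())).sum() / ((x - x.mean()) ** 2).sum()
beta0 = y.mean() - beta1 * x.mean()
u_hat = y - beta0 - beta1 * x

print(u_hat.sum())                       # ≈ 0, FOC for β̂₀
print((u_hat * x).sum())                 # ≈ 0, FOC for β̂₁
```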
Unbiasedness
OLS is unbiased under SLR.1-4
β̂₁= ∑yᵢ(xᵢ-x̄)/∑(xᵢ-x̄)² (a) = ∑yᵢ(xᵢ-x̄)/SSTₓ (b)
(a) As ∑(xᵢ-x̄)ȳ=0
(b) SSTₓ= ∑(xᵢ-x̄)²
= ∑(β₀+β₁xᵢ+uᵢ)(xᵢ-x̄)/SSTₓ (by SLR.1)
Working with the numerator:
∑(β₀+β₁xᵢ+uᵢ)(xᵢ-x̄) = β₀∑(xᵢ-x̄) + β₁∑(xᵢ-x̄)xᵢ + ∑(xᵢ-x̄)uᵢ = 0 + β₁∑(xᵢ-x̄)xᵢ + ∑(xᵢ-x̄)uᵢ = β₁SSTₓ + ∑(xᵢ-x̄)uᵢ
As ∑(xᵢ-x̄)=0, β₀∑(xᵢ-x̄)=0
And ∑(xᵢ-x̄)xᵢ = ∑(xᵢ-x̄)²
Overall:
β̂₁= β₁ + ∑(xᵢ-x̄)uᵢ/SSTₓ
∑(xᵢ-x̄)uᵢ/SSTₓ = sampling error, the coefficient from regressing uᵢ on xᵢ
= β₁ + ∑wᵢuᵢ
wᵢ= (xᵢ-x̄)/SSTₓ
Conditional expectation:
E[β̂₁|X]=E[β₁ +∑wᵢuᵢ|X] = β₁ + ∑E[wᵢuᵢ|X] = β₁ + ∑wᵢE[uᵢ|X] = β₁
E[wᵢuᵢ|X]=wᵢE[uᵢ|X]=0, since wᵢ depends on X alone and E[uᵢ|X]= E[uᵢ|xᵢ] (by SLR.2) = 0 (by SLR.4)
LIE:
E[β̂₁]=E[E[β̂₁|X]]= E[β₁]=β₁
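Unbiasedness can be illustrated by Monte Carlo. A sketch (parameter values assumed for illustration) drawing many samples under SLR.1-4 and averaging β̂₁:

```python
import numpy as np

rng = np.random.default_rng(3)
beta0, beta1, n, reps = 1.0, 2.0, 50, 10_000

estimates = np.empty(reps)
for r in range(reps):
    x = rng.normal(0, 1, size=n)
    u = rng.normal(0, 1, size=n)         # E[u|x] = 0 by construction
    y = beta0 + beta1 * x + u
    estimates[r] = ((x - x.mean()) * y).sum() / ((x - x.mean()) ** 2).sum()

print(estimates.mean())                  # ≈ 2.0 = β₁
```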
Variance of estimators
Using:
β̂₁=β₁ +∑wᵢuᵢ
Note that β₁ is constant so does not affect Var(β̂₁|X)
Cov(uᵢ,uⱼ|X)= E[uᵢuⱼ|xᵢ,xⱼ]= 0 for i≠j (by SLR.2 and 4)
Var(uᵢ|X)= Var[uᵢ|xᵢ] (by SLR.2) = σ² (by SLR.5)
Var(β̂₁|X)=Var(β₁ +∑wᵢuᵢ|X)= ∑Var(wᵢuᵢ|X)= ∑wᵢ²Var(uᵢ|X) = ∑wᵢ²σ² = σ²∑wᵢ²
∑wᵢ²= ∑(xᵢ-x̄)²/SSTₓ² = SSTₓ/SSTₓ² = 1/SSTₓ
Var(β̂₁|X)=σ²/SSTₓ
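A companion Monte Carlo sketch (again with assumed parameters), this time holding X fixed across replications, showing the sampling variance of β̂₁ matching σ²/SSTₓ:

```python
import numpy as np

rng = np.random.default_rng(4)
n, sigma, beta0, beta1 = 50, 1.0, 1.0, 2.0
x = rng.normal(0, 1, size=n)             # X held fixed across replications
sst_x = ((x - x.mean()) ** 2).sum()

reps = 20_000
estimates = np.empty(reps)
for r in range(reps):
    u = rng.normal(0, sigma, size=n)
    y = beta0 + beta1 * x + u
    estimates[r] = ((x - x.mean()) * y).sum() / sst_x

print(estimates.var(), sigma ** 2 / sst_x)   # the two should be close
```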
Variance of u
σ²= E[u²]
σ̂²= n⁻¹∑uᵢ² would be the natural estimator, but as the uᵢ are unobservable we use the residuals ûᵢ instead
σ̂²= SSR/(n-2), where dividing by n-2 rather than n corrects for the two estimated parameters
Standard error: se(β̂₁)= σ̂/√SSTₓ
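A sketch (simulated data) computing σ̂²= SSR/(n-2) and se(β̂₁) from the formulas above:

```python
import numpy as np

rng = np.random.default_rng(5)
n = 100
x = rng.normal(0, 1, size=n)
y = 1.0 + 2.0 * x + rng.normal(0, 0.5, size=n)

sst_x = ((x - x.mean()) ** 2).sum()
beta1 = ((x - x.mean()) * (y - y.mean())).sum() / sst_x
beta0 = y.mean() - beta1 * x.mean()
ssr = ((y - beta0 - beta1 * x) ** 2).sum()

sigma2_hat = ssr / (n - 2)               # unbiased estimator of σ²
se_beta1 = np.sqrt(sigma2_hat / sst_x)   # se(β̂₁) = σ̂/√SSTₓ
print(sigma2_hat, se_beta1)
```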
Variance of ŷ
Var(ŷ)= Var(β̂₀+β̂₁x)= β̂₁²Var(x), so Var(y)= Var(û)+β̂₁²Var(x)
Goodness of fit
SST= ∑(yᵢ-ȳ)² = total variation in y
SSE= ∑(ŷᵢ-ȳ)² = variation in y explained by x
SSR= ∑ûᵢ² = unexplained variation in y
SST=SSE+SSR
R²= SSE/SST= 1-SSR/SST = fraction of sample variation in y that is explained by x
Or, R²= SST of the ŷᵢ / SST of the yᵢ, the explained variation over the total
A larger R² indicates a better fit of the OLS regression line
Note- (n-1)⁻¹SST= σ̂y², the sample variance of y
Note- causality is about whether the explanatory variables are ⫫ u, which R² says nothing about; it only
relates to the fit of the model
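A sketch (illustrative data) computing R² three equivalent ways, SSE/SST, 1-SSR/SST, and the squared sample correlation between y and ŷ:

```python
import numpy as np

rng = np.random.default_rng(6)
x = rng.normal(0, 1, size=300)
y = 0.5 + 1.2 * x + rng.normal(0, 1, size=300)

beta1 = ((x - x.mean()) * (y - y.mean())).sum() / ((x - x.mean()) ** 2).sum()
beta0 = y.mean() - beta1 * x.mean()
y_hat = beta0 + beta1 * x

sst = ((y - y.mean()) ** 2).sum()
sse = ((y_hat - y.mean()) ** 2).sum()
ssr = ((y - y_hat) ** 2).sum()
print(sse / sst, 1 - ssr / sst, np.corrcoef(y, y_hat)[0, 1] ** 2)  # all equal
```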
Interpretation of coefficients
Week 2
Multiple linear regression
Definitions
ŷᵢ=β̂₀+β̂₁xᵢ₁+β̂₂xᵢ₂
yᵢ= β̂₀+β̂₁xᵢ₁+β̂₂xᵢ₂ + ûᵢ
• The β̂ⱼ measure the ceteris paribus change in y given a one-unit change in xⱼ, holding the other regressors fixed
• This interpretation is only valid if MLR.4 holds
Gauss-Markov assumptions:
• MLR.1 Linear in parameters
• MLR.2 Random sample
• MLR.3 No perfect collinearity: no xⱼ is constant and there is no exact linear relationship among the xⱼ in the
population
• i.e. none of the explanatory variables is constant, and there are no exact linear relationships
among the independent variables
• MLR.4 Mean independence, E[u|x₁, x₂,…,xₖ]=0
• i.e. other factors affecting y are not related on average to x₁ and x₂
• If this holds then explanatory variables are exogenous
• This implies E[u]=0 and E[xⱼu]=0
• MLR.5 Homoscedasticity, the variance of u is constant and ⫫ x, i.e. E[u²|x₁, x₂,…,xₖ]= σ²
• MLR.6 Normality, the population error u is ⫫ of X and normally distributed, u~N(0, σ²)
• This necessarily implies MLR.4 and 5 so is a very strong assumption
• The argument for it is the CLT; however, its key weakness is that it assumes that all
unobservable factors affect y in a separate, additive way
Deriving OLS
Sum of squared residuals:
These are found by solving min SSR= ∑(yᵢ-β̂₀-β̂₁xᵢ₁-β̂₂xᵢ₂)²
This yields FOCs, one per parameter:
∑(yᵢ-β̂₀-β̂₁xᵢ₁-β̂₂xᵢ₂)=0
∑xᵢ₁(yᵢ-β̂₀-β̂₁xᵢ₁-β̂₂xᵢ₂)=0
∑xᵢ₂(yᵢ-β̂₀-β̂₁xᵢ₁-β̂₂xᵢ₂)=0
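In matrix form the FOCs are the normal equations X'Xβ̂ = X'y. A minimal numpy sketch (simulated two-regressor data, my own names) solving them directly:

```python
import numpy as np

rng = np.random.default_rng(7)
n = 400
x1 = rng.normal(0, 1, size=n)
x2 = 0.5 * x1 + rng.normal(0, 1, size=n)   # correlated but not collinear
y = 1.0 + 2.0 * x1 - 1.0 * x2 + rng.normal(0, 1, size=n)

X = np.column_stack([np.ones(n), x1, x2])  # constant, x₁, x₂
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
print(beta_hat)                            # ≈ [1.0, 2.0, -1.0]
```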
Partialling-out method:
1) Estimate an SLR of x₁ on x₂
xᵢ₁= α₀+α₁xᵢ₂+rᵢ₁
and use this to compute the residual
r̂ᵢ₁= xᵢ₁-α̂₀-α̂₁xᵢ₂
This partials out the variation in xᵢ₁ explained by xᵢ₂
Properties of r̂ᵢ₁:
• ∑r̂ᵢ₁=0
• ∑r̂ᵢ₁xᵢ₂=0
• ∑r̂ᵢ₁xᵢ₁=∑r̂ᵢ₁²
2) Regress yᵢ on r̂ᵢ₁ and a constant
yᵢ= θ+ β₁r̂ᵢ₁+vᵢ
Giving β̂₁= ∑r̂ᵢ₁yᵢ/∑r̂ᵢ₁²
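A sketch (simulated data reusing the design above) confirming that this β̂₁ equals the coefficient on x₁ from the full multiple regression:

```python
import numpy as np

rng = np.random.default_rng(8)
n = 400
x1 = rng.normal(0, 1, size=n)
x2 = 0.5 * x1 + rng.normal(0, 1, size=n)
y = 1.0 + 2.0 * x1 - 1.0 * x2 + rng.normal(0, 1, size=n)

# Step 1: regress x₁ on x₂ (with constant), keep the residual r̂₁
a1 = np.cov(x2, x1, ddof=1)[0, 1] / np.var(x2, ddof=1)
a0 = x1.mean() - a1 * x2.mean()
r1 = x1 - a0 - a1 * x2

# Step 2: β̂₁ = Σ r̂ᵢ₁yᵢ / Σ r̂ᵢ₁²
beta1_fwl = (r1 * y).sum() / (r1 ** 2).sum()

# Full multiple regression for comparison
X = np.column_stack([np.ones(n), x1, x2])
beta_full = np.linalg.solve(X.T @ X, X.T @ y)
print(beta1_fwl, beta_full[1])             # equal up to rounding
```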