Takeaways - Topic 2: Linear regression
1) interpretation of model & coefficients
2) properties of OLS → mechanical properties → statistical properties
3) formulas for the OLS estimators
4) concept of sum of squared residuals (SSR)
5) $R^2$ - coefficient of determination
MODEL METHOD COEFFICIENTS

MODEL: the linear regression model describes the (expected) population relation, i.e. the population regression. We use it to learn about the population relation from data.

METHOD: estimators convert data into estimates of the linear model.

COEFFICIENTS:
$\hat\beta_0 = \bar{y} - \hat\beta_1 \bar{x}$
$\hat\beta_1 = \dfrac{\operatorname{Cov}(x, y)}{\operatorname{Var}(x)}$
These values minimise the sum of squared residuals (the estimated error term of each individual) to give us the best line possible.
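The two coefficient formulas can be tried on a few numbers. A minimal sketch, assuming made-up toy data; the function name `ols_simple` and the data are hypothetical, not from the notes:

```python
from statistics import mean

def ols_simple(x, y):
    """Return (b0, b1) for the line y = b0 + b1*x fitted by OLS."""
    x_bar, y_bar = mean(x), mean(y)
    # Sums of cross-products and squares; the 1/(n-1) factors of the
    # sample covariance and variance cancel in the ratio.
    cov_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    var_x = sum((xi - x_bar) ** 2 for xi in x)
    b1 = cov_xy / var_x        # slope = Cov(x, y) / Var(x)
    b0 = y_bar - b1 * x_bar    # intercept = y_bar - b1 * x_bar
    return b0, b1

# Hypothetical toy data, invented for illustration only.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 5.0, 4.0, 5.0]
b0, b1 = ols_simple(x, y)
print(b0, b1)  # approximately 2.2 and 0.6
```

Here $\bar{x} = 3$, $\bar{y} = 4$, the cross-product sum is 6 and the squared-deviation sum is 10, so the slope is 0.6 and the intercept is $4 - 0.6 \cdot 3 = 2.2$.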
MECHANICAL PROPERTIES (ALWAYS TRUE)
① the sum of the OLS residuals is 0:
$\sum_{i=1}^{n} \hat\varepsilon_i = 0$   ($n$ = indicator of how many observations)
↳ $\hat\varepsilon_1 + \hat\varepsilon_2 + \dots + \hat\varepsilon_n = 0$
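Why? First order condition for the OLS intercept:
$-2 \sum_{i=1}^{n} (y_i - \hat\beta_0 - \hat\beta_1 x_i) = -2 \sum_{i=1}^{n} \hat\varepsilon_i = 0$, so the same as $\sum_{i=1}^{n} \hat\varepsilon_i = 0$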
② the OLS residual is uncorrelated with the explanatory variable (sample values):
$\sum_{i=1}^{n} \hat\varepsilon_i x_i = 0$
[Figure: scatter plot with the explanatory variable on the x-axis]
Why? First order condition for the OLS slope:
$-2 \sum_{i=1}^{n} x_i (y_i - \hat\beta_0 - \hat\beta_1 x_i) = 0$, so the same as $\sum_{i=1}^{n} x_i \hat\varepsilon_i = 0$
• regression gives us the linear trend of outcomes
• the "residuals" after fitting the regression are random left-overs
• there is no correlation left between the explanatory var. and the residuals, as the regression removed it
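Mechanical properties ① and ② can be checked numerically on any data set. A minimal sketch, assuming hypothetical toy data (the numbers are invented for illustration):

```python
from statistics import mean

# Hypothetical toy data, invented for illustration only.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [1.5, 3.9, 4.2, 6.8, 7.1, 9.5]

# OLS estimates from the simple-regression formulas.
x_bar, y_bar = mean(x), mean(y)
b1 = (sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
      / sum((xi - x_bar) ** 2 for xi in x))
b0 = y_bar - b1 * x_bar

# Residual = observed y minus fitted y.
residuals = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]

sum_resid = sum(residuals)                                   # property 1
sum_x_resid = sum(xi * ei for xi, ei in zip(x, residuals))   # property 2
print(sum_resid, sum_x_resid)  # both zero up to floating-point rounding
```

The two sums are zero whatever the data look like, because they are exactly the first order conditions that define the OLS estimates.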
regression model:
dependent var. = (const. + independent var.) + error

deterministic component (const. + independent var.): the portion of the variation of the observed value y that is explained by the indep. var.
→ all the explanatory power should be there
→ the deterministic component is supposed to explain the dependent variable so well that there is only a very slight inexplicability left in the error term

error = y - expected value: the unpredictability of our results
→ no explanatory power should be in there
→ if there is some explanatory power, it is because of the Omitted Variable Bias (OVB)
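REMINDER: Ordinary Least Squares
we want to minimise the squares because we want to find the "best" line, the one that "maximises precision": it passes through the data in the best possible way
$\min \text{SSR} = \min \sum_{i=1}^{n} \hat\varepsilon_i^2 = \min_{\hat\beta_0, \hat\beta_1} \sum_{i=1}^{n} (y_i - \hat\beta_0 - \hat\beta_1 x_i)^2$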
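Deriving the formulas for OLS: intercept
$\dfrac{\partial \text{SSR}}{\partial \hat\beta_0} = -2 \sum_{i=1}^{n} (y_i - \hat\beta_0 - \hat\beta_1 x_i) = 0$, so $\dfrac{1}{n} \sum_{i=1}^{n} \hat\varepsilon_i = 0$ (MP1)

Deriving the formulas for OLS: slope
$\dfrac{\partial \text{SSR}}{\partial \hat\beta_1} = -2 \sum_{i=1}^{n} x_i (y_i - \hat\beta_0 - \hat\beta_1 x_i) = 0$, so $\dfrac{1}{n} \sum_{i=1}^{n} x_i \hat\varepsilon_i = 0$ (MP2) ∎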
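The two first order conditions can be solved for the estimators themselves. A short worked derivation, using the same symbols as the conditions above; the last step uses $\sum_i \bar{x}(y_i - \bar{y}) = 0$ and $\sum_i \bar{x}(x_i - \bar{x}) = 0$:

```latex
\begin{align*}
\sum_{i=1}^{n} (y_i - \hat\beta_0 - \hat\beta_1 x_i) = 0
  &\;\Rightarrow\; n\bar{y} - n\hat\beta_0 - \hat\beta_1 n\bar{x} = 0
  \;\Rightarrow\; \hat\beta_0 = \bar{y} - \hat\beta_1 \bar{x} \\
\sum_{i=1}^{n} x_i (y_i - \hat\beta_0 - \hat\beta_1 x_i) = 0
  &\;\Rightarrow\; \sum_{i=1}^{n} x_i \bigl( (y_i - \bar{y}) - \hat\beta_1 (x_i - \bar{x}) \bigr) = 0 \\
  &\;\Rightarrow\; \hat\beta_1
   = \frac{\sum_{i=1}^{n} x_i (y_i - \bar{y})}{\sum_{i=1}^{n} x_i (x_i - \bar{x})}
   = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n} (x_i - \bar{x})^2}
   = \frac{\operatorname{Cov}(x, y)}{\operatorname{Var}(x)}
\end{align*}
```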
③ the OLS regression line cuts through the means:
$\bar{y} = \hat\beta_0 + \hat\beta_1 \bar{x}$
[Figure: fitted line $\hat{y} = \hat\beta_0 + \hat\beta_1 x$ with intercept $\hat\beta_0$ and slope $\hat\beta_1$; points A $(x_A, y_A)$ and B $(x_B, y_B)$; $\bar{x}$ and $\bar{y}$ marked on the axes]
since the independent var. explains the variation in the dependent var., the mean of the dependent var. is a function of the indep. var.
④ the sample mean of the dependent variable is equal to the mean of the fitted OLS values:
$\bar{\hat{y}} = \bar{y}$
⑤ the OLS fitted values are uncorrelated with the residuals:
$\sum_{i=1}^{n} \hat{y}_i \hat\varepsilon_i = 0$
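Properties ③ to ⑤ can be verified the same way as the first two. A minimal sketch, assuming hypothetical toy data (invented for illustration):

```python
from statistics import mean

# Hypothetical toy data, invented for illustration only.
x = [2.0, 4.0, 5.0, 7.0, 9.0, 12.0]
y = [3.1, 4.8, 6.2, 7.9, 9.4, 13.6]

x_bar, y_bar = mean(x), mean(y)
b1 = (sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
      / sum((xi - x_bar) ** 2 for xi in x))
b0 = y_bar - b1 * x_bar

fitted = [b0 + b1 * xi for xi in x]
residuals = [yi - fi for yi, fi in zip(y, fitted)]

line_at_x_bar = b0 + b1 * x_bar   # property 3: the line passes through (x_bar, y_bar)
mean_fitted = mean(fitted)        # property 4: mean of fitted values equals y_bar
sum_fitted_resid = sum(f * e for f, e in zip(fitted, residuals))  # property 5: zero

print(line_at_x_bar, mean_fitted, y_bar, sum_fitted_resid)
```

Property ⑤ follows from ① and ②: the fitted values are a linear function of $x$, and the residuals sum to zero both on their own and when weighted by $x$.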
REMINDER:
regression model: $y = \beta_0 + \beta_1 x + \varepsilon$
estimate of the model: $\hat{y} = \hat\beta_0 + \hat\beta_1 x$
we need to use the expectation operator to find out if the estimators are unbiased (or not)
Why? Because OLS is BLUE (Best Linear Unbiased Estimator) when it is: UNBIASED and EFFICIENT

expectation operator → linear operator
1. $E(aX) = a E(X)$
2. $E(a) = a$
3. $E(aX + bY) = E(aX) + E(bY) = a E(X) + b E(Y)$

variance operator → quadratic operator
1. $\operatorname{var}(aX) = a^2 \operatorname{var}(X)$
2. $\operatorname{var}(a) = 0$
3. $\operatorname{var}(aX + bY) = a^2 \operatorname{var}(X) + b^2 \operatorname{var}(Y) + 2ab \operatorname{cov}(X, Y)$
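The operator rules are stated for population moments, but the same identities hold exactly for sample means and variances as long as one divisor is used throughout, so rule 3 of each column can be checked on numbers. A minimal sketch; the helper names `pvar` and `pcov` and the data are hypothetical:

```python
from statistics import mean

def pvar(v):
    """Variance with divisor n."""
    m = mean(v)
    return sum((vi - m) ** 2 for vi in v) / len(v)

def pcov(u, v):
    """Covariance with divisor n."""
    mu, mv = mean(u), mean(v)
    return sum((ui - mu) * (vi - mv) for ui, vi in zip(u, v)) / len(u)

# Hypothetical data and constants, invented for illustration only.
X = [1.0, 3.0, 4.0, 6.0, 8.0]
Y = [2.0, 1.0, 5.0, 4.0, 7.0]
a, b = 2.0, -3.0

Z = [a * xi + b * yi for xi, yi in zip(X, Y)]   # Z = aX + bY

lhs_mean = mean(Z)
rhs_mean = a * mean(X) + b * mean(Y)            # linear rule
lhs_var = pvar(Z)
rhs_var = a**2 * pvar(X) + b**2 * pvar(Y) + 2 * a * b * pcov(X, Y)  # quadratic rule

print(lhs_mean, rhs_mean, lhs_var, rhs_var)
```

A negative `b` is used on purpose: the constants enter the variance squared, but the covariance term keeps the sign of `a * b`.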
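STATISTICAL PROPERTIES
model: $y = \beta_0 + \beta_1 x + \varepsilon$; estimated: $\hat{y} = \hat\beta_0 + \hat\beta_1 x$
① the OLS estimators $\hat\beta_0$ and $\hat\beta_1$ are unbiased estimators of $\beta_0$ and $\beta_1$:
$E(\hat\beta_0) = \beta_0$; $E(\hat\beta_1) = \beta_1$
② the variances of the OLS estimators are:
$\operatorname{var}(\hat\beta_1) = \dfrac{\sigma^2}{\sum_{i=1}^{n} (x_i - \bar{x})^2}$; $\operatorname{var}(\hat\beta_0) = \dfrac{\sigma^2 \, \frac{1}{n} \sum_{i=1}^{n} x_i^2}{\sum_{i=1}^{n} (x_i - \bar{x})^2}$
③ $\hat\sigma^2$ is an unbiased estimator of $\operatorname{var}(\varepsilon \mid x)$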
ASSUMPTIONS
1) the population relation is linear in parameters → existence of $\beta$s
2) the sample is random → unbiased estimators
3) there is variation in $x$: $\operatorname{var}(x) > 0$ (indep. var's are NOT constant) → existence of $\beta$s
4) the unconditional mean of the error term is 0: $E(\varepsilon) = 0$ → unbiased estimators
on average we expect the error term to be 0 (population mean of 0)
[Figure: scatter of points around the regression line; some residuals will be +'ve, some residuals will be negative]
• the error term accounts for variation in the dependent var. that is not explained by the independent var.
  ↳ unpredictable random error
• for the model to be unbiased it has to be 0 on average
• error of prediction = actual y - fitted y
• var(y predicted) + var(error) = var(y)
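In a sample, the variance decomposition holds exactly for an OLS fit with an intercept (fitted values and residuals are uncorrelated), and the explained share of the variance is the $R^2$ from the takeaways. A minimal sketch, assuming hypothetical toy data:

```python
from statistics import mean, pvariance

# Hypothetical toy data, invented for illustration only.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
y = [2.3, 2.9, 4.1, 4.8, 6.2, 6.5, 8.1, 8.4]

x_bar, y_bar = mean(x), mean(y)
b1 = (sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
      / sum((xi - x_bar) ** 2 for xi in x))
b0 = y_bar - b1 * x_bar

fitted = [b0 + b1 * xi for xi in x]
residuals = [yi - fi for yi, fi in zip(y, fitted)]

var_y = pvariance(y)                # total variation in y
var_fitted = pvariance(fitted)      # part explained by the regression
var_resid = pvariance(residuals)    # part left in the residuals

r_squared = var_fitted / var_y      # share of the variation that is explained
print(var_fitted + var_resid, var_y, r_squared)
```

The same divisor (here $n$) has to be used for all three variances, otherwise the two sides do not match exactly.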
5) the conditional mean is 0: $E(\varepsilon \mid x) = 0$ → unbiased estimators
on average we expect the error term to have the same value regardless of $x$
[Figure: $y$ against $x$ with the line $E(y \mid x) = \beta_0 + \beta_1 x$; $x_1$, $x_2$, $x_3$ marked on the x-axis]
this implies:
1. $E(y \mid x) = \beta_0 + \beta_1 x$
2. $\operatorname{Cov}(\varepsilon, x) = E(\varepsilon x) = 0$: the error term is uncorrelated with the variable
one observation of the error term should not predict the next one
CONCLUSION 1: Under As1-5, the OLS estimators are unbiased
SP1: $E(\hat\beta_0) = \beta_0$ and $E(\hat\beta_1) = \beta_1$
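Unbiasedness is a statement about the average over many samples, not about any single estimate, which a small simulation can illustrate. A minimal sketch: the "true" values `BETA0`, `BETA1`, `SIGMA`, the design `x` and the number of replications are all hypothetical choices, and the errors are drawn as normal only for convenience:

```python
import random

def ols_simple(x, y):
    """Return (b0, b1) from the simple-regression OLS formulas."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sxy / sxx
    return y_bar - b1 * x_bar, b1

random.seed(0)                          # fixed seed so the run is reproducible
BETA0, BETA1, SIGMA = 1.0, 2.0, 1.0     # hypothetical "true" population values
x = [i / 10 for i in range(50)]         # explanatory variable, held fixed across samples
REPS = 2000

b0_draws, b1_draws = [], []
for _ in range(REPS):
    # Errors are drawn with mean 0 and independently of x,
    # so assumptions 4 and 5 hold by construction.
    y = [BETA0 + BETA1 * xi + random.gauss(0.0, SIGMA) for xi in x]
    b0, b1 = ols_simple(x, y)
    b0_draws.append(b0)
    b1_draws.append(b1)

mean_b0 = sum(b0_draws) / REPS
mean_b1 = sum(b1_draws) / REPS
print(mean_b0, mean_b1)  # close to BETA0 and BETA1, not exactly equal
```

Individual draws of `b1` scatter around 2; only their average settles near the true value as the number of replications grows.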
6) $\varepsilon$ has the same variance for each value of the independent variable:
$\operatorname{var}(\varepsilon \mid x) = \sigma^2$ (homoskedasticity)
check by doing a residual v. fitted value plot
↳ same scatter
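CONCLUSION 2: Under As1-6, the variances of the OLS estimators are:
SP2: $\operatorname{var}(\hat\beta_1) = \dfrac{\sigma^2}{\sum_{i=1}^{n} (x_i - \bar{x})^2}$ and $\operatorname{var}(\hat\beta_0) = \dfrac{\sigma^2 \, \frac{1}{n} \sum_{i=1}^{n} x_i^2}{\sum_{i=1}^{n} (x_i - \bar{x})^2}$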
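• when $\operatorname{var}(\hat\beta_1)$ is high we are less sure about $\hat\beta_1$ being close to $\beta_1$
• if $y$ and $x$ are strongly correlated, the variance of the error term will be low
• variance of the error term $\sigma^2$ ↑ → $\operatorname{var}(\hat\beta_1)$ ↑
• variation in $x$, $\sum_{i=1}^{n} (x_i - \bar{x})^2$ ↑ → $\operatorname{var}(\hat\beta_1)$ ↓
• sample size $n$ ↑ → $\operatorname{var}(\hat\beta_1)$ ↓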
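How each ingredient moves the variance of the slope can be read off the formula $\sigma^2 / \sum (x_i - \bar{x})^2$ directly. A minimal sketch; the function name `var_b1` and the numbers are hypothetical:

```python
def var_b1(x, sigma2):
    """Variance of the OLS slope: sigma^2 / sum of squared deviations of x."""
    x_bar = sum(x) / len(x)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    return sigma2 / sxx

# Hypothetical design, invented for illustration only.
x_small = [1.0, 2.0, 3.0, 4.0, 5.0]          # sum of squared deviations = 10

base = var_b1(x_small, 1.0)                               # 1 / 10
more_noise = var_b1(x_small, 4.0)                         # larger error variance
more_spread = var_b1([2 * xi for xi in x_small], 1.0)     # x values twice as spread out
larger_n = var_b1(x_small + x_small, 1.0)                 # same values observed twice

print(base, more_noise, more_spread, larger_n)
```

More noise raises the variance; more spread in $x$ or more observations lowers it, since both increase the denominator.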
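CONCLUSION 3: Under As1-6, $\hat\sigma^2 = \dfrac{1}{n-2} \sum_{i=1}^{n} \hat\varepsilon_i^2$ is an unbiased estimator of $\sigma^2$ (SP3),
and OLS is the one with the lowest variance (among linear unbiased estimators, i.e. it is BLUE)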
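Why the divisor is $n - 2$ rather than $n$ shows up in a simulation: dividing the sum of squared residuals by $n$ underestimates $\sigma^2$ on average. A minimal sketch; the "true" values, the design and the number of replications are hypothetical, and normal errors are used only for convenience:

```python
import random

random.seed(1)                            # fixed seed so the run is reproducible
BETA0, BETA1, SIGMA2 = 1.0, 2.0, 4.0      # hypothetical "true" population values
x = [float(i) for i in range(10)]
n = len(x)
x_bar = sum(x) / n
sxx = sum((xi - x_bar) ** 2 for xi in x)
REPS = 5000

unbiased_draws, naive_draws = [], []
for _ in range(REPS):
    y = [BETA0 + BETA1 * xi + random.gauss(0.0, SIGMA2 ** 0.5) for xi in x]
    y_bar = sum(y) / n
    b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    b0 = y_bar - b1 * x_bar
    ssr = sum((yi - b0 - b1 * xi) ** 2 for xi, yi in zip(x, y))
    unbiased_draws.append(ssr / (n - 2))   # estimator from Conclusion 3
    naive_draws.append(ssr / n)            # divides by n instead: too small on average

mean_unbiased = sum(unbiased_draws) / REPS
mean_naive = sum(naive_draws) / REPS
print(mean_unbiased, mean_naive)  # first close to 4.0, second noticeably below
```

Two degrees of freedom are used up by estimating $\hat\beta_0$ and $\hat\beta_1$, which is what the $n - 2$ corrects for.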
CONCLUSION 4: Under As1-6, OLS estimators have a normal distribution (approximately, when $n$ is large)
7) the error terms distribute normally and independently from each other: $\varepsilon \sim N(0, \sigma^2)$
→ not likely to be true, but if $n$ is large it doesn't matter
→ not required by OLS, optional assumption
→ allows us to test hypotheses, generate confidence and prediction intervals
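CONCLUSION 5: Under As1-7:
$\hat\beta_j \sim N\!\left(\beta_j, \operatorname{var}(\hat\beta_j)\right)$ with $\operatorname{var}(\hat\beta_j) = \dfrac{\sigma^2}{\sum_{i=1}^{n} (x_{ij} - \bar{x}_j)^2 \, (1 - R_j^2)}$ → normal distrib.
($R_j^2$ = the $R^2$ from regressing $x_j$ on the other explanatory variables; $k$ = number of explanatory variables)
or
$\dfrac{\hat\beta_j - \beta_j}{\operatorname{sd}(\hat\beta_j)} \sim N(0, 1)$ → standard normal distribution, $\sigma^2$ is KNOWN
$\dfrac{\hat\beta_j - \beta_j}{\operatorname{se}(\hat\beta_j)} \sim t_{n-k-1}$ → t-distribution table, $\sigma^2$ is UNKNOWN
• to check if it follows a normal distrib., assess the normal probability plot: if the residuals follow the straight line, they are normally distributed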
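With $\sigma^2$ unknown, the t-ratio is built from the estimated standard error. A minimal sketch for the simple regression case ($k = 1$), assuming hypothetical toy data and a null hypothesis $\beta_1 = 0$; the critical value 2.306 is the two-sided 5% value for 8 degrees of freedom taken from a t table:

```python
from math import sqrt

# Hypothetical toy data, invented for illustration only.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1, 12.2, 13.8, 16.1, 18.0, 19.9]
n, k = len(x), 1                      # k = number of explanatory variables

x_bar, y_bar = sum(x) / n, sum(y) / n
sxx = sum((xi - x_bar) ** 2 for xi in x)
b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
b0 = y_bar - b1 * x_bar

ssr = sum((yi - b0 - b1 * xi) ** 2 for xi, yi in zip(x, y))
df = n - k - 1                        # degrees of freedom of the t distribution
sigma2_hat = ssr / df                 # estimate of the error variance
se_b1 = sqrt(sigma2_hat / sxx)        # standard error of the slope

t_stat = (b1 - 0.0) / se_b1           # t-ratio under H0: beta_1 = 0
T_CRIT = 2.306                        # t table, 8 df, two-sided 5% level
reject_h0 = abs(t_stat) > T_CRIT

# 95% confidence interval for beta_1.
ci_low, ci_high = b1 - T_CRIT * se_b1, b1 + T_CRIT * se_b1
print(t_stat, reject_h0, (ci_low, ci_high))
```

The critical value is hard-coded to keep the sketch free of external libraries; for other sample sizes or significance levels it has to be looked up again.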