Module 1 – Lecture 1
In ordinary least squares regression, it is assumed that the relationship between
variables is linear. A regression formula consists of a deterministic component
and the random error, also called the residual. Least squares refers to how the
model is estimated: the regression line is chosen in such a way that the sum of the squared vertical distances of each point to the line is as small as possible. The
least squares principle minimizes $\sum_{i=1}^{n}\hat{\varepsilon}_i^{2}=\hat{\varepsilon}_1^{2}+\hat{\varepsilon}_2^{2}+\dots+\hat{\varepsilon}_n^{2}$.
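As a concrete illustration, a minimal Python sketch (using NumPy, with simulated data and hypothetical variable names) that fits a bivariate regression and prints the sum of squared residuals that the fitted line minimizes:

    import numpy as np

    rng = np.random.default_rng(0)
    n = 200
    x = rng.normal(size=n)
    y = 1.0 + 2.0 * x + rng.normal(size=n)             # hypothetical data: true intercept 1, slope 2

    X = np.column_stack([np.ones(n), x])                # design matrix with an intercept column
    beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)    # least squares estimates

    resid = y - X @ beta_hat                            # estimated residuals (epsilon-hat)
    print(beta_hat, np.sum(resid ** 2))                 # coefficients and the minimized sum of squares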
Gauss-Markov theorem: if the first five OLS assumptions are satisfied, then the
least squares estimator is the Best Linear Unbiased Estimator (BLUE) of each lin-
ear combination of the observations. "Best" means that it is the parameter esti-
mate with the smallest variance; "unbiased" means that the expected value of
the parameter estimated by the model is equal to its population value. An unbi-
ased estimator with minimal variance is called efficient, and an estimator that
approaches the true population value as sample size increases is called consis-
tent. The OLS assumptions are:
1. all variables are measured at the interval level and without error,
An error in Y is not problematic, because it's corrected for by the er-
ror term. An error in X is difficult to correct, and leads to an under-
estimation of the true population value of β in a bivariate model,
and often in multivariate models as well.
2. for each value of the independent variables, the mean value of the error
term = 0,
For example: the left-hand figure is good; the right-hand figure is non-linear or needs another predictor.
3. homoskedasticity: all random variables in the sequence have the same
variance,
For example: variation in income is higher among older people; this is heteroskedasticity.
Heteroskedastic estimates are still linear unbiased estimates, but not "best". The standard errors of the parameters are biased, and statistical tests are therefore not reliable.
4. no autocorrelation: $\operatorname{cov}(\varepsilon_i,\varepsilon_j)=0$ for $i \neq j$, i.e. the error terms should be uncorrelated,
5. each independent variable is uncorrelated with the error term,
6. no multicollinearity: no independent variable should be perfectly (or ap-
proximately) linearly related to one or more of the other independent vari-
ables,
7. the conditional errors are normally distributed.
Note that the last two are not part of the BLUE criteria. Plotted against X, the error terms should have a mean of zero, constant variance and a normal distribution. Under heteroskedasticity, by contrast, the variance of the errors is not the same for each value of X.
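A small numerical illustration of heteroskedasticity with simulated (hypothetical) income and age data, echoing the example under assumption 3: instead of a plot, the residual spread is compared for low and high values of the predictor.

    import numpy as np

    rng = np.random.default_rng(1)
    n = 500
    age = rng.uniform(20, 70, size=n)
    # error spread grows with age, so the homoskedasticity assumption is violated
    income = 10 + 0.5 * age + rng.normal(scale=0.1 * age, size=n)

    X = np.column_stack([np.ones(n), age])
    beta, *_ = np.linalg.lstsq(X, income, rcond=None)
    resid = income - X @ beta

    # under homoskedasticity the two standard deviations should be similar
    print(resid[age < 45].std(), resid[age >= 45].std())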
These conditions are tested by residual analysis. A residual is the vertical distance of an observation to the regression line. Aims of residual analysis are:
1. global evaluation of the model: are important variables lacking? Is the re-
lationship between X and Y linear? Are predictors too strongly correlated?
2. examining individual cases, especially if N < 500. Do specific observations fit badly? Do they influence the estimation of the betas too much?
3. checking the trustworthiness of statistical test outcomes.
A scatter plot shows the association between two variables (in a single-predictor OLS model, between X and Y). When there are several predictors, you could make a scatter plot of each of them against Y, but then the effect of all other predictors is not taken into account. A partial plot looks exactly like a scatter plot, but plots the residuals of Y, after regressing it on the remaining predictors, against the residuals of the predictor of interest, after regressing it on those same remaining predictors.
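Assuming the partial plot meant here is the usual added-variable plot, a sketch of how its coordinates can be computed; partial_plot_coords is a hypothetical helper name.

    import numpy as np

    def partial_plot_coords(y, X, j):
        """Coordinates for a partial (added-variable) plot of predictor j.

        X is an n x p predictor matrix without an intercept column; a constant
        is added internally when regressing on the remaining predictors.
        """
        n = len(y)
        others = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])

        bx, *_ = np.linalg.lstsq(others, X[:, j], rcond=None)
        rx = X[:, j] - others @ bx          # residuals of the predictor of interest

        by, *_ = np.linalg.lstsq(others, y, rcond=None)
        ry = y - others @ by                # residuals of the outcome

        return rx, ry                       # scatter rx against ry for the partial plot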
Dealing with influential cases
To find (overly) influential observations, LEVER (leverage) is a numerical instrument that can be used. It measures the distance of the value of an individual observation on a predictor to the center of the values of the other observations on that predictor, along with the influence resulting from this distance. Values in the center have a distance of approximately 0, indicating no influence. Higher values indicate a stronger influence; critical value $> 2p/n$, where p = number of predictors and n = number of observations. MAHAL (Mahalanobis distance) is similar to LEVER.
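A minimal sketch of computing leverage values directly from the design matrix on simulated data; note that software packages may center the leverage or count p differently, so the 2p/n cutoff is indicative only.

    import numpy as np

    rng = np.random.default_rng(2)
    n, p = 100, 2
    X = np.column_stack([np.ones(n), rng.normal(size=(n, p))])   # design matrix with intercept

    # leverage values: the diagonal of the hat matrix H = X (X'X)^-1 X'
    lever = np.einsum('ij,jk,ik->i', X, np.linalg.inv(X.T @ X), X)

    # rule of thumb from the notes: flag observations with leverage > 2*p/n
    print(np.where(lever > 2 * p / n)[0])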
Cook's distance D and DfFit both show the difference between the betas estimated with and without an individual observation. The larger this difference, the more influential the observation. Critical value of Cook's D $> 4/n$; critical value of DfFit $> 2\sqrt{p/n}$.
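A sketch applying these cutoffs with statsmodels on simulated (hypothetical) data:

    import numpy as np
    import statsmodels.api as sm

    rng = np.random.default_rng(3)
    n, p = 100, 1
    x = rng.normal(size=n)
    y = 1 + 2 * x + rng.normal(size=n)

    results = sm.OLS(y, sm.add_constant(x)).fit()
    infl = results.get_influence()

    cooks_d = infl.cooks_distance[0]        # Cook's distance per observation
    dffits = infl.dffits[0]                 # DfFit per observation

    print(np.where(cooks_d > 4 / n)[0])                      # cases above 4/n
    print(np.where(np.abs(dffits) > 2 * np.sqrt(p / n))[0])  # cases above 2*sqrt(p/n)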
A good numerical instrument for the detection of influential cases is SDRESID. RESID is the estimated residual ($\hat{\varepsilon}_i$, the absolute size of the error); ZRESID is the standardized residual ($\hat{\varepsilon}_i/\hat{\sigma}$, the relative size of the error). The studentized residual (SRESID) is like ZRESID, but with expected value 0 and variance 1. SDRESID is the studentized deleted residual (does individual i fit well with all the other individuals?).
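A sketch of how these residual types map onto statsmodels output, again on simulated data; the |SDRESID| > 3 screen is only illustrative.

    import numpy as np
    import statsmodels.api as sm

    rng = np.random.default_rng(4)
    x = rng.normal(size=100)
    y = 1 + 2 * x + rng.normal(size=100)

    results = sm.OLS(y, sm.add_constant(x)).fit()
    infl = results.get_influence()

    resid = results.resid                           # RESID: estimated residuals
    zresid = resid / np.sqrt(results.mse_resid)     # ZRESID: standardized residuals (residual / sigma-hat)
    sresid = infl.resid_studentized_internal        # SRESID: studentized residuals
    sdresid = infl.resid_studentized_external       # SDRESID: studentized deleted residuals

    print(np.where(np.abs(sdresid) > 3)[0])         # candidate poorly fitting cases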
Detecting and solving heteroskedasticity