Self-study questions Chapter 3
1. (a) Why does OLS estimation involve taking vertical deviations of the points to the line rather than
horizontal distances?
(b) Why are the vertical distances squared before being added together?
(c) Why are the squares of the vertical distances taken rather than the absolute values?
1. (a) The use of vertical rather than horizontal distances relates to the idea that the explanatory
variable, x, is fixed in repeated samples, so what the model tries to do is to fit the most
appropriate value of y using the model for a given value of x. Taking horizontal distances would
have suggested that we had fixed the value of y and tried to find the appropriate values of x.
(b) When we calculate the deviations of the points, yt, from the fitted values, ŷt, some points
will lie above the line (yt > ŷt) and some will lie below the line (yt < ŷt). When we calculate
the residuals (ût = yt − ŷt), those corresponding to points above the line will be positive and
those below the line negative, so adding them would mean that they would largely cancel out.
In fact, we could fit an infinite number of lines with a zero average residual. By squaring the
residuals before summing them, we ensure that they all contribute to the measure of loss and
that they do not cancel. It is then possible to define unique (ordinary least squares) estimates
of the intercept and slope.
(c) Taking the absolute values of the residuals and minimising their sum would certainly also
get around the problem of positive and negative residuals cancelling. However, the absolute
value function is much harder to work with than a square, since it is not differentiable at
zero. Squared terms are easy to differentiate, so it is simple to derive analytical formulae
for the parameter estimates that minimise the residual sum of squares.
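As a brief numerical illustration of parts (b) and (c) (not part of the original answer; a minimal sketch in Python with NumPy, using invented data), the analytical OLS formulae obtained by differentiating the residual sum of squares yield residuals that cancel when summed, while their sum of squares is the quantity actually minimised:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(0, 10, size=50)                  # hypothetical regressor values
y = 2.0 + 0.5 * x + rng.normal(0, 1, size=50)    # hypothetical observations

# Analytical OLS formulae, obtained by differentiating the residual
# sum of squares with respect to the intercept and the slope
beta_hat = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
alpha_hat = y.mean() - beta_hat * x.mean()

u_hat = y - (alpha_hat + beta_hat * x)           # vertical deviations from the fitted line
print(u_hat.sum())                               # ~0: positives and negatives cancel
print((u_hat ** 2).sum())                        # the residual sum of squares OLS minimises
```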
2. Explain, with the use of equations, the difference between the sample regression function and the
population regression function.
2. The population regression function (PRF) is a description of the model that is thought to be
generating the actual data and it represents the true relationship between the variables. The
population regression function is also known as the data generating process (DGP). The PRF
embodies the true values of the parameters α and β and, for the bivariate model, could be
expressed as
yt = α + βxt + ut
Note that there is a disturbance term in this equation. In some textbooks, a distinction is drawn
between the PRF (the underlying true relationship between y and x) and the DGP (the process
describing the way that the actual observations on y come about).
The sample regression function, SRF, is the relationship that has been estimated using the
sample observations, and is often written as
ŷt = α̂ + β̂xt
Notice that there is no error or residual term in the equation for the SRF: all this equation
states is that, given a particular value of x, multiplying it by β̂ and adding α̂ will give the
fitted or expected value of y from the model, denoted ŷt. It is also possible to write
yt = α̂ + β̂xt + ût
This equation splits the observed value of y into two components: the fitted value from the
model, and a residual term. The SRF is used to infer likely values of the PRF. That is, the
estimates α̂ and β̂ are constructed from the sample data as approximations to the true α and β.
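To make the PRF/SRF distinction concrete, here is a minimal simulation sketch (not from the book; Python with NumPy, with hypothetical true values α = 1 and β = 2). The DGP generates the observations with a disturbance term, while the SRF is estimated from the sample and contains no error term of its own:

```python
import numpy as np

rng = np.random.default_rng(1)
alpha, beta = 1.0, 2.0                 # true PRF parameters (hypothetical values)

x = rng.uniform(0, 5, size=100)
u = rng.normal(0, 1, size=100)         # disturbance term in the PRF/DGP
y = alpha + beta * x + u               # the DGP: how the observations on y come about

# SRF: estimated from this particular sample
beta_hat = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
alpha_hat = y.mean() - beta_hat * x.mean()

y_fit = alpha_hat + beta_hat * x       # fitted values: no error term in the SRF
u_hat = y - y_fit                      # residuals: yt = alpha_hat + beta_hat*xt + u_hat_t
print(alpha_hat, beta_hat)             # close to, but not equal to, (1.0, 2.0)
```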
3. What is an estimator? Is the OLS estimator superior to all other estimators? Why or why not?
3. An estimator is simply a formula that is used to calculate estimates, i.e. values of the
parameters that describe the relationship between the dependent variable and one or more
explanatory variables. There are an infinite
number of possible estimators; OLS is one choice that many people would consider a good one.
We can say that the OLS estimator is “best” – i.e. that it has the lowest variance among the
class of linear unbiased estimators. So it is optimal in the sense that no other linear, unbiased
estimator would have a smaller sampling variance. We could define an estimator with a lower
sampling variance than the OLS estimator, but it would either be non-linear or biased or both!
So there is a trade-off between bias and variance in the choice of the estimator.
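A minimal Monte Carlo sketch of this trade-off (my own illustration, not from the book; Python with NumPy): shrinking the OLS slope towards zero gives a linear estimator with a smaller sampling variance than OLS, but at the cost of bias. The shrinkage factor of 0.5 is arbitrary.

```python
import numpy as np

rng = np.random.default_rng(2)
beta_true = 1.0
x = rng.uniform(0, 5, size=30)         # x treated as fixed in repeated samples

ols, shrunk = [], []
for _ in range(5000):
    y = beta_true * x + rng.normal(0, 2, size=x.size)
    b = np.sum(x * y) / np.sum(x ** 2) # OLS slope (no intercept, for simplicity)
    ols.append(b)
    shrunk.append(0.5 * b)             # biased linear estimator with lower variance

print(np.mean(ols), np.var(ols))       # mean ~1.0 (unbiased), larger variance
print(np.mean(shrunk), np.var(shrunk)) # mean ~0.5 (biased), variance a quarter of OLS's
```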
4. What five assumptions are usually made about the unobservable error terms in the classical linear
regression model (CLRM)? Briefly explain the meaning of each. Why are these assumptions made?
4. A list of the assumptions of the classical linear regression model’s disturbance terms is given
in Box 2.3 on p.44 of the book.
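For reference, the five assumptions are usually stated along the following lines (paraphrased here; see the box in the book for the exact wording):
(1) E(ut) = 0: the errors have zero mean;
(2) var(ut) = σ² < ∞: the errors have a constant, finite variance (homoscedasticity);
(3) cov(ui, uj) = 0 for i ≠ j: the errors are uncorrelated with one another;
(4) cov(ut, xt) = 0: the errors are uncorrelated with the explanatory variable;
(5) ut ∼ N(0, σ²): the errors are normally distributed.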
We need to make the first four assumptions in order to prove that the ordinary least squares
estimators of α and β are “best”, that is, to prove that they have minimum variance among the
class of linear unbiased estimators. The theorem that proves that OLS estimators are BLUE
(provided the assumptions are fulfilled) is known as the Gauss-Markov theorem. If these
assumptions are violated (which is dealt with in Chapter 4), then it may be that OLS estimators
are no longer unbiased or “efficient”. That is, the estimates may be systematically inaccurate
(biased) or may fluctuate widely from one sample to another (inefficient).