Econometrics Chapter 1 - The nature of econometrics and economic data
Mathematical statistics focuses on experimental economic data. Econometrics focuses on non-experimental data, i.e. data not accumulated through controlled experiments on individuals, firms or segments of the economy. This is observational or retrospective data, passively collected by the researcher. Experimental data are often collected in laboratory environments and are difficult to obtain in the social sciences.
Empirical analysis uses data to test a theory or estimate a relationship. A formal economic model can be constructed, consisting of mathematical equations describing various relationships. For example, in an economic model of crime the dependent variable y depends on the independent variables xi. As is common in economic theory, we do not know the specific function f, which depends on an underlying utility function that is rarely known; yet we can still predict the effect each variable would have on criminal activity.
The determinants of criminal behaviour are reasonable on common-sense grounds. We can use the same intuition, instead of formal economic theory, to realise that factors such as education, experience and training affect productivity, which in turn determines wage. Therefore we can write a simple model where wage = f(educ, exper, training). This is the economic model. We now want to turn it into an econometric model by specifying the function f and dealing with variables that can't reasonably be observed for a given individual. Instead we rely on averages and statistics to derive an approximate variable.
The ambiguities inherent in the economic model of crime are resolved by specifying a particular econometric model, for example of the form sketched below.
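The original shows the equation as an image; a specification consistent with the parameters ß0 to ß6 used below, following Wooldridge's economic model of crime (the exact regressor names here are an assumption), is:

```latex
\text{crime} = \beta_0 + \beta_1\,\text{wage} + \beta_2\,\text{othinc} + \beta_3\,\text{freqarr}
  + \beta_4\,\text{freqconv} + \beta_5\,\text{avgsen} + \beta_6\,\text{age} + u
```

where wage is the wage from legal employment, othinc income from other sources, freqarr the frequency of arrests, freqconv the frequency of conviction given arrest, and avgsen the average sentence length.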
The term u contains the unobserved factors: u is the random error or disturbance term, summarising all unobserved factors. We can never estimate u entirely.
The constants ß0 to ß6 are the parameters of the econometric model and describe the directions and strengths of the relationships between crime and the factors used to determine crime. We'd intuitively expect ß1 (the coefficient on wage) to be negative, as a higher wage should result in less crime. The ß's are to be estimated from sample data.
After (1) making an economic model and (2) making an econometric model with unspecified parameters, we (3) state hypotheses about the unknown parameters. An empirical analysis requires data, so we (4) collect data, (5) apply econometric methods to estimate the parameters and (6) use the estimated parameters to make predictions and, e.g., test economic theory.
Structure of economic data
A cross-sectional data set consists of a sample taken at a given point in time. We assume it has been obtained by random sampling from the underlying population, though this random sampling assumption may be violated. An example of cross-sectional data is a sample of 500 workers in 1976, with variables such as wage, experience and education.
Time series data sets consist of observations on one or more variables over time, e.g. GDP and CPI for the years 1950 to 2018. Most economic and other time series are related over time, as GDP in one year is strongly related to GDP in the surrounding years. Data frequency matters because many economic time series display strong seasonal patterns; monthly data on housing prices, for instance, depends on the weather among other things.
Pooled cross sections have both cross-sectional and time series features; think of a yearly cross-sectional survey combined over several years.
Panel data or longitudinal data sets consist of a time series for each cross-sectional member in the data set: We observe a set of variables for a set of individuals over a certain time period. The difference between panel data and pooled cross sections is that panel data follows the same cross-sectional units over the whole time period.
Cross-sectional data under random sampling gives us independent random variables with a common probability density function: each observation is an independent and identically distributed (i.i.d.) random variable.
Causality and ceteris paribus
Just because ß1 > 0 doesn't mean that x causes y, i.e. that there's causality. Correlation is a dependence or association between two random variables, but correlation doesn't equal causation.
Ceteris paribus means holding other (relevant) factors equal, and it matters for causal analysis: If we don't hold other factors such as income fixed, we can't know the causal effect of, say, a price change on quantity demanded. If we succeed in holding all other relevant factors fixed and then find a link, we can conclude a causal effect.
Though ceteris paribus is an important assumption for causality, it's usually impossible to hold everything else equal, so we aim for holding enough fixed: Econometric methods can simulate a ceteris paribus experiment.
When examining the effects of education on wage we have endogeneity: People choose their education level, so it's not independent of other factors such as intelligence. Someone born with higher intelligence is more likely to choose to go to university, but wouldn't they have earned a higher wage even without the extra education?
Chapter 2 - The simple linear regression model
Often we're interested in explaining y in terms of x, or studying how y varies with changes in x. We must allow for factors other than x to affect y, must determine the functional relationship between x and y, and must ensure that we are capturing a ceteris paribus relationship. A simple relationship would be: y = ß0 + ß1x + u.
This is the simple linear regression model or the two-variable linear regression model. y is the
dependent/explained/left hand side variable, whereas x is the independent/explanatory/RHS
variable.
ß0 is the intercept parameter, a constant term. ß1 is the slope parameter, of main interest. u is the
error term or disturbance in the relationship, representing factors other than x affecting y; u
stands for unobserved.
If the other factors in u are held fixed, the change in u is zero, and x has a linear effect on y: ∆y = ß1∆x if ∆u = 0. So y changes by ß1 times the change in x when the change in the unobserved factors is zero.
ß1 is the slope parameter in the relationship between y and x, holding the other factors in u fixed. The intercept parameter ß0 is a constant term, of less interest than the slope parameter.
If ∆u = 0 then ∆y = ß1∆x, so ß1 = ∆y/∆x: the slope parameter measures the change in y for a one-unit change in x, but only if the change in the unobserved factors is zero.
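A quick worked example with hypothetical numbers: suppose ß1 = 0.5 in wage = ß0 + ß1educ + u, with wage measured in dollars per hour. Then four extra years of education, holding ∆u = 0, raise the predicted wage by two dollars per hour:

```latex
\Delta \text{wage} = \beta_1 \, \Delta \text{educ} = 0.5 \times 4 = 2 .
```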
The linearity of our equation implies constant returns, while for education, for example, we might expect increasing returns (a master's year raises the wage more at the margin than one extra bachelor's year). For now, though, we focus on whether the linear model really allows us to draw ceteris paribus conclusions: How can we learn in general about the ceteris paribus effect of x on y, holding all other factors fixed?
u includes omitted variables (the unobserved ones), measurement errors in y and x, non-linearities in the relation between x and y, and unpredictable or random effects.
As long as the intercept ß0 is included in the equation, we lose nothing by assuming the average value of u in the population is zero: E(u) = 0. This assumption says nothing about the relationship between u and x, so we can make it safely.
The correlation coefficient is a natural measure of the association between two random variables,
which is useful for u and x. If u and x are uncorrelated they’re not linearly related, but correlation
only measures linear dependence so we can’t rule out all relations between x and u.
Our crucial assumption is that the average value of u does not depend on the value of x: E(u|x) = E(u). Then the average value of the unobservables is the same across all values of x. If E(u|x) = E(u) holds, u is said to be mean independent of x, and it follows that Cov(x, u) = 0. Combining mean independence with E(u) = 0 gives the zero conditional mean assumption: E(u|x) = 0, so given any value of x we expect u to be zero.
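The step from mean independence to zero covariance is one application of the law of iterated expectations (a standard derivation sketched here, not spelled out in the original):

```latex
E(xu) = E\big[E(xu \mid x)\big] = E\big[x\,E(u \mid x)\big] = E\big[x \cdot E(u)\big] = E(x)E(u),
\qquad
\operatorname{Cov}(x,u) = E(xu) - E(x)E(u) = 0 .
```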
Only if the zero conditional mean assumption holds do we have a causal interpretation of the effect of x on y.
For wage as a function of education, let u be the innate ability a person possesses regardless of education. The assumption E(u|x) = E(u) requires that the average level of ability is the same regardless of years of education: E(ability | educ = 8) must equal E(ability | educ = 16), so the average ability of people with 8 years of education must equal that of those with 16 years. If we think average ability increases with years of education, we can't assume ceteris paribus and the assumption E(u|x) = E(u) = 0 doesn't hold.
The zero conditional mean assumption gives ß1 a useful interpretation: Assuming E(u|x) = 0 gives us E(y|x) = ß0 + ß1x, since the conditional mean of u drops out. This shows the population regression function (PRF) E(y|x) is a linear function of x. The linearity means a one-unit increase in x changes the expected value of y by ß1.
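The PRF follows from the model in one line of conditional expectations:

```latex
E(y \mid x) = E(\beta_0 + \beta_1 x + u \mid x) = \beta_0 + \beta_1 x + E(u \mid x) = \beta_0 + \beta_1 x .
```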
Given the zero conditional mean assumption E (u|x) = 0 we can view the equation y = ß0 + ß1x + u
in two parts. ß0 + ß1x is the systematic part of y: The part of y explained by x.
The unsystematic part is u, which is the part of y not explained by x.
Deriving the ordinary least squares estimates
Now we must estimate the slope and intercept parameters ß0 and ß1, for which we use a sample from the population. A random sample of size n from the population gives yi = ß0 + ß1xi + ui, where ui is the error term for observation i, since it contains all factors affecting yi other than xi.
We use this data to estimate the slope and intercept.
From our assumptions, u has zero expected value and is uncorrelated with x: E(u) = 0 and Cov(x, u) = E(xu) = 0.
Since u = y − ß0 − ß1x, these conditions can be written as E(y − ß0 − ß1x) = 0 and E[x(y − ß0 − ß1x)] = 0. We can use these moment conditions to obtain good estimators of ß0 and ß1 given a sample of data: We choose the estimates ß̂0 and ß̂1 of the unknown population parameters ß0 and ß1 and predict the fitted value ŷi = ß̂0 + ß̂1xi.
The estimates of the ß's are called the ordinary least squares (OLS) estimates. The difference between the observed yi and its fitted value ŷi is the residual ûi for each observation i. The sum of squared residuals is what we want to keep as small as possible, to keep both positive and negative errors to a minimum. We thus want to minimise
Σi (yi − ß̂0 − ß̂1xi)²
over the choices of ß̂0 and ß̂1.
To do this we take the first order conditions of the OLS minimisation problem: we take the derivatives with respect to ß̂0 and ß̂1 and set them to zero. This gives the two first order conditions
Σi (yi − ß̂0 − ß̂1xi) = 0 and Σi xi(yi − ß̂0 − ß̂1xi) = 0,
which can be solved for the ß̂'s.
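Solving these two conditions (the algebra is standard, though not shown in the original) gives the OLS estimates in closed form:

```latex
\hat{\beta}_1 = \frac{\sum_{i=1}^{n}(x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n}(x_i - \bar{x})^2},
\qquad
\hat{\beta}_0 = \bar{y} - \hat{\beta}_1\,\bar{x},
```

provided the sample variation in x is positive, i.e. Σ(xi − x̄)² > 0.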
Using a dataset of 209 CEOs, their salary and the ROE of their companies, we can use the ordinary least squares method to find the regression line relating salary to ROE. "Salary hat" indicates that it's an estimated equation.
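A minimal sketch of this computation in Python, on made-up numbers rather than the actual 209-CEO dataset:

```python
import numpy as np

# Hypothetical data (not the 209-CEO sample): return on equity in percent
# and CEO salary in thousands of dollars.
roe = np.array([14.1, 10.9, 23.5, 5.9, 13.8, 20.0, 16.4, 16.3])
salary = np.array([1095.0, 1001.0, 1122.0, 578.0, 1368.0, 1145.0, 1078.0, 1094.0])

# OLS estimates from the first order conditions.
xbar, ybar = roe.mean(), salary.mean()
beta1_hat = np.sum((roe - xbar) * (salary - ybar)) / np.sum((roe - xbar) ** 2)
beta0_hat = ybar - beta1_hat * xbar

# The estimated regression line ("salary hat").
print(f"salary_hat = {beta0_hat:.2f} + {beta1_hat:.2f} * roe")
```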
Properties of OLS on any sample of data
Given our estimates ß̂0 and ß̂1 we can obtain the fitted value ŷi for each observation, which by definition lies on the OLS regression line. The OLS residual associated with observation i is the estimate ûi, the difference between yi and its fitted value. If the residual ûi is negative the line overpredicts yi; if it is positive the line underpredicts yi. In most cases no residual equals zero exactly, so none of the actual data points lie on the OLS line.
There are three important algebraic properties of the ordinary least squares (OLS) estimates (all three are checked numerically in the sketch after this list).
(1) The sum, and thus the sample average, of the OLS residuals is zero: Σi ûi = 0. This is because the OLS estimates of the ß's are chosen to make the residuals add up to zero for any data set; it says nothing about any particular observation.
(2) The sample covariance between the regressors and the OLS residuals is zero: Σi xiûi = 0.
(3) The point (x̄, ȳ) of the sample means is always on the OLS regression line: plugging x̄ into ŷ = ß̂0 + ß̂1x always gives ȳ.
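These properties hold by construction, which is easy to verify numerically; a self-contained sketch with arbitrary toy data:

```python
import numpy as np

# Toy data; any sample works because the properties follow from the
# first order conditions, not from the data.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])

xbar, ybar = x.mean(), y.mean()
b1 = np.sum((x - xbar) * (y - ybar)) / np.sum((x - xbar) ** 2)
b0 = ybar - b1 * xbar
u_hat = y - (b0 + b1 * x)  # OLS residuals

print(np.isclose(u_hat.sum(), 0.0))          # (1) residuals sum to zero
print(np.isclose(np.sum(x * u_hat), 0.0))    # (2) zero covariance with x
print(np.isclose(ybar, b0 + b1 * xbar))      # (3) (xbar, ybar) on the line
```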
We can also interpret an OLS regression by writing yi as its fitted value plus its residual: yi = ŷi + ûi. So for each observation, the estimated ŷi plus the estimated ûi totals the observed yi.
The total sum of squares SST = Σi (yi − ȳ)² is a measure of the total sample variation in the yi, so a measure of how spread out the observed yi are in the sample. If we divide SST by n − 1 we obtain the sample variance of y.
The explained sum of squares SSE = Σi (ŷi − ȳ)² measures the sample variation in the ŷi, so the variation in the estimated values of yi.
The residual sum of squares SSR = Σi ûi² is the variation in the yi not explained by the variation in the ŷi, and is thus the variation in the ûi. The three are linked by the decomposition SST = SSE + SSR, checked in the sketch below.
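A short self-contained check of the decomposition on toy numbers (the data are made up):

```python
import numpy as np

# Toy data for illustrating the SST = SSE + SSR decomposition.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])

xbar, ybar = x.mean(), y.mean()
b1 = np.sum((x - xbar) * (y - ybar)) / np.sum((x - xbar) ** 2)
b0 = ybar - b1 * xbar
y_hat = b0 + b1 * x

sst = np.sum((y - ybar) ** 2)      # total sample variation in y
sse = np.sum((y_hat - ybar) ** 2)  # variation explained by the regression
ssr = np.sum((y - y_hat) ** 2)     # leftover variation in the residuals

print(sst, sse + ssr)              # the two should match: SST = SSE + SSR
```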