1 - Nature of econometrics
Econometrics typically uses non-experimental (observational) data: it is not generated in a laboratory but is often collected through surveys.
Cross-sectional data is taken at a single point in time and cannot be used to compare changes in the same observed subject over time.
Time-series data is a series of observations on variables over time, such as CPI and GDP growth, and the ordering of the observations conveys important information.
Panel data (longitudinal data) consists of a time series for each cross-sectional member, i.e. time series following the same subjects (e.g. individuals) in the data set.
Data is obtained through random sampling (or at least we assume this) from the underlying
population.
If y1 … yn are independent random variables with a common probability density function f(y; θ), they form a random sample: they are independent, identically distributed (i.i.d.) random variables. Obtaining samples tells us something about θ.
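A minimal sketch in Python (numpy assumed available; the distribution, parameter value, and sample size are illustrative, not from the text) of drawing an i.i.d. sample and using it to learn about θ:

import numpy as np

rng = np.random.default_rng(0)

theta = 2.0                                          # population parameter we pretend not to know
sample = rng.normal(loc=theta, scale=1.0, size=500)  # i.i.d. draws y1 ... yn from f(y; theta)

theta_hat = sample.mean()                            # the sample tells us something about theta
print(theta_hat)                                     # close to 2.0 for a reasonably large n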
Correlation is a dependence/association between 2 random variables: Corr(x, y).
Causality is cause and effect x -> y. Correlation does not imply causation.
Ceteris paribus: Holding all other factors fixed; important assumption for establishing causal
interpretation.
Endogeneity: An explanatory variable x is correlated with the error term u. An explanatory variable
is not independent of other factors causing y.
Multicollinearity: Two explanatory variables are correlated with each other, so Corr(x1, x2) ≠ 0.
Simple linear regression
The simple linear model y = ß0 + ß1x + u regresses y on x. The parameters ß are to be estimated from the sample data; u is the random error or disturbance term, summarising all unobserved factors.
We want to know how y changes when x changes holding u fixed: Ceteris paribus effect. Holding
u fixed means ∆u = 0, leaving us with ∆y = ß1∆x so ∆y/∆x = ß1 ceteris paribus if ∆u = 0.
The error or disturbance term u contains everything not controlled for: omitted variables, measurement errors, non-linearities, and unpredictable effects. We assume E(u) = 0 and that this does not depend on x, so for any value of x we have the zero conditional mean: E(u|x) = E(u) = 0. The average value of u does not depend on x and is expected to be zero.
If the zero conditional mean holds, then Cov(x, u) = 0. Only if we assume the zero conditional mean, so no endogeneity, can we give the effect of x on y a causal interpretation. This assumption is often violated.
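A small simulation sketch (Python/numpy, illustrative numbers) of why the zero conditional mean matters: when x is correlated with u, the estimated slope no longer recovers the causal ß1.

import numpy as np

rng = np.random.default_rng(1)
n = 10_000
b0, b1 = 1.0, 0.5                      # true parameters in this toy example

u = rng.normal(size=n)
x_exog = rng.normal(size=n)            # Cov(x, u) = 0: zero conditional mean holds
x_endo = rng.normal(size=n) + 0.8 * u  # Cov(x, u) > 0: x is endogenous

for x in (x_exog, x_endo):
    y = b0 + b1 * x + u
    b1_hat = np.polyfit(x, y, 1)[0]    # OLS slope of y on x
    print(round(b1_hat, 3))            # ~0.5 for the exogenous x, biased upward for the endogenous x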
The parameters are estimated from a sample of observations (xi, yi), i = 1 … n, where we observe the x and y values for each of the n observations. We assume it is a random sample: the observations are independent, identically distributed (i.i.d.), so each member of the population has the same probability of being drawn into the sample.
Estimated parameters are denoted with a hat: ß̂0 is the estimate of the unknown population parameter ß0.
We estimate them using the least squares principle: we minimise the sum of squared residuals.
The predicted (fitted) value ŷi = ß̂0 + ß̂1xi is often not equal to the actual observed value yi: filling in the value of xi gives a predicted value for yi. The difference between this prediction and the actual value is the residual ûi = yi - ŷi. We therefore minimise the sum of squared residuals ∑ûi^2 = ∑(yi - ß̂0 - ß̂1xi)^2.
To find the minimum we take derivatives with respect to each ß we are trying to estimate, giving the first-order conditions
∑(yi - ß̂0 - ß̂1xi) = 0 and ∑xi(yi - ß̂0 - ß̂1xi) = 0,
which we can solve for the ß's like any two linear equations with two unknowns. This yields the Ordinary Least Squares (OLS) estimators:
ß̂1 = ∑(xi - xavg)(yi - yavg) / ∑(xi - xavg)^2 and ß̂0 = yavg - ß̂1·xavg.
These estimates form the OLS regression line ŷi = ß̂0 + ß̂1xi, which can be used to make predictions of y for any value of x. By regressing y on x we obtain the OLS estimators.
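A minimal sketch (Python/numpy, made-up data) of these OLS formulas:

import numpy as np

# toy sample (xi, yi), i = 1..n
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])

b1_hat = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean())**2)
b0_hat = y.mean() - b1_hat * x.mean()

y_fitted = b0_hat + b1_hat * x  # OLS regression line / fitted values
residuals = y - y_fitted        # the residuals whose squared sum was minimised
print(b0_hat, b1_hat)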
Goodness of fit
How well x explains y is measured by developing a measure of the variation in yi that is predicted by our OLS estimators: the variation actually explained.
Total Sum of Squares SST = ∑(yi - yavg)^2, a measure of the total variation of y about its sample mean.
Explained Sum of Squares SSE = ∑(ŷi - yavg)^2, the predicted value of y minus the sample mean: the amount of variation that is actually explained by our OLS regression.
Residual Sum of Squares SSR = ∑(yi - ŷi)^2 = ∑(ûi)^2, the difference between the actual y and the fitted (predicted) y: the unexplained variation in y.
Of course, total variation is explained plus unexplained: SST = SSE + SSR.
We can use this to make a measure of fit, the coefficient of determination R2:
R-squared = SSE/SST = 1 - SSR/SST. So the ratio of explained sample variation SSE to total
sample variation SST. Closer to 1 means a higher fraction is predicted. R2 = .28 means 28% of
the variation in y is explained by x.
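Continuing the same kind of sketch (Python/numpy, toy data), the fit measures follow directly from their definitions:

import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])

b1_hat = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean())**2)
b0_hat = y.mean() - b1_hat * x.mean()
y_fitted = b0_hat + b1_hat * x

sst = np.sum((y - y.mean())**2)         # total variation in y
sse = np.sum((y_fitted - y.mean())**2)  # explained variation
ssr = np.sum((y - y_fitted)**2)         # unexplained (residual) variation

r_squared = sse / sst                   # equals 1 - ssr/sst
print(r_squared)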
Units of measurement & functional form
Scaling
Take yi = ß0 + ß1xi + ui with error variance σ^2 = Var(ui).
Let yi* = a·yi be a multiple of y and xi* = d·xi a multiple of x.
We can rewrite the equation as yi* = ß0* + ß1*xi* + ui*, where ß0* = a·ß0, ß1* = (a/d)·ß1 and σ^2u* = a^2·σ^2u.
So we have simply rescaled the equation: the slope coefficient is divided by d to account for the explanatory variable being multiplied by d, and multiplied by a because the dependent variable is scaled by a. The error variance is multiplied by a^2, the square of the factor applied to the dependent variable y.
So if ß1 = 10.2 measures the effect of x measured in hundreds of dollars on a certain y, then measuring x in dollars instead (d = 100) means we must divide 10.2 by 100: ß1* = 0.102 describes the same effect.
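A quick numerical check of these scaling rules (Python/numpy; the data and the factors a = 10, d = 100 are arbitrary illustrations):

import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])

def ols(x, y):
    b1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean())**2)
    return y.mean() - b1 * x.mean(), b1

a, d = 10.0, 100.0
b0, b1 = ols(x, y)
b0_star, b1_star = ols(d * x, a * y)      # rescale x by d and y by a

print(np.isclose(b0_star, a * b0))        # True: intercept is multiplied by a
print(np.isclose(b1_star, (a / d) * b1))  # True: slope is multiplied by a/d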
Non-linearities
Appropriate transformations of y and/or x can represent non-linear relationships; the most common is the natural logarithm ln, which changes the interpretation of the coefficients.
If we take ln of both x and y we get the constant elasticity model, the log-log model: ln(y) = ß0 + ß1ln(x) + u. Here ß1 is the elasticity, the percentage change in y due to a one-percent change in x: ß1 = (dy/y)/(dx/x) = (dy/dx)·(x/y).
To find the effect in each case, differentiate y with respect to x. Level-level: yi = ß0 + ß1xi + ui differentiated w.r.t. x gives dy = ß1dx, and dividing the change in y by the change in x gives the effect: dy/dx = ß1. So in the level-level case, y changes by ß1 units when x changes by 1 unit.
Log-level: ln(y) = ß0 + ß1xi + ui, which we can rewrite as y = exp(ß0 + ß1xi + ui) since exp(·) means e raised to that power, giving us y instead of ln(y). Differentiating y w.r.t. x gives dy/dx = ß1·exp(ß0 + ß1xi + ui) = ß1·y, so dy/y = ß1dx. The proportional change in y equals ß1 times the change in x, so the percentage change in y is (100·ß1) per unit change in x.
Level-log: y = ß0 + ß1ln(xi) + ui, which differentiated w.r.t. x gives dy/dx = ß1/x, so dy = ß1·dx/x. Increasing x by 1% increases y by ß1/100 units: we divide ß1 by 100 because dx/x = 0.01 for a one-percent increase in x.
Log-log: ln(y) = ß0 + ß1ln(xi) + ui, which solved for y is y = exp(ß0 + ß1ln(xi) + ui). Differentiating w.r.t. x gives dy/dx = (ß1/x)·exp(ß0 + ß1ln(xi) + ui) = (ß1/x)·y, so dy/y = ß1(dx/x). The log-log model thus expresses the relationship in percentage changes: ß1 is the elasticity.
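A sketch (Python/numpy, simulated data with an assumed true elasticity of 0.7) of the log-log interpretation: regressing ln(y) on ln(x) recovers the elasticity.

import numpy as np

rng = np.random.default_rng(2)
n = 5_000
elasticity = 0.7                               # true ß1 in the constant elasticity model

x = rng.uniform(1.0, 10.0, size=n)
u = rng.normal(scale=0.1, size=n)
y = np.exp(1.0 + elasticity * np.log(x) + u)   # ln(y) = 1 + 0.7*ln(x) + u

b1_hat = np.polyfit(np.log(x), np.log(y), 1)[0]
print(round(b1_hat, 3))                        # ~0.7: a 1% increase in x raises y by about 0.7%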
How good are the OLS estimators?
We have estimators ß̂ for the true, unobserved population parameters ß. Taking a different sample and estimating ß again gives a different estimate. Together these estimates form the sampling variation: the estimators are random variables themselves, with a mean and a variance. The mean and variance of the estimators ß̂ are their sampling properties.
We want estimators that are:
- Unbiased: E(ß̂) = ß.
- Efficient: the smallest possible variance Var(ß̂) among unbiased estimators.
- Consistent: ß̂ approaches the population ß as the sample size grows.
To obtain unbiased, efficient, consistent estimators we must establish the classical assumptions SLR.1-SLR.5: the model is linear in its parameters, the data come from a random sample, there is sample variation in x, the zero conditional mean E(u|x) = 0 holds, and the errors are homoskedastic. Under assumptions 1 to 4, E(ß̂) = ß for any value of ß: the estimators are unbiased, their mean value equals the true value. Under repeated sampling the average of the estimates approaches ß.
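A small Monte Carlo sketch (Python/numpy, illustrative parameters) of unbiasedness under repeated sampling: the average of ß̂1 across many samples is close to the true ß1.

import numpy as np

rng = np.random.default_rng(3)
b0, b1, n = 1.0, 0.5, 200                      # true parameters and sample size (assumed)

estimates = []
for _ in range(2_000):                         # repeated sampling
    x = rng.normal(size=n)
    u = rng.normal(size=n)                     # zero conditional mean holds here
    y = b0 + b1 * x + u
    estimates.append(np.polyfit(x, y, 1)[0])   # OLS slope from this sample

print(round(np.mean(estimates), 3))            # close to the true b1 = 0.5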
The variance of the OLS estimators tells us how far we can expect ß̂ to be from ß on average: we want the most efficient among all unbiased estimators. To compute the variance of the OLS estimators we need assumption 5, homoskedasticity: constant error variance. The value of the explanatory variable contains no information about the variability of the unobserved factors: Var(u|x) = Var(y|x) = σ^2, so the variability is constant across all levels of x.
The sampling variance of the slope estimator is Var(ß̂1) = σ^2 / ∑(xi - xavg)^2. Var(ß̂1) tells us about the sampling precision of the estimator: repeated sampling and estimating yields a distribution of ß̂1 from which we can compute the variance.
A high σ^2 implies large uncertainty in the model, which also increases the variance of the estimators: higher variance. More variability in x means more statistical information, making the estimates more precise: lower variance.
We assume Var(ui) = σ^2, but σ^2 must be estimated. An unbiased estimator of the variance of the error term, E(u^2) = σ^2, is σ̂^2 = SSR/(n - 2) = (1/(n - 2))∑ûi^2. Knowing σ̂^2 allows us to compute the estimated variance of the estimators; a higher σ̂^2 implies larger estimated sampling variances.
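A sketch (Python/numpy, toy data again) of the error variance estimator σ̂^2 = SSR/(n - 2) and the estimated variance of ß̂1 that follows from it:

import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])
n = len(y)

b1_hat = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean())**2)
b0_hat = y.mean() - b1_hat * x.mean()
residuals = y - (b0_hat + b1_hat * x)

sigma2_hat = np.sum(residuals**2) / (n - 2)           # unbiased estimator of sigma^2
var_b1_hat = sigma2_hat / np.sum((x - x.mean())**2)   # estimated Var(ß1_hat)
print(sigma2_hat, np.sqrt(var_b1_hat))                # the square root is the standard error of ß1_hat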