Econometrics
Chapter 1
Nonexperimental data (observational data) are not accumulated through controlled experiments on
individuals, firms or segments of the economy.
Experimental data are often collected in laboratory environments in the natural sciences, but they
are more difficult to obtain in the social sciences.
Empirical analysis: uses data to test a theory or to estimate a relationship.
Economic model: consists of mathematical equations that describe various relationships. Example:
the utility maximization framework, in which consumers do what gives them the most utility.
Once we have an economic model we need to turn it into an econometric model.
u contains unobserved factors = error term/disturbance term
β = parameters; they describe the direction and strength of the relationship
Cross-sectional data set consists of a sample of individuals, households, firms, etc. taken at a given
point in time. If a set of families was surveyed during different weeks of the same year, we would still
view this as a cross-sectional data set.
Important feature of cross-sectional data is that we often can assume that they have been obtained
by random sampling.
Time series data set consists of observations on a variable or several variables over time. The
chronological ordering of observations in a time series conveys potentially important information.
Most of the time these data are strongly related to their history. Data frequency plays an important
role in time series data sets (weekly, daily, yearly).
Pooled cross section: increase sample size by combining, for example, multiple years.
The point of a pooled cross-sectional analysis is often to see how a key relationship has changed over time.
Panel data (longitudinal data) set consists of a time series for each cross-sectional member in the
data set.
Because panel data require replication of the same units over time, panel data sets, especially those
on individuals, households, and firms, are more difficult to obtain than pooled cross sections.
Having multiple observations on the same units allows us to control for certain unobserved
characteristics of individuals, etc.
The goal is to infer that one variable has a causal effect on another variable.
Ceteris paribus: ‘other factors being equal’. Not all possible factors can be held equal.
Counterfactual reasoning uses counterfactual outcomes (potential outcomes); by considering these
we ‘hold all other factors fixed’, because the counterfactual thought experiment applies to each individual
separately.
Chapter 2
2.1
Simple equation: y= β0 + β1x + u = simple linear regression model
y= dependent variable x= independent variable
Variable u= error term (disturbance) represents factors other than x that affect y.
∆y = β1∆x if ∆u = 0
β1 is the slope parameter in the relationship between y and x, holding other factors in u fixed.
Intercept parameter β0 (constant term)
As long as the intercept β0 is included in the equation, nothing is lost by assuming that the average
value of u in the population is zero → E(u)=0
If u and x are uncorrelated, then, as random variables, they are not linearly related.
For any x, we can obtain the expected value of u for that slice of the population described by the
value of x. The average value of u does not depend on the value of x → E(u|x) = E(u) →
when this holds, we say that u is mean independent of x.
Combining E(u)=0 and mean independence, we obtain the zero conditional mean assumption: E(u|x)=0
E(y|x)= β0 + β1x shows that the population regression function (PRF), E(y|x), is a linear function
of x. Gives us a relationship between the average level of y at different levels of x.
β0 + β1x = systematic part u = unsystematic part
2.2
i stands for a particular observation
y̅ = β0ˆ + β1ˆx̅
β1ˆ = ∑(xi - x̅)(yi - y̅) / ∑(xi - x̅)² = ρxyˆ · (σyˆ/σxˆ)
ρxyˆ = sample correlation between xi and yi; σxˆ and σyˆ = sample standard deviations of xi and yi
β1ˆ = ∑(xi - x̅)(yi - y̅) / ∑(xi - x̅)² and β0ˆ = y̅ - β1ˆx̅ are called the ordinary least squares (OLS) estimates
of β0 and β1; together they define a fitted value yiˆ = β0ˆ + β1ˆxi for y when x = xi.
The residual for observation i is the difference between the actual yi and its fitted value: ûi = yi - yiˆ = yi
- β0ˆ - β1ˆxi
Ordinary least squares: these estimates minimize the sum of squared residuals.
OLS regression line: yˆ = β0ˆ + β1ˆx = sample regression function (SRF) because it is the
estimated version of the population regression function. It is obtained for a given sample of data, a
new sample will generate a different slope and intercept.
In most cases, the slope estimate β1ˆ = ∆yˆ/∆x is of primary interest.
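A minimal numpy sketch of these OLS formulas (the simulated data and the true parameter values below are my own illustrative assumptions, not from the text):

```python
import numpy as np

# Illustrative simulated sample (assumed true values: beta0 = 1, beta1 = 0.5)
rng = np.random.default_rng(0)
x = rng.normal(5, 2, size=100)
u = rng.normal(0, 1, size=100)
y = 1 + 0.5 * x + u

# OLS estimates from the formulas above
b1_hat = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
b0_hat = y.mean() - b1_hat * x.mean()

y_fitted = b0_hat + b1_hat * x   # fitted values (points on the OLS regression line)
u_hat = y - y_fitted             # OLS residuals
```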
2.3
Given β0ˆ and β1ˆ, we can obtain the fitted value yiˆ for each observation. Each fitted value yiˆ is on
the OLS regression line.
The OLS residual associated with observation i, ûi, is the difference between yi and its fitted value.
If ûi > 0, the line underpredicts yi; if ûi < 0, it overpredicts.
Properties:
1. The sum, and therefore the sample average, of the OLS residuals is zero: ∑ûi = 0
2. The sample covariance between the regressor and the OLS residuals is zero: ∑xiûi = 0
3. The point (x̅, y̅) is always on the OLS regression line.
yi= yiˆ + ûi
Total sum of squares (SST) ≡ ∑(yi - y̅)² → measure of total sample variation in the yi; measures how
spread out the yi are in the sample.
Explained sum of squares (SSE) ≡ ∑(yiˆ - y̅)²
Residual sum of squares (SSR) ≡ ∑ûi²
SST = SSE + SSR
R-squared (coefficient of determination): R² ≡ SSE/SST = 1 - SSR/SST (goodness-of-fit measure)
R² is the ratio of the explained variation to the total variation: the fraction of the sample variation in
y that is explained by x.
R² = 1 means the OLS line fits the data perfectly.
Sometimes, the explanatory variable explains a substantial part of the sample variation in the
dependent variable.
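Continuing the sketch from 2.2 (run after that block; it uses x, y, y_fitted, and u_hat from there), the residual properties above and the decomposition SST = SSE + SSR can be checked numerically:

```python
# Run after the sketch in 2.2
SST = np.sum((y - y.mean()) ** 2)         # total sample variation in y
SSE = np.sum((y_fitted - y.mean()) ** 2)  # explained variation
SSR = np.sum(u_hat ** 2)                  # residual (unexplained) variation

print(np.isclose(np.sum(u_hat), 0))       # property 1: residuals sum to zero
print(np.isclose(np.sum(x * u_hat), 0))   # property 2: zero sample covariance with x
print(np.isclose(SST, SSE + SSR))         # SST = SSE + SSR
R2 = SSE / SST                            # same value as 1 - SSR/SST
```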
2.4
If the dependent variable is multiplied by the constant c (each value in the sample multiplied by c)
then the OLS intercept and the slope estimates are also multiplied by c.
Generally, if the independent variable is divided or multiplied by some nonzero constant c, then the
OLS slope coefficient is multiplied or divided by c, respectively.
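A quick numerical check of these scaling effects, continuing the same sketch (c = 10 is an arbitrary choice):

```python
# Run after the sketch in 2.2
c = 10.0

# Multiply the dependent variable by c: intercept and slope estimates scale by c
y_c = c * y
b1_yc = np.sum((x - x.mean()) * (y_c - y_c.mean())) / np.sum((x - x.mean()) ** 2)
b0_yc = y_c.mean() - b1_yc * x.mean()
print(np.isclose(b1_yc, c * b1_hat), np.isclose(b0_yc, c * b0_hat))

# Divide the independent variable by c: the slope estimate is multiplied by c
x_c = x / c
b1_xc = np.sum((x_c - x_c.mean()) * (y - y.mean())) / np.sum((x_c - x_c.mean()) ** 2)
print(np.isclose(b1_xc, c * b1_hat))
```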
Nonlinearity → use log(y)
Constant elasticity model: log(y) = β0 + β1 log(x)
When a variable is in logarithmic form, changing its units of measurement is only a proportionate change, so it affects the intercept but nothing happens to the slope.
Summary of functional forms involving logarithms:
Model       | Dependent variable | Independent variable | Interpretation of β1
Level-level | y                  | x                    | ∆y = β1∆x
Level-log   | y                  | log(x)               | ∆y = (β1/100)%∆x
Log-level   | log(y)             | x                    | %∆y = (100β1)∆x
Log-log     | log(y)             | log(x)               | %∆y = β1 %∆x
Log-level = semi-elasticity; log-log = elasticity
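A small self-contained sketch of the log-level (semi-elasticity) interpretation; the data-generating values (roughly an 8% effect per unit of x) are assumptions for illustration only:

```python
import numpy as np

# Simulate data so that each one-unit increase in x raises y by roughly 8%
rng = np.random.default_rng(2)
x = rng.uniform(1, 10, size=500)
log_y = 0.2 + 0.08 * x + rng.normal(0, 0.1, size=500)

# Regress log(y) on x with the OLS formulas from 2.2
b1_hat = np.sum((x - x.mean()) * (log_y - log_y.mean())) / np.sum((x - x.mean()) ** 2)

# Log-level interpretation: %∆y ≈ (100 * b1_hat) * ∆x
print(100 * b1_hat)   # close to 8 (% change in y per one-unit change in x)
```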
Regression model is linear when it is linear in the parameters
2.5
Assumptions simple linear regression:
SLR.1: y= β0 + β1x + u (linear in parameters)
SLR.2: Have random sample of size n, following the population model in SLR.1 → in terms of random
sample: yi = β0 + β1xi + ui with i= 1, 2, …., n
SLR.3: The sample outcomes on x are not all the same value. If the sample variation in the xi is zero, SLR.3
fails; otherwise, it holds.
SLR.4: The error u has an expected value of zero given any value of the explanatory variable: E(u|x)=0
(Zero conditional mean)
Conditioning on the sample values of the independent variable is the same as treating the xi as fixed
in repeated samples. ui and xi are independent.
Once we assume E(u|x)=0, and we have random sampling, nothing is lost in derivations by treating x i
as nonrandom.
Under SLR.1-SLR.4, the OLS estimators are unbiased: E(β0ˆ) = β0 and E(β1ˆ) = β1. A useful rewriting: β1ˆ = ∑(xi - x̅)yi / ∑(xi - x̅)²
β1ˆ = β1 + (1/SSTx) ∑ diui, where di = xi - x̅ and SSTx = ∑(xi - x̅)²
The randomness in β1ˆ is due entirely to the errors in the sample. The fact that these errors are
generally different from zero is what causes β1ˆ to be different from β1.
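A small Monte Carlo sketch of this unbiasedness result: averaging β1ˆ across many random samples drawn under SLR.1-SLR.4 should recover β1 (the true values, n, and the number of replications below are arbitrary assumptions):

```python
import numpy as np

rng = np.random.default_rng(3)
beta0, beta1, n, reps = 1.0, 0.5, 50, 5000

slopes = np.empty(reps)
for r in range(reps):
    x = rng.normal(5, 2, size=n)
    u = rng.normal(0, 1, size=n)   # E(u|x) = 0 by construction
    y = beta0 + beta1 * x + u
    slopes[r] = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)

# Each individual slope estimate varies, but the average is centered at beta1
print(slopes.mean())   # close to 0.5
```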
Spurious correlation: find a relationship between y and x that is really due to other unobserved
factors that affect y and also happen to be correlated with x.
SLR.5: The error u has the same variance given any value of the explanatory variable: Var(u|x) = σ²
(homoskedasticity)
σ² = E(u²|x) → σ² is also the unconditional expectation of u²: σ² = E(u²) = Var(u), because E(u) = 0. In
other words, σ² is the unconditional variance of u, and σ² is often called the error variance.
SLR.4: E(y|x)= β0 + β1x
SLR.5: Var(y|x) = σ²
When Var(u|x) depends on x, the error term is said to exhibit heteroskedasticity. Because Var(u|
x)=Var(y|x), heteroskedasticity is present whenever Var(y|x) is a function of x.
The larger the error variance, the larger is Var(β1ˆ): more variation in the unobservables affecting y
makes it harder to estimate β1 precisely.
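A sketch illustrating this last point: holding the xi fixed across replications, a larger error variance spreads out the sampling distribution of β1ˆ (the two σ values below are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(4)
n, reps = 50, 5000
x = rng.normal(5, 2, size=n)   # keep the same x values in every replication

def simulate_slopes(sigma):
    """Sampling distribution of beta1_hat when Var(u|x) = sigma**2."""
    slopes = np.empty(reps)
    for r in range(reps):
        y = 1 + 0.5 * x + rng.normal(0, sigma, size=n)
        slopes[r] = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
    return slopes

# The larger error variance gives a larger variance of beta1_hat
print(simulate_slopes(1.0).var(), simulate_slopes(3.0).var())
```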