Econometrics
Chapter 1
Nonexperimental data (observational data) are not accumulated through controlled experiments on
individuals, firms or segments of the economy.
Experimental data are often collected in laboratory environments in the natural sciences, but they
are more difficult to obtain in the social sciences.
Empirical analysis: uses data to test a theory or to estimate a relationship.
Economic model: consists of mathematical equations that describe various relationships. Example:
in the utility maximization framework, consumers choose what gives them the most utility.
Once we have an economic model we need to turn it into an econometric model.
u contains unobserved factors = error term/disturbance term
β = parameters; they describe the directions and strengths of the relationships.
Cross-sectional data set consists of a sample of individuals, households, firms, etc. taken at a given
point in time. If a set of families was surveyed during different weeks of the same year, we would still
view this as a cross-sectional data set.
Important feature of cross-sectional data is that we often can assume that they have been obtained
by random sampling.
Time series data set consists of observations on a variable or several variables over time. The
chronological ordering of observations in a time series conveys potentially important information.
Most of the time these data are strongly related to their history. Data frequency plays an important
role in time series data sets (weekly, daily, yearly).
Pooled cross section: increase sample size by combining, for example, multiple years.
The point of a pooled cross-sectional analysis is often to see how a key relationship has changed over time.
Panel data (longitudinal data) set consists of a time series for each cross-sectional member in the
data set.
Because panel data require replication of the same units over time, panel data sets, especially those
on individuals, households, and firms, are more difficult to obtain than pooled cross sections.
Having multiple observations on the same units allows us to control for certain unobserved
characteristics of individuals, etc.
The goal is to infer that one variable has a causal effect on another variable.
Ceteris paribus: ‘other factors being equal’. Not all possible factors can be held equal.
Counterfactual reasoning works with counterfactual outcomes (potential outcomes); by considering
these we ‘hold all factors fixed’, because the counterfactual thought experiment applies to each
individual separately.
Chapter 2
2.1
Simple equation: y= β0 + β1x + u = simple linear regression model
y= dependent variable x= independent variable
Variable u= error term (disturbance) represents factors other than x that affect y.
∆y = β1∆x if ∆u = 0
β1 is the slope parameter in the relationship between y and x, holding other factors in u fixed.
Intercept parameter β0 (constant term)
As long as the intercept β0 is included in the equation, nothing is lost by assuming that the average
value of u in the population is zero → E(u)=0
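Why nothing is lost: a nonzero error mean can always be absorbed into the intercept. If E(u) = α0, the model can be rewritten as

```latex
y = \beta_0 + \beta_1 x + u = (\beta_0 + \alpha_0) + \beta_1 x + (u - \alpha_0),
```

where the redefined error u − α0 has mean zero and the slope β1 is unchanged.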
If u and x are uncorrelated, then, as random variables, they are not linearly related.
For any x, we can obtain the expected value of u for that slice of the population described by the
value of x. The average value of u does not depend on the value of x → E(u|x) = E(u) →
when this holds, we say that u is mean independent of x.
Combining E(u) = 0 with mean independence, we obtain the zero conditional mean assumption:
E(u|x) = 0
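Taking expectations of y = β0 + β1x + u conditional on x, and using the zero conditional mean assumption, gives

```latex
E(y \mid x) = E(\beta_0 + \beta_1 x + u \mid x)
            = \beta_0 + \beta_1 x + E(u \mid x)
            = \beta_0 + \beta_1 x .
```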
E(y|x)= β0 + β1x shows that the population regression function (PRF), E(y|x), is a linear function
of x. Gives us a relationship between the average level of y at different levels of x.
β0 + β1x = systematic part; u = unsystematic part
2.2
The subscript i indexes a particular observation in the sample.
y̅ = β0ˆ + β1ˆx̅
β1ˆ = ∑(xi − x̅)(yi − y̅) / ∑(xi − x̅)² = ρxyˆ·(σyˆ/σxˆ)
ρxyˆ = sample correlation between xi and yi; σxˆ, σyˆ = sample standard deviations
β1ˆ = ∑(xi − x̅)(yi − y̅) / ∑(xi − x̅)² and β0ˆ = y̅ − β1ˆx̅ are called the ordinary least squares (OLS) estimates
of β0 and β1; they define a fitted value yiˆ = β0ˆ + β1ˆxi for each observation.
The residual for observation i is the difference between the actual yi and its fitted value: ûi = yi − yiˆ
= yi − (β0ˆ + β1ˆxi)
Ordinary least squares: these estimates minimize the sum of squared residuals.
OLS regression line: yˆ = β0ˆ + β1ˆx = sample regression function (SRF) because it is the
estimated version of the population regression function. It is obtained for a given sample of data, a
new sample will generate a different slope and intercept.
In most cases, the slope estimate can be interpreted as: β1ˆ = ∆yˆ/∆x
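The closed-form OLS expressions can be checked numerically. A minimal sketch in Python with NumPy, using simulated data from a hypothetical model y = 1 + 2x + u (the variable names and true parameter values are illustrative, not from the text):

```python
import numpy as np

# Simulate a sample from a hypothetical model y = 1 + 2x + u with E(u) = 0
rng = np.random.default_rng(0)
n = 500
x = rng.normal(size=n)
u = rng.normal(size=n)
y = 1.0 + 2.0 * x + u

# OLS estimates from the closed-form expressions
xbar, ybar = x.mean(), y.mean()
b1 = np.sum((x - xbar) * (y - ybar)) / np.sum((x - xbar) ** 2)
b0 = ybar - b1 * xbar

# Equivalent form: sample correlation times the ratio of standard deviations
rho = np.corrcoef(x, y)[0, 1]
b1_alt = rho * y.std() / x.std()

print(b0, b1)  # both should be close to the true values 1 and 2
```

The two expressions for the slope agree because the sample covariance divided by the sample variance of x equals the sample correlation scaled by σyˆ/σxˆ.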
2.3
Given β0ˆ and β1ˆ, we can obtain the fitted value yiˆ for each observation. Each fitted value of yiˆ is on
the OLS regression line.
The OLS residual associated with observation i, ûi, is the difference between yi and its fitted value.
ûi > 0: the line underpredicts yi; ûi < 0: the line overpredicts yi.
Properties:
1. The sum, and therefore the sample average, of the OLS residuals is zero: ∑ûi = 0
2. The sample covariance between the regressor and the OLS residuals is zero: ∑xiûi = 0
3. The point (x̅, y̅) is always on the OLS regression line.
yi= yiˆ + ûi
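These algebraic properties hold in any sample, by construction of the OLS first-order conditions. A quick numerical check in Python on made-up numbers (the data are hypothetical, chosen only for illustration):

```python
import numpy as np

# Toy data to check the three algebraic properties of OLS residuals
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.0, 2.5, 4.0, 3.5, 6.0])

# OLS slope and intercept from the closed-form expressions
b1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
b0 = y.mean() - b1 * x.mean()
fitted = b0 + b1 * x
resid = y - fitted

print(np.isclose(resid.sum(), 0.0))              # property 1: residuals sum to zero
print(np.isclose((x * resid).sum(), 0.0))        # property 2: zero sample covariance with x
print(np.isclose(b0 + b1 * x.mean(), y.mean()))  # property 3: (x̅, y̅) lies on the line
```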
Total sum of squares (SST) ≡ ∑(yi − y̅)² → measure of total sample variation in the yi; measures how
spread out the yi are in the sample.
Explained sum of squares (SSE) ≡ ∑(yiˆ − y̅)²
Residual sum of squares (SSR) ≡ ∑ûi²
SST = SSE + SSR
R-squared (coefficient of determination) R² ≡ SSE/SST = 1 − SSR/SST (goodness-of-fit measure)
R² is the ratio of the explained variation to the total variation: the fraction of the sample variation in
y that is explained by x.
R² = 1 indicates a perfect fit of the OLS line.
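The decomposition SST = SSE + SSR and the two equivalent expressions for R² can be verified on simulated data. A short Python sketch (variable names and data-generating values are illustrative):

```python
import numpy as np

# Simulated data from a hypothetical model y = 3 + 0.5x + u
rng = np.random.default_rng(1)
x = rng.uniform(0, 10, size=100)
y = 3.0 + 0.5 * x + rng.normal(size=100)

# Fit the OLS line
b1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
b0 = y.mean() - b1 * x.mean()
fitted = b0 + b1 * x
resid = y - fitted

# The three sums of squares and R-squared
sst = np.sum((y - y.mean()) ** 2)
sse = np.sum((fitted - y.mean()) ** 2)
ssr = np.sum(resid ** 2)
r2 = sse / sst

print(np.isclose(sst, sse + ssr))        # SST = SSE + SSR
print(np.isclose(r2, 1 - ssr / sst))     # the two R² formulas agree
```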
Sometimes, the explanatory variable explains a substantial part of the sample variation in the
dependent variable.