Econometrics Summary Endterm
Chapter 1
Econometrics – economic measurement – is the quantitative measurement and analysis of
actual economic and business phenomena. It attempts to quantify economic reality and the real
world of human activity. It has three major uses:
1. Describing economic reality
2. Testing hypotheses about economic theory and policy
3. Forecasting future economic activity
Regression analysis is a statistical technique that attempts to explain movements in one
variable, the dependent variable, as a function of movements in a set of other variables –
independent (explanatory) variables. It CANNOT confirm causality. The simplest form:
Y = β0 + β1X
The βs are coefficients that determine the coordinates of the straight line at any point. β0 is the
constant or intercept term (the value of Y when X = 0). β1 is the slope coefficient (the response
of Y to a one-unit increase in X). In general, β1 = ΔY/ΔX. An equation is linear when its plot is a
straight line.
Besides the variation in the dependent variable that is caused by the independent variable(s),
there is almost always variation that comes from other sources as well. A stochastic error term is
a term added to a regression equation to capture all of the variation in Y that cannot be
explained by the included Xs. The error term is usually referred to as ϵ.
Y = β0 + β1X + ϵ
The first part of the equation is the deterministic component; the second is the stochastic
component. The deterministic part is also the expected value of Y given X. An error term must
be included because minor influences on Y are omitted, measurement error occurs, the
underlying equation may have a different functional form, and all human behavior contains
some purely random variation.
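The split between the deterministic and stochastic components can be sketched with a small simulation (the coefficient values 2.0 and 0.5 are made up for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical "true" coefficients for Y = β0 + β1*X + ε
b0_true, b1_true = 2.0, 0.5

X = rng.uniform(0, 10, size=100)
eps = rng.normal(0, 1, size=100)   # stochastic error term ε
Y_det = b0_true + b1_true * X      # deterministic component: expected value of Y given X
Y = Y_det + eps                    # observed Y = deterministic part + stochastic part
```

Across a sample this size, the simulated error terms average out close to zero, matching the idea that ϵ collects many small, offsetting influences.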
N is the number of observations. Xi is the ith observation of the independent variable. X1i is the
ith observation of the first independent variable, and so on. The resulting equation is called a
multivariate linear regression model. The meaning of β1 is the impact of a one-unit increase in
X1 on Y, holding the other independent variables constant.
Once an equation is decided upon, it must be quantified. This is called the estimated regression
equation. The real-world values of X and Y are used to calculate coefficient estimates, and these
are in turn used to determine Y^ – estimated/fitted value of Y – represents the value of Y
calculated from the estimated regression equation for the ith observation. The closer these Y^
are to the Ys in the sample, the better the fit. The difference is the residual:
ei = Yi − Ŷi
The residual is the difference between the observed Y and the estimated regression line Ŷ, but
the error term is the difference between the observed Y and the true regression equation.
The residual can be calculated; the error term cannot, because it is not observable. Once the
equation is estimated, hats are placed on top of the estimated variables and coefficients.
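Given an estimated equation, the residuals follow directly from the definition ei = Yi − Ŷi. A minimal sketch, using made-up coefficient estimates and data:

```python
import numpy as np

# Hypothetical estimated equation: Y_hat = 1.0 + 2.0 * X
b0_hat, b1_hat = 1.0, 2.0

X = np.array([1.0, 2.0, 3.0, 4.0])
Y = np.array([3.5, 4.8, 7.2, 8.9])

Y_hat = b0_hat + b1_hat * X   # fitted values from the estimated equation
residuals = Y - Y_hat         # e_i = Y_i - Y_hat_i, one residual per observation
```

Unlike the error terms, these residuals are fully computable from the sample: only the observed Ys and the estimated coefficients are needed.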
A dataset is called cross-sectional because all of the observations are from the same point in
time and represent different individual economic entities from that same point in time.
Chapter 2
The most widely known method of obtaining these estimates is Ordinary Least Squares (OLS), a
regression estimation technique that calculates the β̂s so as to minimize the sum of the
squared residuals.
1. Easy to use
2. Goal of minimizing residuals is appropriate from a theoretical point of view
3. They have a number of useful characteristics:
- Sum of residuals is 0
- Best estimator under a set of specific assumptions
An estimator is a mathematical technique that is applied to a sample of data to produce a
real-world numerical estimate of the true population regression coefficient. OLS is the
estimator; β̂ is the estimate.
β̂1 = Σ(Xi − X̄)(Yi − Ȳ) / Σ(Xi − X̄)²
β̂0 = Ȳ − β̂1X̄
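The two OLS formulas can be verified in a few lines of Python (the data points here are made up):

```python
import numpy as np

X = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
Y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])

x_dev = X - X.mean()
y_dev = Y - Y.mean()

# beta1_hat = sum((X_i - X_bar)(Y_i - Y_bar)) / sum((X_i - X_bar)^2)
beta1 = (x_dev * y_dev).sum() / (x_dev ** 2).sum()
# beta0_hat = Y_bar - beta1_hat * X_bar
beta0 = Y.mean() - beta1 * X.mean()

residuals = Y - (beta0 + beta1 * X)
```

For these data, β̂1 = 1.96 and β̂0 = 0.14, and the residuals sum to (numerically) zero, illustrating the first "useful characteristic" listed above.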
A multivariate regression coefficient indicates the change in the dependent variable associated
with a one-unit increase in the independent variable in question, holding constant the other
independent variables in the equation.
Econometricians use the squared variations of Y around its mean as a measure of the amount
of variation to be explained by the regression. This computed quantity is usually called the total
sum of squares, or TSS:
TSS = Σ (Yi − Ȳ)²   (summed over i = 1 to N)
For OLS, the TSS has two components: variation that can be explained by the regression and
variation that cannot. So TSS = explained sum of squares (ESS) + residual sum of squares (RSS);
this is the decomposition of variance. The ESS is attributable to the fitted regression line; the
unexplained part is called the RSS.
The simplest commonly used measure of fit is R2, or the coefficient of determination. This is the
ratio of the explained sum of squares to the total sum of squares
R² = ESS/TSS = 1 − RSS/TSS
The higher the ratio, the closer the estimated regression equation fits the sample data. It
measures the percentage of the variation of Y around the mean Ý that is explained by the
regression equation.
An extra variable will always increase R² because the RSS will be reduced. However, it requires
the estimation of a new coefficient, which lessens the degrees of freedom: the excess of the
number of observations (N) over the number of coefficients estimated (K + 1). The fewer the
degrees of freedom, the less reliable the estimates. The R̄² is R² adjusted for degrees of freedom:
R̄² = 1 − [Σ ei² / (N − K − 1)] / [Σ (Yi − Ȳ)² / (N − 1)]
An increase in R̄² indicates that the marginal benefit of adding the new variable exceeds the
cost, and a decrease indicates the opposite. The highest possible value is 1.00; the lowest can be
slightly negative.
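The decomposition TSS = ESS + RSS and both fit measures can be computed directly from an OLS fit (the data points here are made up):

```python
import numpy as np

X = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
Y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])
N, K = len(Y), 1                     # 5 observations, 1 explanatory variable

# OLS estimates for a single regressor
x_dev = X - X.mean()
beta1 = (x_dev * (Y - Y.mean())).sum() / (x_dev ** 2).sum()
beta0 = Y.mean() - beta1 * X.mean()
Y_hat = beta0 + beta1 * X
e = Y - Y_hat

TSS = ((Y - Y.mean()) ** 2).sum()    # total sum of squares
RSS = (e ** 2).sum()                 # residual sum of squares
ESS = TSS - RSS                      # explained sum of squares

R2 = 1 - RSS / TSS                                    # coefficient of determination
R2_adj = 1 - (RSS / (N - K - 1)) / (TSS / (N - 1))    # adjusted for degrees of freedom
```

As expected, R̄² comes out slightly below R², since it penalizes the coefficient estimated from these few degrees of freedom.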
Chapter 3
A dummy variable takes on the value of one or zero (and only those values) depending on
whether a specified condition is met. The coefficient of a dummy indicates the change in the
dependent variable when the dummy equals 1 rather than 0 (e.g., the additional salary earned
with a degree).
DUMMY TRAP! If a condition has m categories, include only m − 1 dummies alongside the
intercept; including a dummy for every category creates perfect multicollinearity.
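A small sketch of the salary-with-degree example (all data and variable names are hypothetical):

```python
import numpy as np

# Hypothetical data: salary (in $1000s) explained by years of experience
# and a degree dummy (1 = has degree, 0 = no degree)
exp_yrs = np.array([1.0, 3.0, 5.0, 2.0, 4.0, 6.0])
degree = np.array([0.0, 0.0, 0.0, 1.0, 1.0, 1.0])
salary = np.array([30.0, 34.0, 38.0, 42.0, 47.0, 51.0])

# Design matrix: intercept column, experience, degree dummy
Xmat = np.column_stack([np.ones(len(salary)), exp_yrs, degree])

# Least-squares fit of salary on the design matrix
beta, *_ = np.linalg.lstsq(Xmat, salary, rcond=None)
b0, b_exp, b_degree = beta
# b_degree: additional salary when degree == 1, holding experience constant
```

Adding a second dummy `no_degree = 1 - degree` to `Xmat` would trigger the dummy trap: the two dummy columns would sum to the intercept column, making the design matrix perfectly collinear.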
Chapter 4
The OLS assumptions are as follows:
1. The regression model is linear in the coefficients and in the error term
The model must be linear in the coefficients, but the underlying theory does not have to be. In
the case of an exponential function, taking the log of both sides of the equation yields a form
that satisfies this assumption.
2. The error term has a population mean of zero
As long as there is a constant term in the equation, the estimate of β0 will absorb any nonzero
mean of the error term.
3. All explanatory variables are uncorrelated with the error term
The error term must be uncorrelated with every explanatory variable; if it were not, OLS would
mistakenly attribute some of the error term's variation to the correlated variable.
4. Observations of the error term are uncorrelated with each other/no serial correlation
Increases in the error term in one time period should not show up in or affect in any way the
error term in another time period. If the error terms are correlated, this assumption is violated.
5. The error term has a constant variance/no heteroskedasticity
The observations of the error term should continually be drawn from identical distributions, so
that the variance is constant.
6. No explanatory variable is a perfect linear function of any other explanatory
variables/no perfect multicollinearity
Perfect collinearity between two independent variables implies that they are really the same
variable, or that one is a multiple of the other. Many instances of this are the result of not
accounting for identities.
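Perfect multicollinearity shows up numerically as a rank-deficient design matrix, as in this sketch where one regressor is an exact multiple of another (the data are made up):

```python
import numpy as np

X1 = np.array([1.0, 2.0, 3.0, 4.0])
X2 = 2.0 * X1                       # X2 is an exact multiple of X1

# Design matrix: intercept column plus the two collinear regressors
Xmat = np.column_stack([np.ones(4), X1, X2])

# With perfect multicollinearity, the three columns span only two
# dimensions, so the matrix has rank 2 instead of 3 and (X'X) is
# singular: OLS cannot find a unique coefficient estimate.
rank = np.linalg.matrix_rank(Xmat)
```

Dropping either X1 or X2 (they carry identical information) restores full rank and makes OLS estimable again.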