Econometrics for BE Summary

In this document you will find a summary for the course “Econometrics for BE” taught in the first year of the Bachelor's degree in Economics and Business Economics, and in the pre-master of Finance at the University of Groningen. The summary contains a comprehensive view of all the theory in th...


  • No
  • 1 through 11, 13 through 15
  • March 26, 2022
  • 52
  • 2021/2022
  • Summary

ECONOMETRICS FOR BE




Pre-Master Finance
2021-2022


Table of Contents
Week 1) Simple Linear Regression
Chapter 1) The Nature of Econometrics and Economic Data
1.1. What is Econometrics?
1.2. Steps in Empirical Economic Analysis
1.3. The Structure of Economic Data
1.4. Causality, Ceteris Paribus, and Counterfactual Reasoning
Chapter 2) The Simple Regression Model
2.1. Definition of the Simple Regression Model
2.2. Deriving the Ordinary Least Squares Estimates
2.3. Properties of OLS on Any Sample of Data
2.4. Units of Measurement and Functional Form
2.5. Expected Values and Variances of the OLS Estimators
2.6. Regression through the Origin and Regression on a Constant
2.7. Regression on a Binary Explanatory Variable
Week 2) Multiple Regression Analysis
Chapter 3) Multiple Regression Analysis: Estimation
3.1. Motivation for Multiple Regression
3.2. Mechanics and Interpretation of Ordinary Least Squares
3.3. The Expected Value of the OLS Estimators
3.4. The Variance of the OLS Estimators
3.5. Efficiency of OLS: The Gauss-Markov Theorem
3.6. Some Comments on the Language of Multiple Regression Analysis
Chapter 6) Multiple Regression Analysis: Further Issues (only 6.3)
6.3. More on Goodness-of-Fit and Selection of Regressors
Week 3) Statistical Inference
Chapter 4) Multiple Regression Analysis: Inference
4.1. Sampling Distributions of the OLS Estimators
4.2. Testing Hypotheses about a Single Population Parameter: The t Test
4.3. Confidence Intervals
4.4. Testing Hypotheses about a Single Linear Combination of the Parameters
4.5. Testing Multiple Linear Restrictions: The F Test
Week 4) Regression with Qualitative Information and Heteroskedasticity
Chapter 7) Multiple Regression Analysis with Qualitative Information
7.1. Describing Qualitative Information
7.2. A Single Dummy Independent Variable
7.3. Using Dummy Variables for Multiple Categories
7.4. Interactions Involving Dummy Variables
7.5. A Binary Dependent Variable: The Linear Probability Model
7.6. More on Policy Analysis and Program Evaluation
7.7. Interpreting Regression Results with Discrete Dependent Variables
Chapter 8) Heteroskedasticity
8.1. Consequences of Heteroskedasticity for OLS
8.3. Testing for Heteroskedasticity
Week 5) Endogeneity Part I
Chapter 5) Multiple Regression Analysis: OLS Asymptotics
5.1. Consistency
Chapter 9) More on Specification and Data Issues
9.2. Using Proxy Variables for Unobserved Explanatory Variables
9.4. Properties of OLS under Measurement Error
Chapter 15) Instrumental Variables Estimation and Two-Stage Least Squares
15.1. Motivation: Omitted Variables in a Simple Regression Model
15.2. IV Estimation of the Multiple Regression Model
15.3. Two-Stage Least Squares
15.4. IV Solutions to Errors-in-Variables Problems
15.5. Testing for Endogeneity and Testing Overidentifying Restrictions
Week 6) Endogeneity Part II and Panel Data
Chapter 13) Pooling Cross Sections across Time: Simple Panel Data Methods
13.1. Pooling Independent Cross Sections across Time
13.2. Policy Analysis with Pooled Cross Sections
13.3. Two-Period Panel Data Analysis
13.4. Policy Analysis with Two-Period Panel Data
Chapter 14) Advanced Panel Data Methods
14.1. Fixed Effects Estimation
Week 7) Introduction in Time Series
Chapter 10) Basic Regression Analysis with Time Series Data
10.1. The Nature of Time Series Data
10.2. Examples of Time Series Regression Models
10.3. Finite Sample Properties of OLS under Classical Assumptions
10.4. Functional Form, Dummy Variables, and Index Numbers
10.5. Trends and Seasonality (until 10-5b)
Chapter 11) Further Issues in Using OLS with Time Series Data
11.3. Using Highly Persistent Time Series in Regression Analysis





Week 1) Simple Linear Regression
Chapter 1) The Nature of Econometrics and Economic Data
1.1. What is Econometrics?
• Econometrics is based upon the development of statistical methods for estimating economic
relationships, testing economic theories, and evaluating and implementing government and
business policy. A common application of econometrics is the forecasting of such important
macroeconomic variables as interest rates, inflation rates, and gross domestic product (GDP).
• Econometrics is the use of statistical methods using quantitative data to develop theories or test
existing hypotheses in economics or finance.
• Econometrics is a separate discipline from mathematical statistics that focuses on the problems
inherent in collecting and analysing nonexperimental economic data.
o Nonexperimental data (or observational/retrospective data) are not accumulated
through controlled experiments on individuals, firms, or segments of the economy. The
researcher is a passive collector of the data. Researchers measure variables as they
naturally occur without any further manipulation.
o Experimental data are often collected in laboratory environments in the natural sciences,
but they are more difficult to obtain in the social sciences. Experimental data are collected
through active intervention by the researcher to produce and measure change or to
create difference when a variable is altered.

1.2. Steps in Empirical Economic Analysis
• An empirical analysis uses data to test a theory or to estimate a relationship.
• The first step is the careful formulation of the question of interest. The question might deal with
testing a certain aspect of an economic theory, or it might pertain to testing the effects of a
government policy. Econometric methods can be used to answer a wide range of questions.
• In some cases, a formal economic model is constructed, which consists of mathematical equations
that describe various relationships (a simplified description of reality).
o Formal economic modelling is sometimes the starting point for empirical analysis, but it is
more common to use economic theory less formally, or to even rely entirely on intuition.
• After we specify an economic model, we need to turn it into what we call an econometric model.
o The functional form 𝑓(∙) of the relationship must be specified before we can undertake an
econometric analysis.
o The ambiguities in the economic model must be resolved by replacing its variables with
observable variables, for example 𝑓(∙) = 𝛽0 + 𝛽1 𝑥1 + 𝛽2 𝑥2 + ⋯ + 𝛽𝑛 𝑥𝑛 + 𝑢
▪ The choice of these variables is determined by the economic theory as well as
data considerations.
▪ 𝑢 contains unobserved factors and errors in measuring the variables, and is
called the error term or disturbance term.
▪ The constants 𝛽0 , 𝛽1 , … , 𝛽𝑛 are the parameters of the econometric model, and
describe the directions and strengths of the relationship between the dependent
variable and the factors used to determine it in the model.
• Once an econometric model has been specified, various hypotheses of interest can be stated in
terms of the unknown parameters.
• An empirical analysis, by definition, requires data. After data on the relevant variables have been
collected, econometric methods are used to estimate the parameters in the econometric model
and to formally test hypotheses of interest. In some cases, the econometric model is used to make
predictions in either the testing of a theory or the study of a policy’s impact.


1.3. The Structure of Economic Data
Cross-Sectional Data
• A cross-sectional data set consists of a sample of individuals, households, firms, cities, states,
countries, or a variety of other units, taken at a given point in time.
• We can often assume that cross-sectional data have been obtained by random sampling from the
underlying population.
o A violation of random sampling occurs when we sample from units that are large relative
to the population → the population is not large enough to reasonably assume the
observations are independent draws.
o Another problem arises when certain groups of the population would not participate in
the research, hence the sample does not reflect the whole population.
• All econometrics and statistics software packages assign an observation number to each data unit.
The ordering of the data does not matter for econometric analysis of cross-sectional data sets
obtained from random sampling.

Time Series Data
• A time series data set consists of observations on a variable or several variables over time. Unlike
the arrangement of cross-sectional data, the chronological ordering of observations in a time
series conveys potentially important information.
• A key feature of time series data that makes them more difficult to analyse than cross-sectional
data is that economic observations can rarely, if ever, be assumed to be independent across time.
• Another feature of time series data that can require special attention is the data frequency at
which the data are collected. In economics, the most common frequencies are daily, weekly,
monthly, quarterly, and annually.
• When econometric methods are used to analyse time series data, the data should be stored in
chronological order.

Pooled Cross Sections
• Some data sets have both cross-sectional and time series features. To increase our sample size,
we can form a pooled cross section by combining the cross-sectional data sets from multiple
years.
• Pooling cross sections from different years is often an effective way of analysing the effects of a
new government policy → collect data from the years before and after a key policy change.
• A pooled cross section is analysed much like a standard cross section, except that we often need
to account for secular differences in the variables across time. In fact, in addition to increasing
the sample size, the point of a pooled cross-sectional analysis is often to see how a key relationship
has changed over time.

Panel or Longitudinal Data
• A panel data (or longitudinal data) set consists of a time series for each cross-sectional member
in the data set.
• The key feature of panel data that distinguishes them from a pooled cross section is that the same
cross-sectional units are followed over a given time period. Having multiple observations on the
same units allows us to control for certain unobserved characteristics of individuals, firms, etc.
• A second advantage of panel data is that they often allow us to study the importance of lags in
behaviour or the results of decision making.





1.4. Causality, Ceteris Paribus, and Counterfactual Reasoning
• In most tests of economic theory, and certainly for evaluating public policy, the economist’s goal
is to infer that one variable has a causal effect on another variable.
• The notion of ceteris paribus – which means “other (relevant) factors being equal” – plays an
important role in causal analysis.
• Counterfactual reasoning → an economic unit, such as an individual or a firm, is imagined in two
or more different states of the world. By considering counterfactual outcomes (or potential
outcomes), we easily “hold other factors fixed” because the counterfactual thought experiment
applies to each individual separately.

Chapter 2) The Simple Regression Model
2.1. Definition of the Simple Regression Model
• The simple regression model can be used to study the relationship between two variables.
• Much of applied econometric analysis begins with the following premise: y and x are two variables,
representing some population, and we are interested in “explaining y in terms of x,” or in “studying
how y varies with changes in x.”
• The simple linear regression model (or bivariate linear regression model) is an equation that is
assumed to hold in the population of interest, such as: 𝑦 = 𝛽0 + 𝛽1 𝑥 + 𝑢
• When related by an equation, the variables y and x have several different names used
interchangeably:
o y is called the dependent variable, the explained variable, the response variable, the
predicted variable, or the regressand.
o x is called the independent variable, the explanatory variable, the control variable, the
predictor variable, or the regressor.
• The variable 𝑢, called the error term or disturbance in the relationship, represents factors other
than x that affect y. A simple regression analysis effectively treats all factors affecting y other than
x as being unobserved.
o ∆𝑦 = 𝛽1 ∆𝑥 if ∆𝑢 = 0
o The change in y is simply 𝛽1 multiplied by the change in x. This means that 𝛽1 is the slope
parameter in the relationship between y and x, holding the other factors in 𝑢 fixed.
o 𝛽0 is the intercept parameter, but is rarely central to an analysis.
• The most difficult issue to address is whether the model really allows us to draw ceteris paribus
conclusions about how x affects y.
o We are only able to get reliable estimators of 𝛽0 and 𝛽1 from a random sample of data
when we make an assumption restricting how the unobservable 𝑢 is related to the
explanatory variable 𝑥. Without such a restriction, we will not be able to estimate the
ceteris paribus effect, 𝛽1 .
o As long as the intercept 𝛽0 is included in the equation, nothing is lost by assuming that
the average value of 𝑢 in the population is zero: 𝐸(𝑢) = 0.
• A natural measure of the association between two random variables is the correlation coefficient.
If 𝑢 and 𝑥 are uncorrelated, then, as random variables, they are not linearly related. However,
correlation measures only linear dependence between 𝑢 and 𝑥, which gives it a somewhat
counterintuitive feature: it is possible for 𝑢 to be uncorrelated with 𝑥 while being correlated
with functions of 𝑥, such as 𝑥². This possibility causes problems for interpreting the model. A
better assumption involves the expected value of 𝑢 given 𝑥.
• Because 𝑢 and 𝑥 are random variables, we can define the conditional distribution of 𝑢 given any
value of 𝑥. In particular, for any 𝑥, we can obtain the expected value of 𝑢 for that slice of the
population described by the value of 𝑥. The crucial assumption is that the average value of 𝑢 does
not depend on the value of 𝑥: 𝐸(𝑢|𝑥) = 𝐸(𝑢) → the average value of the unobservables is the
same across all slices of the population determined by the value of 𝑥, and this common
average is necessarily equal to the average of 𝑢 over the entire population.
o 𝑢 is mean independent of 𝑥.
• Combining mean independence, 𝐸(𝑢|𝑥) = 𝐸(𝑢), with 𝐸(𝑢) = 0 gives the zero conditional mean assumption → 𝐸(𝑢|𝑥) = 0.
• The population regression function (PRF): 𝐸(𝑦|𝑥) = 𝛽0 + 𝛽1 𝑥. This is a linear function of 𝑥.
o The linearity means that a one-unit increase in 𝑥 changes the expected value of y by the
amount of 𝛽1 .
o For any given value of 𝑥, the distribution of 𝑦 is centred about 𝐸(𝑦|𝑥).
o The equation tells us how the average value of 𝑦 changes with 𝑥; it does not say that 𝑦
equals 𝛽0 + 𝛽1 𝑥 for all units in the population.
• It is useful to view the equation 𝑦 = 𝛽0 + 𝛽1 𝑥 + 𝑢 as breaking 𝑦 into two components:
o The piece 𝛽0 + 𝛽1 𝑥, which represents 𝐸(𝑦|𝑥), is called the systematic part of y (the part
of y explained by x)
o 𝑢 is called the unsystematic part (the part of y not explained by x).
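The difference between zero correlation and mean independence discussed above can be checked on a small constructed population. The sketch below is only an illustration: the three-point distribution of x and the rule u = x² − 2/3 are assumptions made up for this example and do not come from the course material.

```python
from fractions import Fraction

# Hypothetical population (an assumption for illustration):
# x equals -1, 0 or 1 with probability 1/3 each, and u = x^2 - 2/3,
# where 2/3 is subtracted so that E(u) = 0.
support = [Fraction(-1), Fraction(0), Fraction(1)]
prob = Fraction(1, 3)


def u_of(x):
    """Error term as a deterministic function of x in this toy population."""
    return x * x - Fraction(2, 3)


def expect(g):
    """Population expectation of g(x) over the three equally likely values."""
    return sum(prob * g(x) for x in support)


mean_u = expect(u_of)                                         # E(u)
cov_x_u = expect(lambda x: x * u_of(x)) - expect(lambda x: x) * mean_u
cov_x2_u = (expect(lambda x: x * x * u_of(x))
            - expect(lambda x: x * x) * mean_u)               # Cov(x^2, u)

# Because u is a function of x here, E(u|x) is simply u_of(x).
cond_mean_u = {int(x): u_of(x) for x in support}

print("E(u) =", mean_u)
print("Cov(x, u) =", cov_x_u)
print("Cov(x^2, u) =", cov_x2_u)
print("E(u|x) by value of x:", cond_mean_u)
```

In this toy population u has mean zero and is uncorrelated with x, yet E(u|x) differs across the values of x, so the zero conditional mean assumption fails even though the zero-correlation condition holds.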

2.2. Deriving the Ordinary Least Squares Estimates
• Let {(𝑥𝑖 , 𝑦𝑖 ): 𝑖 = 1, … , 𝑛} denote a random sample of size 𝑛 from the population. We can write
𝑦𝑖 = 𝛽0 + 𝛽1 𝑥𝑖 + 𝑢𝑖 for each 𝑖.
o Here, 𝑢𝑖 is the error term for observation i because it contains all factors affecting 𝑦𝑖 other
than 𝑥𝑖 .
• Under the zero conditional mean assumption, 𝑢 has zero expected value and is uncorrelated with 𝑥
in the population, so the covariance between 𝑥 and 𝑢 is zero:
o E(𝑢) = 0
o Cov(𝑥, 𝑢) = E(𝑥𝑢) = 0
• In terms of the observable variables 𝑥 and 𝑦 and the unknown parameters 𝛽0 and 𝛽1 , the
equations can be written as:
o E(𝑦 − 𝛽0 − 𝛽1 𝑥) = 0
o E[𝑥(𝑦 − 𝛽0 − 𝛽1 𝑥)] = 0
• Because there are two unknown parameters to estimate (𝛽0 and 𝛽1 ), the two equations above can be
used to obtain good estimators of 𝛽0 and 𝛽1 . Given a sample of data, we choose estimates
$\hat{\beta}_0$ and $\hat{\beta}_1$ to solve the sample counterparts of the equations above:
1. $n^{-1} \sum_{i=1}^{n} (y_i - \hat{\beta}_0 - \hat{\beta}_1 x_i) = 0$
2. $n^{-1} \sum_{i=1}^{n} x_i (y_i - \hat{\beta}_0 - \hat{\beta}_1 x_i) = 0$
▪ This is an example of the method of moments approach to estimation. These
equations can be solved for $\hat{\beta}_0$ and $\hat{\beta}_1$.
• The first equation can be rewritten as:
o $\bar{y} = \hat{\beta}_0 + \hat{\beta}_1 \bar{x}$, where $\bar{y} = n^{-1} \sum_{i=1}^{n} y_i$
• This gives us the following equation: $\hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x}$
o Therefore, once we have the slope estimate $\hat{\beta}_1$, it is straightforward to obtain the
intercept estimate $\hat{\beta}_0$, given $\bar{y}$ and $\bar{x}$.
• Putting this in the second equation and dropping $n^{-1}$ gives:
o $\sum_{i=1}^{n} x_i [y_i - (\bar{y} - \hat{\beta}_1 \bar{x}) - \hat{\beta}_1 x_i] = 0$
• Rearrangement gives:
o $\sum_{i=1}^{n} x_i (y_i - \bar{y}) = \hat{\beta}_1 \sum_{i=1}^{n} x_i (x_i - \bar{x})$




Basic properties of summation operator
$\sum_{i=1}^{n} x_i (x_i - \bar{x}) = \sum_{i=1}^{n} (x_i - \bar{x})^2$

$\sum_{i=1}^{n} x_i (y_i - \bar{y}) = \sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})$
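Both summation properties rest on the fact that deviations from the sample mean add up to zero. A short derivation, written out here as a sketch of the missing step (the summary states the properties without proof):

```latex
% Deviations from the mean sum to zero:
\sum_{i=1}^{n} (x_i - \bar{x}) = \sum_{i=1}^{n} x_i - n\bar{x} = 0,
\qquad
\sum_{i=1}^{n} (y_i - \bar{y}) = 0 .

% First property:
\sum_{i=1}^{n} (x_i - \bar{x})^2
  = \sum_{i=1}^{n} x_i (x_i - \bar{x}) - \bar{x} \sum_{i=1}^{n} (x_i - \bar{x})
  = \sum_{i=1}^{n} x_i (x_i - \bar{x}) .

% Second property:
\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})
  = \sum_{i=1}^{n} x_i (y_i - \bar{y}) - \bar{x} \sum_{i=1}^{n} (y_i - \bar{y})
  = \sum_{i=1}^{n} x_i (y_i - \bar{y}) .
```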
• Hence, provided that $\sum_{i=1}^{n} (x_i - \bar{x})^2 > 0$, the estimated slope is
$\hat{\beta}_1 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n} (x_i - \bar{x})^2}$
o This equation is the sample covariance between $x_i$ and $y_i$ divided by the sample variance
of $x_i$.
o Using simple algebra we can also write $\hat{\beta}_1 = \hat{\rho}_{xy} \cdot \left( \frac{\hat{\sigma}_y}{\hat{\sigma}_x} \right)$
▪ $\hat{\rho}_{xy}$ is the sample correlation between $x_i$ and $y_i$, and $\hat{\sigma}_x$, $\hat{\sigma}_y$ denote the sample
standard deviations.
▪ An immediate implication is that if $x_i$ and $y_i$ are positively correlated in the
sample then $\hat{\beta}_1 > 0$; if $x_i$ and $y_i$ are negatively correlated then $\hat{\beta}_1 < 0$.
• The formula for $\hat{\beta}_1$ in terms of the sample correlation and sample standard deviations is the
sample analog of the population relationship $\beta_1 = \rho_{xy} \cdot \left( \frac{\sigma_y}{\sigma_x} \right)$, where all quantities are defined for
the entire population.
• In effect, simple regression is an analysis of correlation between two variables, and so one must
be careful in inferring causality.
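• The estimates $\hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x}$ and
$\hat{\beta}_1 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n} (x_i - \bar{x})^2}$
are called the ordinary least squares (OLS) estimates of $\beta_0$ and $\beta_1$. To justify this name,
for any $\hat{\beta}_0$ and $\hat{\beta}_1$ define a fitted value for $y$ when
$x = x_i$ as $\hat{y}_i = \hat{\beta}_0 + \hat{\beta}_1 x_i$.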
o This is the value we predict for 𝑦 when 𝑥 = 𝑥𝑖 for the given intercept and slope. There is
a fitted value for each observation in the sample.
• The residual for observation 𝑖 is the difference between the actual 𝑦𝑖 and its fitted value:
o $\hat{u}_i = y_i - \hat{y}_i = y_i - \hat{\beta}_0 - \hat{\beta}_1 x_i$
o There are n such residuals.
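• Now, suppose we choose $\hat{\beta}_0$ and $\hat{\beta}_1$ to make the sum of squared residuals,
$\sum_{i=1}^{n} \hat{u}_i^2 = \sum_{i=1}^{n} (y_i - \hat{\beta}_0 - \hat{\beta}_1 x_i)^2$, as small as possible.
The first order conditions for $(\hat{\beta}_0, \hat{\beta}_1)$ to minimize this sum are:
o $\sum_{i=1}^{n} (y_i - \hat{\beta}_0 - \hat{\beta}_1 x_i) = 0$
o $\sum_{i=1}^{n} x_i (y_i - \hat{\beta}_0 - \hat{\beta}_1 x_i) = 0$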
• Once we have determined the OLS intercept and slope estimates, we form the OLS regression
line: 𝑦̂ = 𝛽̂0 + 𝛽̂1 𝑥.
o 𝛽̂0 is the predicted value of 𝑦 when 𝑥 = 0.
• When using this equation to compute predicted values of 𝑦 for various values of 𝑥, we must
account for the intercept in the calculation. The equation 𝑦̂ = 𝛽̂0 + 𝛽̂1 𝑥 is also called the sample
regression function (SRF) because it is the estimated version of the population regression function
E(𝑦|𝑥) = 𝛽0 + 𝛽1 𝑥.
o It is important to remember that the PRF is something fixed, but unknown, in the
population. Because the SRF is obtained for a given sample of data, a new sample will
generate a different slope and intercept.
o In most cases, the slope estimate ($\hat{\beta}_1 = \frac{\Delta\hat{y}}{\Delta x}$) is of primary interest as it tells us the amount
by which $\hat{y}$ changes when $x$ increases by one unit. Equivalently, $\Delta\hat{y} = \hat{\beta}_1 \Delta x$.
• We can indicate that a regression has been run by saying that we run the regression of y on x, or
regress y on x. We always regress the dependent variable on the independent variable.
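The closed-form estimates derived in this section can be computed directly. The following is a minimal sketch using only the Python standard library; the function name `ols_simple` and the five data points are made up for illustration and are not part of the course material.

```python
from math import sqrt
from statistics import mean


def ols_simple(x, y):
    """Return (b0, b1), the OLS intercept and slope for regressing y on x."""
    x_bar, y_bar = mean(x), mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    if sxx == 0:
        # The slope formula requires sample variation in x.
        raise ValueError("all x values are equal; the slope is not defined")
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = sxy / sxx              # slope: sum of cross-deviations over sum of squared deviations
    b0 = y_bar - b1 * x_bar     # intercept: y_bar - b1 * x_bar
    return b0, b1


# Hypothetical sample (illustrative numbers only)
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 5.0, 4.0, 5.0]

b0, b1 = ols_simple(x, y)

# Same slope via the correlation form: b1 = rho_xy * (sigma_y / sigma_x).
x_bar, y_bar = mean(x), mean(y)
sxx = sum((xi - x_bar) ** 2 for xi in x)
syy = sum((yi - y_bar) ** 2 for yi in y)
sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
rho = sxy / sqrt(sxx * syy)
b1_from_corr = rho * sqrt(syy / sxx)   # the (n - 1) factors in the std. devs cancel

print(f"intercept = {b0:.4f}, slope = {b1:.4f}, slope via correlation = {b1_from_corr:.4f}")
```

Because the factor (n − 1) appears in both sample standard deviations, it cancels in the ratio, which is why the sketch works with raw sums of squares.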


2.3. Properties of OLS on Any Sample of Data
The following properties hold, by construction, for any sample of data.

Fitted Values and Residuals
• We assume that the intercept and slope estimates 𝛽̂0 and 𝛽̂1 , have been obtained for the given
sample of data.
• We can obtain the fitted value 𝑦̂𝑖 for each observation from the SRF 𝑦̂ = 𝛽̂0 + 𝛽̂1 𝑥.
• Each fitted value 𝑦̂𝑖 is on the OLS regression line.
• The OLS residual associated with observation 𝑖, 𝑢̂𝑖 , is the difference between 𝑦𝑖 and its fitted value.

Algebraic Properties of OLS Statistics
• There are several useful algebraic properties of OLS estimates and their associated statistics:
1. The sum, and therefore the sample average of the OLS residuals, is zero.
▪ $\sum_{i=1}^{n} \hat{u}_i = 0$
▪ The residuals are defined by $\hat{u}_i = y_i - \hat{\beta}_0 - \hat{\beta}_1 x_i$, in which the OLS estimates are
chosen to make the residuals add up to zero.
2. The sample covariance between the regressors and the OLS residuals is zero. This follows
from the first order condition, which can be written in terms of the residuals as
$\sum_{i=1}^{n} x_i \hat{u}_i = 0$
3. The point (𝑥̅ , 𝑦̅) is always on the OLS regression line.
• Writing each 𝑦𝑖 as its fitted value, plus its residual, provides another way to interpret an OLS
regression. For each 𝑖, write 𝑦𝑖 = 𝑦̂𝑖 + 𝑢̂𝑖 .
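• The total sum of squares (SST) is a measure of the total sample variation in the $y_i$; it measures
how spread out the $y_i$ are in the sample.
o $\text{SST} \equiv \sum_{i=1}^{n} (y_i - \bar{y})^2$
• The explained sum of squares (SSE) measures the sample variation in the $\hat{y}_i$ (where we use the
fact that the sample average of the fitted values equals $\bar{y}$).
o $\text{SSE} \equiv \sum_{i=1}^{n} (\hat{y}_i - \bar{y})^2$
• The residual sum of squares (SSR) measures the sample variation in the $\hat{u}_i$.
o $\text{SSR} \equiv \sum_{i=1}^{n} \hat{u}_i^2$
• The total variation in y can always be expressed as the sum of the explained variation and the
unexplained variation → SST = SSE + SSR.
o $\sum_{i=1}^{n} (y_i - \bar{y})^2 = \sum_{i=1}^{n} [(y_i - \hat{y}_i) + (\hat{y}_i - \bar{y})]^2 = \sum_{i=1}^{n} [\hat{u}_i + (\hat{y}_i - \bar{y})]^2$
$= \sum_{i=1}^{n} \hat{u}_i^2 + 2 \sum_{i=1}^{n} \hat{u}_i (\hat{y}_i - \bar{y}) + \sum_{i=1}^{n} (\hat{y}_i - \bar{y})^2 = \text{SSR} + 2 \sum_{i=1}^{n} \hat{u}_i (\hat{y}_i - \bar{y}) + \text{SSE}$
o This equation gives SST = SSE + SSR if we show that $\sum_{i=1}^{n} \hat{u}_i (\hat{y}_i - \bar{y}) = 0$, which is the case because
the sample covariance between the residuals and the fitted values is zero: each fitted value is a
linear function of $x_i$, and by properties 1 and 2 the residuals sum to zero and have zero sample
covariance with $x_i$.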
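The algebraic properties above hold for any sample, so they can be verified numerically. The sketch below uses made-up data and a hypothetical helper `ols_simple`; it is meant as a check of the stated properties, not as a reference implementation.

```python
from statistics import mean


def ols_simple(x, y):
    """OLS intercept and slope for a simple regression of y on x."""
    x_bar, y_bar = mean(x), mean(y)
    b1 = (sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
          / sum((xi - x_bar) ** 2 for xi in x))
    return y_bar - b1 * x_bar, b1


# Hypothetical sample (illustrative numbers only)
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 5.0, 4.0, 5.0]

b0, b1 = ols_simple(x, y)
fitted = [b0 + b1 * xi for xi in x]
resid = [yi - fi for yi, fi in zip(y, fitted)]
y_bar = mean(y)

# Property 1: the residuals sum to zero.
sum_resid = sum(resid)
# Property 2: the residuals are orthogonal to the regressor.
sum_x_resid = sum(xi * ui for xi, ui in zip(x, resid))
# Property 3: the point (x_bar, y_bar) lies on the regression line.
gap_at_means = y_bar - (b0 + b1 * mean(x))

# Sums of squares and the decomposition SST = SSE + SSR.
sst = sum((yi - y_bar) ** 2 for yi in y)
sse = sum((fi - y_bar) ** 2 for fi in fitted)
ssr = sum(ui ** 2 for ui in resid)

print(f"sum of residuals     = {sum_resid:.2e}")
print(f"sum of x * residuals = {sum_x_resid:.2e}")
print(f"SST = {sst:.4f}, SSE + SSR = {sse + ssr:.4f}")
```

The first two printed values are zero only up to floating-point rounding, which is why a tolerance is needed when checking them in code.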

Goodness-of-Fit
• Assuming that the total sum of squares (SST) is not equal to zero – which is true except in the very
unlikely event that all the $y_i$ equal the same value – we can divide SST = SSE + SSR by SST to get
$1 = \frac{\text{SSE}}{\text{SST}} + \frac{\text{SSR}}{\text{SST}}$.
• The R-squared of the regression, sometimes called the coefficient of determination, is defined as
$R^2 \equiv \frac{\text{SSE}}{\text{SST}} = 1 - \frac{\text{SSR}}{\text{SST}}$
o $R^2$ is the ratio of the explained variation compared to the total variation; it is interpreted
as the fraction of the sample variation in y that is explained by x.
o The value of $R^2$ is always between zero and one, because SSE cannot be greater than SST.
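As a sketch of the two equivalent definitions of R-squared, the code below computes SSE/SST and 1 − SSR/SST on made-up data. It also compares the result with the squared sample correlation between x and y; that equality is a standard result for simple regression with an intercept, but it is not stated in this summary, so treat it as an added cross-check. The function name `r_squared_simple` is hypothetical.

```python
from statistics import mean


def r_squared_simple(x, y):
    """Return (SSE/SST, 1 - SSR/SST) for the simple regression of y on x."""
    x_bar, y_bar = mean(x), mean(y)
    b1 = (sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
          / sum((xi - x_bar) ** 2 for xi in x))
    b0 = y_bar - b1 * x_bar
    fitted = [b0 + b1 * xi for xi in x]
    sst = sum((yi - y_bar) ** 2 for yi in y)
    if sst == 0:
        raise ValueError("all y values are equal; R-squared is not defined")
    sse = sum((fi - y_bar) ** 2 for fi in fitted)
    ssr = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    return sse / sst, 1.0 - ssr / sst


# Hypothetical sample (illustrative numbers only)
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 5.0, 4.0, 5.0]

r2_explained, r2_residual = r_squared_simple(x, y)

# Cross-check: squared sample correlation between x and y.
x_bar, y_bar = mean(x), mean(y)
sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
sxx = sum((xi - x_bar) ** 2 for xi in x)
syy = sum((yi - y_bar) ** 2 for yi in y)
rho_squared = sxy ** 2 / (sxx * syy)

print(f"SSE/SST = {r2_explained:.4f}, 1 - SSR/SST = {r2_residual:.4f}, rho^2 = {rho_squared:.4f}")
```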



2.4. Units of Measurement and Functional Form
• Two important issues in applied economics are:
o Understanding how changing the units of measurement of the dependent and/or
independent variables affects OLS estimates.
o Knowing how to incorporate popular functional forms used in economics into regression
analysis.
• OLS estimates change in entirely expected ways when the units of measurement of the dependent
and independent variables change.
o If the dependent variable is multiplied by the constant c – which means each value in the
sample is multiplied by c – the OLS intercept and the slope estimates are also multiplied
by c.
o If the independent variable is divided (multiplied) by some nonzero constant, c, then the
OLS slope coefficient is multiplied (divided) by c.
• In reading applied work in the social sciences, you will often encounter regression equations
where the dependent variable appears in logarithmic form, so that the dependent variable is defined
as 𝑦 = log(∙). The mechanics of OLS, that is, the computation of the intercept and slope estimates,
are the same as before.
o Constant elasticity model → in the log-log model, 𝛽1 is the elasticity of y with respect to x; in
the log-level model, 100 ∙ 𝛽1 is the semi-elasticity of y with respect to x.
o If the dependent variable appears in logarithmic form and its units of measurement change (each
𝑦𝑖 is multiplied by a constant 𝑐1 > 0), the slope is still 𝛽1 , but the intercept is now log(𝑐1 ) + 𝛽0 .
▪ log(𝑐1 ) + log(𝑦𝑖 ) = [log(𝑐1 ) + 𝛽0 ] + 𝛽1 𝑥𝑖 + 𝑢𝑖
• log(𝑐1 ) + log(𝑦𝑖 ) can also be written as log(𝑐1 𝑦𝑖 )
o If the independent variable is log(𝑥), and we change the units of measurement of x
before taking the log, the slope remains the same, but the intercept changes.
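The effects of changing units can be illustrated numerically. The sketch below refits a simple regression after rescaling y, after rescaling x, and after rescaling y inside a logarithm. The data, the scaling constants and the helper name `ols_simple` are all assumptions chosen for illustration.

```python
from math import log
from statistics import mean


def ols_simple(x, y):
    """OLS intercept and slope for a simple regression of y on x."""
    x_bar, y_bar = mean(x), mean(y)
    b1 = (sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
          / sum((xi - x_bar) ** 2 for xi in x))
    return y_bar - b1 * x_bar, b1


# Hypothetical sample (illustrative numbers only); y must be positive for the log case.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 5.0, 4.0, 5.0]
b0, b1 = ols_simple(x, y)

# 1) Multiply y by c: intercept and slope are both multiplied by c.
c = 1000.0
b0_ys, b1_ys = ols_simple(x, [c * yi for yi in y])

# 2) Divide x by d: the slope is multiplied by d, the intercept is unchanged.
d = 100.0
b0_xs, b1_xs = ols_simple([xi / d for xi in x], y)

# 3) Log dependent variable: rescaling y by c1 shifts the intercept by log(c1)
#    and leaves the slope unchanged.
c1 = 3.0
b0_log, b1_log = ols_simple(x, [log(yi) for yi in y])
b0_log_s, b1_log_s = ols_simple(x, [log(c1 * yi) for yi in y])

print(f"y scaled:   intercept ratio {b0_ys / b0:.1f}, slope ratio {b1_ys / b1:.1f}")
print(f"x scaled:   intercept {b0_xs:.4f} vs {b0:.4f}, slope ratio {b1_xs / b1:.1f}")
print(f"log(c1*y):  intercept shift {b0_log_s - b0_log:.4f} vs log(c1) {log(c1):.4f}")
```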

Table 2.3. Summary of Functional Forms Involving Logarithms
Model         Dependent Variable   Independent Variable   Interpretation of 𝜷𝟏
Level-level   𝑦                    𝑥                      ∆𝑦 = 𝛽1 ∆𝑥
Level-log     𝑦                    log(𝑥)                 ∆𝑦 = (𝛽1 /100) %∆𝑥
Log-level     log(𝑦)               𝑥                      %∆𝑦 = (100𝛽1 ) ∆𝑥
Log-log       log(𝑦)               log(𝑥)                 %∆𝑦 = 𝛽1 %∆𝑥
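The interpretations in the table are approximations that work well for small changes. The sketch below puts each approximation next to the exact change implied by the fitted equation with the error term held fixed. The exact formulas are standard properties of the logarithm rather than something stated in this summary, and the coefficient values are hypothetical.

```python
import math


def level_log_change(beta1, pct_change_x):
    """Change in y for a given percentage change in x (y on log(x))."""
    approx = (beta1 / 100.0) * pct_change_x
    exact = beta1 * math.log(1.0 + pct_change_x / 100.0)
    return approx, exact


def log_level_pct_change(beta1, change_x):
    """Percentage change in y for a given change in x (log(y) on x)."""
    approx = 100.0 * beta1 * change_x
    exact = 100.0 * (math.exp(beta1 * change_x) - 1.0)
    return approx, exact


def log_log_pct_change(beta1, pct_change_x):
    """Percentage change in y for a given percentage change in x (log(y) on log(x))."""
    approx = beta1 * pct_change_x
    exact = 100.0 * ((1.0 + pct_change_x / 100.0) ** beta1 - 1.0)
    return approx, exact


# Hypothetical coefficients, chosen only to illustrate the formulas.
ll_approx, ll_exact = level_log_change(beta1=50.0, pct_change_x=1.0)
sl_approx, sl_exact = log_level_pct_change(beta1=0.08, change_x=1.0)
el_approx, el_exact = log_log_pct_change(beta1=0.5, pct_change_x=10.0)

print(f"level-log: approx {ll_approx:.4f}, exact {ll_exact:.4f}")
print(f"log-level: approx {sl_approx:.4f}%, exact {sl_exact:.4f}%")
print(f"log-log:   approx {el_approx:.4f}%, exact {el_exact:.4f}%")
```

The gap between the approximate and exact figures grows with the size of the change, which is why the table's readings are usually reserved for small changes.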

2.5. Expected Values and Variances of the OLS Estimators
Unbiasedness of OLS
• Assumption SLR 1: Linear in Parameters
o In the population model, the dependent variable (y) is related to the independent variable
(x) and the error (u) as 𝑦 = 𝛽0 + 𝛽1 𝑥 + 𝑢, where 𝛽0 and 𝛽1 are the population intercept
and slope parameters.
• Assumption SLR 2: Random Sampling
o We have a random sample of size 𝑛, {(𝑥𝑖 , 𝑦𝑖 ): 𝑖 = 1,2, … , 𝑛}, following the population
model in the equation 𝑦 = 𝛽0 + 𝛽1 𝑥 + 𝑢.
• Assumption SLR 3: Sample Variations in the Explanatory Variable
o The sample outcomes on x, namely {𝑥𝑖 , 𝑖 = 1, … , 𝑛}, are not all the same value.
o If the sample standard deviation of 𝑥𝑖 is zero, then Assumption SLR 3 fails; otherwise, it
holds.
• Assumption SLR 4: Zero Conditional Mean




Who am I buying this summary from?

Stuvia is a marketplace, so you are not buying this document from us, but from seller ajakkerman. Stuvia facilitates payment to the seller.
