Basic Concepts
1 PURPOSE OF REGRESSION ANALYSIS
Regression analysis is concerned with the study of the dependence of one variable, the dependent
variable, on one or more other variables, the explanatory variables, with a view to estimating and/or
predicting the population mean or average value of the former in terms of known or fixed (in
repeated sampling) values of the latter.
2 THE POPULATION REGRESSION FUNCTION (PRF)
The population regression curve = the locus of the conditional expectations of the dependent
variable for fixed values of the independent variable.
→ In principle the population is infinitely large, i.e. for each value of X there are infinitely many observations on Y
Mathematical specification: 𝐸(𝑌|𝑋𝑖 ) = 𝑓(𝑋𝑖 )
Linear population regression function (PRF): 𝐸(𝑌|𝑋𝑖 ) = 𝛽1 + 𝛽2 𝑋𝑖
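A minimal simulation sketch of this idea, assuming a hypothetical linear PRF with β1 = 17 and β2 = 0.6 and normally distributed deviations around it (none of these numbers come from the notes). For a fixed X, the average of many Y-draws should settle near β1 + β2 X, which is what "locus of conditional expectations" means.

```python
import random
from statistics import mean

# Hypothetical population parameters, chosen only for illustration.
BETA1, BETA2 = 17.0, 0.6

def draw_y(x, rng):
    """One draw of Y given X = x: the conditional mean plus random noise."""
    return BETA1 + BETA2 * x + rng.gauss(0.0, 10.0)

def conditional_mean(x, n_draws, rng):
    """Approximate E(Y | X = x) by averaging many draws of Y at this fixed x."""
    return mean(draw_y(x, rng) for _ in range(n_draws))

rng = random.Random(0)
for x in (80, 100, 120):
    approx = conditional_mean(x, 50_000, rng)
    exact = BETA1 + BETA2 * x
    print(f"X = {x}: simulated mean of Y = {approx:.2f}, beta1 + beta2*X = {exact:.2f}")
```

The simulated means are only close to the line, not on it, because a finite number of draws stands in for the infinitely large population.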
2.1 INTERPRETATION ‘LINEAR’
Two alternative interpretations:
• Linear in the variables
o The conditional expectations of Y are a linear function of Xi
NOT: 𝐸(𝑌|𝑋𝑖 ) = 𝛽1 + 𝛽2 𝑋𝑖 ²
• Linear in the parameters
o The conditional expectations of Y are a linear function of βs
NOT: 𝐸(𝑌|𝑋𝑖 ) = 𝛽1 + √𝛽2 𝑋𝑖
The basic theory of regression analysis supposes linearity in the parameters!
Non-linearity in the variables is permitted, e.g. 𝐸(𝑌|𝑋𝑖 ) = 𝛽1 + 𝛽2 𝑋𝑖 ² is still linear in the parameters.
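The difference between the two interpretations can be spot-checked numerically. The sketch below uses hypothetical function names and arbitrary parameter values. It tests additivity in the parameters at a single X: a function that is linear in (β1, β2) gives the same result whether the parameters are added before or after evaluation. This is a check at one point, not a proof.

```python
import math

def model_squared_x(b1, b2, x):
    """E(Y|X) = b1 + b2 * x**2: non-linear in x, linear in (b1, b2)."""
    return b1 + b2 * x ** 2

def model_sqrt_beta(b1, b2, x):
    """E(Y|X) = b1 + sqrt(b2) * x: linear in x, non-linear in b2."""
    return b1 + math.sqrt(b2) * x

def is_linear_in_parameters(model, x=3.0):
    """Spot check of additivity in the parameters: f(p + q) == f(p) + f(q)."""
    p, q = (1.0, 4.0), (2.0, 9.0)  # two arbitrary parameter vectors
    combined = model(p[0] + q[0], p[1] + q[1], x)
    separate = model(p[0], p[1], x) + model(q[0], q[1], x)
    return math.isclose(combined, separate)

print(is_linear_in_parameters(model_squared_x))  # additivity holds
print(is_linear_in_parameters(model_sqrt_beta))  # additivity fails
```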
Econometrics 2019-2020 Casier Tessa
2.2 STOCHASTIC SPECIFICATION OF THE PRF
The population regression function is only correct on average.
The deviations of the individual consumption expenditures Yi from their conditional expectation can be
represented as: µ𝑖 = 𝑌𝑖 − 𝐸(𝑌|𝑋𝑖 ), with µi the stochastic error term, hence:
𝑌𝑖 = 𝐸(𝑌|𝑋𝑖 ) + µ𝑖 = 𝛽1 + 𝛽2 𝑋𝑖 + µ𝑖
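A small worked sketch of this decomposition, assuming hypothetical parameters β1 = 17 and β2 = 0.6 and a made-up set of five Y-values at X = 100 (chosen so that their mean equals the conditional expectation). Each Y splits into the systematic part E(Y|X) and an error term, and the error terms average to zero.

```python
BETA1, BETA2 = 17.0, 0.6          # hypothetical PRF parameters
x = 100
y_values = [65, 70, 77, 80, 93]   # hypothetical Y-values of all units with X = 100

# Systematic part: the conditional expectation E(Y | X = 100).
conditional_expectation = BETA1 + BETA2 * x

# Stochastic part: deviation of each Y from the conditional expectation.
errors = [y - conditional_expectation for y in y_values]

# Adding both parts together recovers the observed values.
recomposed = [conditional_expectation + u for u in errors]

print("E(Y|X=100):", conditional_expectation)
print("error terms:", errors)
print("mean error:", sum(errors) / len(errors))
```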
2.3 SOURCE OF THE STOCHASTIC ERROR TERM
The error term = a collection of all variables/factors that affect Y but are not included in the model.
Possible explanations:
• Vague theory
• No proper data available
• Simplicity: other variables only have a marginal (and random) influence
• Measurement errors in the data
• Wrong functional form
• …
(Later: the properties of the error terms determine the properties of the estimators!)
3 THE SAMPLE REGRESSION FUNCTION (SRF)
Suppose we had data for the entire population:
• The parameters (βs) of the population regression function could simply be calculated
• There would be no need for estimation methods (econometrics)
In practice: we typically only have a sample drawn randomly from the population
• ‘Randomly’ = for each X-value we draw exactly one Y-value
• Let n denote the sample size
The mission is to reconstruct the population regression curve/function based on this sample.
We do this based on:
• The sample regression curve
• The sample regression function (SRF)
$\hat{Y}_i = \hat{\beta}_1 + \hat{\beta}_2 X_i$ where:
$\hat{Y}_i$ = an estimator of $E(Y|X_i)$
$\hat{\beta}_1$ = an estimator of $\beta_1$
$\hat{\beta}_2$ = an estimator of $\beta_2$
each based on an estimator, which will be determined later.
3.1 TERMINOLOGY: ESTIMATOR VS. ESTIMATE
An estimator = a method (typically based on a formula) to estimate a population parameter
using the information in a sample of data
An estimate = the numerical result of applying the estimator to the available sample
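In code the distinction is that between a function and its return value. A minimal sketch, using the sample mean as the estimator and a made-up sample (both are illustrative choices, not taken from the notes):

```python
def sample_mean(sample):
    """Estimator: a rule that maps any sample to a guess of the population mean."""
    return sum(sample) / len(sample)

sample = [70, 65, 90, 95, 110]   # hypothetical observed data

# Estimate: the number obtained by applying the rule to this particular sample.
estimate = sample_mean(sample)
print("estimate:", estimate)
```

The function `sample_mean` stays the same whatever data arrive. The value of `estimate` changes with every new sample.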
3.2 THE SAMPLE REGRESSION FUNCTION
Using the sample, we cannot reconstruct the population regression function exactly!
Reasons:
• The sample regression function (SRF) is merely an approximation of the population
regression function (PRF)
o $\hat{\beta}_1 \neq \beta_1$
o $\hat{\beta}_2 \neq \beta_2$
o $\hat{Y}_i \neq Y_i$
o $\hat{\mu}_i \neq \mu_i$
• An estimator is stochastic (= it varies over repeated sampling)
o An alternative sample results in a different sample regression function
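The sampling variation can be made visible with a short simulation. This is a sketch under stated assumptions: a hypothetical PRF with β1 = 17 and β2 = 0.6, normally distributed error terms, fixed X-values, and a least squares fit (the method these notes introduce in the next chapter) as the estimator. The function names are illustrative.

```python
import random

BETA1, BETA2 = 17.0, 0.6             # hypothetical population parameters
X_VALUES = list(range(80, 280, 20))  # fixed X-values, reused in every sample

def draw_sample(rng):
    """Draw exactly one Y per X-value: hypothetical PRF plus random noise."""
    return [BETA1 + BETA2 * x + rng.gauss(0.0, 10.0) for x in X_VALUES]

def fit_line(xs, ys):
    """Least squares intercept and slope, computed in deviation form."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope

rng = random.Random(42)
for s in range(1, 4):
    b1_hat, b2_hat = fit_line(X_VALUES, draw_sample(rng))
    print(f"sample {s}: intercept = {b1_hat:.2f}, slope = {b2_hat:.3f}")
```

Each sample yields a different sample regression function, while the population parameters never change.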
3.3 PURPOSE OF REGRESSION ANALYSIS
Approximate the parameters of the population regression function $Y_i = \beta_1 + \beta_2 X_i + \mu_i$
using the sample regression function $Y_i = \hat{\beta}_1 + \hat{\beta}_2 X_i + \hat{\mu}_i$
and make sure that $\hat{\beta}_1$ and $\hat{\beta}_2$ approximate $\beta_1$ and $\beta_2$ ‘as closely as possible’, even though we do not
know the population regression function.
Use an estimator (least squares method)
The performance of the estimator as an approximation will be formalised by the statistical properties
of this estimator.
Estimating the Sample Regression Function
1 THE ORDINARY LEAST SQUARES METHOD (OLS)
How do we estimate the sample regression function based on the sample data?
• Pragmatic approach: find a sample regression line such that the distance between this line
and the observed data points becomes as small as possible
o Minimize the distance between $\hat{Y}_i$ and $Y_i$
• Possible criteria:
o $\min_{\hat{\beta}_1, \hat{\beta}_2} \sum \hat{\mu}_i$: not possible (negative and positive $\hat{\mu}_i$ cancel out)
o $\min_{\hat{\beta}_1, \hat{\beta}_2} \sum |\hat{\mu}_i|$: possible (less interesting properties)
o $\min_{\hat{\beta}_1, \hat{\beta}_2} \sum \hat{\mu}_i^2$: ordinary least squares (OLS) method
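The three criteria can be compared on a tiny made-up data set. The sketch below (hypothetical data and function names) evaluates each criterion for two candidate lines that both pass through the point of sample means (3, 4): a sensible one and an absurd one. Any line through that point has residuals summing to zero, so the first criterion cannot tell them apart. The absolute and squared criteria can.

```python
X = [1, 2, 3, 4, 5]   # hypothetical sample
Y = [2, 4, 5, 4, 5]

def residuals(b1, b2):
    """Residuals Y_i - (b1 + b2 * X_i) for a candidate line."""
    return [y - (b1 + b2 * x) for x, y in zip(X, Y)]

def sum_residuals(b1, b2):
    return sum(residuals(b1, b2))

def sum_abs_residuals(b1, b2):
    return sum(abs(u) for u in residuals(b1, b2))

def sum_squared_residuals(b1, b2):
    return sum(u ** 2 for u in residuals(b1, b2))

# Both candidate lines pass through the point of means (3, 4).
candidates = {"sensible line": (2.2, 0.6), "absurd line": (19.0, -5.0)}
for name, (b1, b2) in candidates.items():
    print(name,
          "| sum:", round(sum_residuals(b1, b2), 6),
          "| sum abs:", round(sum_abs_residuals(b1, b2), 6),
          "| sum sq:", round(sum_squared_residuals(b1, b2), 6))
```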
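1.1 SOLUTION OF THE MINIMIZATION PROBLEM
From
$\min_{\hat{\beta}_1, \hat{\beta}_2} \sum_{i=1}^{n} (Y_i - \hat{\beta}_1 - \hat{\beta}_2 X_i)^2$
setting the partial derivatives with respect to $\hat{\beta}_1$ and $\hat{\beta}_2$ equal to zero gives the first-order conditions
$\sum_{i=1}^{n} (Y_i - \hat{\beta}_1 - \hat{\beta}_2 X_i) = 0$
$\sum_{i=1}^{n} (Y_i - \hat{\beta}_1 - \hat{\beta}_2 X_i) X_i = 0$
This is a system of 2 equations in 2 unknowns.
Hence $\hat{\beta}_1$ and $\hat{\beta}_2$ are identified.
OLS estimator $\hat{\beta}_2$ (on formula sheet)
$\hat{\beta}_2 = \frac{\sum x_i y_i}{\sum x_i^2}$
where
$\bar{X} = \frac{1}{n} \sum X_i$ and $x_i = X_i - \bar{X}$
$\bar{Y} = \frac{1}{n} \sum Y_i$ and $y_i = Y_i - \bar{Y}$
OLS estimator $\hat{\beta}_1$ (on formula sheet)
$\hat{\beta}_1 = \bar{Y} - \hat{\beta}_2 \bar{X}$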
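The two formulas translate directly into a few lines of code. A minimal sketch on a made-up sample (the data and the function name `ols_simple` are illustrative assumptions). For this sample a hand calculation gives ∑ xi yi = 6 and ∑ xi² = 10, so the slope is 0.6 and the intercept is about 2.2.

```python
def ols_simple(xs, ys):
    """Return (beta1_hat, beta2_hat) for Y = beta1 + beta2 * X via the deviation form."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    x_dev = [x - x_bar for x in xs]   # x_i = X_i - X_bar
    y_dev = [y - y_bar for y in ys]   # y_i = Y_i - Y_bar
    sum_x_sq = sum(xd ** 2 for xd in x_dev)
    if sum_x_sq == 0:
        raise ValueError("all X-values are identical, the slope is not identified")
    beta2_hat = sum(xd * yd for xd, yd in zip(x_dev, y_dev)) / sum_x_sq
    beta1_hat = y_bar - beta2_hat * x_bar
    return beta1_hat, beta2_hat

X = [1, 2, 3, 4, 5]   # hypothetical sample
Y = [2, 4, 5, 4, 5]

beta1_hat, beta2_hat = ols_simple(X, Y)
residuals = [y - (beta1_hat + beta2_hat * x) for x, y in zip(X, Y)]

print("beta1_hat:", round(beta1_hat, 4), "beta2_hat:", round(beta2_hat, 4))
# Both first-order conditions of the minimisation should hold up to rounding error.
print("sum of residuals:", round(sum(residuals), 10))
print("sum of residuals * X:", round(sum(u * x for u, x in zip(residuals, X)), 10))
```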