Bivariate & Multivariate Linear Regression Model
Bivariate
Ordinary Least Squares (OLS)= Methodology that we use to estimate the coefficients
1. Take the vertical distances (the residuals ûi) between each data point and the fitted regression line
2. Take the square of each distance and sum them
- Distances can be positive or negative, so we square them to stop them from cancelling out
3. Find the estimated coefficients α̂ and β̂ that minimize the sum of the squared residuals
Population regression function: yi = α + βxi + ui
- Sample regression function: yi = α̂ + β̂xi + ûi
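The OLS steps above can be sketched in plain Python. This is a minimal illustration, not a reference implementation: the function names `ols_bivariate` and `sum_squared_residuals` and the data values are made up for the example. It uses the standard closed-form solution for the bivariate case, where the slope is the sum of cross-deviations of x and y divided by the sum of squared deviations of x.

```python
def ols_bivariate(x, y):
    """Return (alpha_hat, beta_hat) that minimize the sum of squared residuals."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    # Slope: cross-deviation sum of x and y over the squared-deviation sum of x
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    beta_hat = s_xy / s_xx
    # Intercept: the fitted line passes through the point of sample means
    alpha_hat = y_bar - beta_hat * x_bar
    return alpha_hat, beta_hat


def sum_squared_residuals(x, y, alpha, beta):
    """Sum of squared vertical distances between the points and the line."""
    return sum((yi - alpha - beta * xi) ** 2 for xi, yi in zip(x, y))


# Hypothetical data, chosen only for illustration
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

alpha_hat, beta_hat = ols_bivariate(x, y)
ssr_at_ols = sum_squared_residuals(x, y, alpha_hat, beta_hat)
# Any other line gives a larger sum of squared residuals
ssr_elsewhere = sum_squared_residuals(x, y, alpha_hat + 0.1, beta_hat - 0.05)
```

Comparing `ssr_at_ols` with `ssr_elsewhere` shows the minimization idea directly: moving the intercept or slope away from the OLS values increases the sum of squared residuals.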
OLS properties:
1. Unbiasedness: on average, α̂ and β̂ are equal to the true values of α and β
2. Efficiency: no other linear unbiased estimator has a smaller Var(α̂) or Var(β̂)
In order for the properties of the OLS estimator to hold, some important assumptions must
hold at the same time.
There are also large sample properties of OLS:
1. Consistency: the estimates converge to the true values as N increases to infinity
2. Asymptotic Normality: the estimators are approximately normally distributed in large enough samples
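Unbiasedness and consistency can be illustrated with a small simulation. The sketch below assumes a population model y = 1 + 2x + u with standard normal x and u; these values, the sample sizes, and the helper names `ols_slope` and `draw_sample` are choices made for the example only.

```python
import random


def ols_slope(x, y):
    """Closed-form OLS slope for a bivariate regression of y on x."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    return s_xy / s_xx


def draw_sample(rng, n, alpha=1.0, beta=2.0):
    """Draw n observations from the assumed model y = alpha + beta*x + u."""
    x = [rng.gauss(0.0, 1.0) for _ in range(n)]
    # The error u is N(0, 1) and drawn independently of x
    y = [alpha + beta * xi + rng.gauss(0.0, 1.0) for xi in x]
    return x, y


rng = random.Random(0)  # fixed seed so the run is reproducible

# Unbiasedness: average the slope estimate over many small samples
small_sample_slopes = [ols_slope(*draw_sample(rng, 30)) for _ in range(500)]
mean_slope = sum(small_sample_slopes) / len(small_sample_slopes)

# Consistency: one large sample gives an estimate close to the true slope
large_sample_slope = ols_slope(*draw_sample(rng, 20000))
```

Individual small-sample estimates scatter around 2, but their average `mean_slope` should sit close to it, and `large_sample_slope` should be close to 2 on its own.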
Bivariate OLS assumptions:
1. Population model is linear in parameters
2. Random sample from the population
3. Sample variation in explanatory variable
4. The error u has an expected value of zero
5. Homoscedasticity: the variance of the error u is constant and finite for any value of the
explanatory variable x
6. Normality; the population error u is independent of the explanatory variables x and is
normally distributed -> only for hypothesis testing
Assumptions 1 to 5 guarantee that the OLS estimators α̂ and β̂ are the Best Linear
Unbiased Estimators (BLUE)
Best: they have the minimum variance among linear unbiased estimators
Linear: they are linear estimators, i.e. linear combinations of y
Unbiased: on average they are equal to the true values
Estimator: they are estimators of the true values α and β
Standardized Coefficients are useful because they do not depend on units of measurement
and can provide a more meaningful economic interpretation.
- They show by how many standard deviations y changes when x increases by 1 SD
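A standardized coefficient can be computed in two equivalent ways, sketched below with made-up data: rescale the ordinary slope by the ratio of the sample standard deviations, or convert x and y to z-scores first and then run the regression. The variable names are illustrative only.

```python
import statistics

# Hypothetical data, chosen only for illustration
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

x_bar, y_bar = statistics.mean(x), statistics.mean(y)
sd_x, sd_y = statistics.stdev(x), statistics.stdev(y)

# Ordinary OLS slope, measured in units of y per unit of x
beta_hat = (
    sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    / sum((xi - x_bar) ** 2 for xi in x)
)

# Route 1: rescale the slope so it is measured in SDs of y per SD of x
beta_std = beta_hat * sd_x / sd_y

# Route 2: standardize both variables, then regress z_y on z_x
z_x = [(xi - x_bar) / sd_x for xi in x]
z_y = [(yi - y_bar) / sd_y for yi in y]
# z-scores have mean zero, so the slope formula needs no further centering
beta_from_z = sum(a * b for a, b in zip(z_x, z_y)) / sum(a * a for a in z_x)
```

Both routes give the same number. In the bivariate case this standardized slope equals the sample correlation between x and y, so it always lies between -1 and 1.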
Logarithm is used in regressions for various reasons:
1. Rescale the data so that their variance is more constant -> overcome heteroscedasticity
2. Bring a positively skewed distribution closer to a normal distribution
3. Model a constant percentage change -> the coefficients can be interpreted as elasticities
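o Log-log model: β is the elasticity of y with respect to x (% change in y for a 1% change in x)
o Log-level model: 100·β is the semi-elasticity of y with respect to x (% change in y for a one-unit change in x)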
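The elasticity reading of a log-log regression can be checked with a short sketch. The data below are generated exactly from y = 3·x^0.5, an assumption made purely for illustration, so the slope of log(y) on log(x) should recover the exponent 0.5.

```python
import math

# Made-up data generated exactly from y = 3 * x**0.5 (true elasticity 0.5)
x = [1.0, 2.0, 4.0, 8.0, 16.0]
y = [3.0 * xi ** 0.5 for xi in x]

# Take logs of both variables
log_x = [math.log(xi) for xi in x]
log_y = [math.log(yi) for yi in y]

n = len(x)
lx_bar = sum(log_x) / n
ly_bar = sum(log_y) / n

# OLS slope and intercept of log(y) on log(x)
beta_hat = (
    sum((lx - lx_bar) * (ly - ly_bar) for lx, ly in zip(log_x, log_y))
    / sum((lx - lx_bar) ** 2 for lx in log_x)
)
alpha_hat = ly_bar - beta_hat * lx_bar
```

Here `beta_hat` is the elasticity: a 1% increase in x goes with roughly a 0.5% increase in y, whatever the level of x. The intercept is on the log scale, so `math.exp(alpha_hat)` recovers the multiplicative constant 3.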
Standard Error: estimates the precision of α̂ and β̂
- Larger error variance σ² -> larger Var(α̂) and Var(β̂) -> less precise estimates
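The link between the error variance and the standard errors can be made concrete with the usual bivariate formulas under homoscedasticity. The sketch below uses made-up data and illustrative variable names; σ² is unknown in practice, so it is estimated from the residuals as RSS / (n − 2).

```python
import math

# Hypothetical data, chosen only for illustration
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n
s_xx = sum((xi - x_bar) ** 2 for xi in x)

# OLS estimates
beta_hat = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / s_xx
alpha_hat = y_bar - beta_hat * x_bar

# Residuals and the estimate of the error variance
residuals = [yi - alpha_hat - beta_hat * xi for xi, yi in zip(x, y)]
rss = sum(r ** 2 for r in residuals)
sigma2_hat = rss / (n - 2)  # two estimated coefficients use two degrees of freedom

# Standard errors of the slope and the intercept
se_beta = math.sqrt(sigma2_hat / s_xx)
se_alpha = math.sqrt(sigma2_hat * sum(xi ** 2 for xi in x) / (n * s_xx))
```

Both standard errors scale with the square root of `sigma2_hat`, which is why a noisier error term gives less precise estimates. More spread in x (a larger `s_xx`) works in the opposite direction and tightens the slope estimate.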
Three measures that help calculate the Goodness of Fit of the model:
1. Total Sum of Squares (TSS)= ESS + RSS
o Total variation in y