Econometrics Theory (from lectures & book)
Chapter 8
Non-linear regression functions
A nonlinear function is a function with a slope that is not constant: The function ƒ(X) is linear if the
slope of ƒ(X) is the same for all values of X, but if the slope depends on the value of X, then ƒ(X) is
nonlinear.
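The definition can be checked numerically. The sketch below is only an illustration: the two functions and their coefficients are made up, and `slope` is a hypothetical helper that measures the change in ƒ per unit change in X.

```python
def slope(f, x, dx=1.0):
    """Change in f per unit change in x, measured between x and x + dx."""
    return (f(x + dx) - f(x)) / dx


def linear(x):
    # Made-up linear function: the slope is 3 everywhere.
    return 2.0 + 3.0 * x


def quadratic(x):
    # Made-up nonlinear function: the slope falls as x grows.
    return 2.0 + 3.0 * x - 0.1 * x ** 2


points = (0.0, 5.0, 10.0)
linear_slopes = [slope(linear, x) for x in points]
quadratic_slopes = [slope(quadratic, x) for x in points]

print("linear slopes:   ", linear_slopes)
print("quadratic slopes:", quadratic_slopes)
```

The linear function has the same slope at every point, while the slope of the quadratic depends on where it is evaluated.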
If a straight line is not an adequate description of the relationship between district income and
test scores, what is?
Quadratic population regression model
One way to approximate such a curve mathematically is to model the relationship as a quadratic
function. That is, we could model test scores as a function of income and the square of income.
A quadratic population regression model relating test scores and income is written mathematically
as:
$TestScore_i = \beta_0 + \beta_1 Income_i + \beta_2 Income_i^2 + u_i$
This equation is called the quadratic regression model because the population regression function,
$E(TestScore_i \mid Income_i) = \beta_0 + \beta_1 Income_i + \beta_2 Income_i^2$, is a quadratic function of the
independent variable, Income.
In fact, this equation is a version of the multiple regression model with two regressors:
- The first regressor is Income
- The second regressor is $Income^2$
Mechanically, you can create this second regressor by generating a new variable that
equals the square of Income, for example as an additional column in a spreadsheet. Thus,
after defining the regressors as Income and $Income^2$, the nonlinear model in the equation is
simply a multiple regression model with two regressors!
Because the quadratic regression model is a variant of multiple regression, its unknown population
coefficients can be estimated and tested using the OLS methods.
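A minimal sketch of this idea, using only the Python standard library. The data are synthetic (generated from a made-up quadratic with no error term, not the 420 districts of the example), and `solve_linear_system` and `ols` are hypothetical helpers written for this illustration; in practice a statistics package would do the estimation.

```python
def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Augmented matrix [a | b], so row operations act on both sides.
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back substitution.
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def ols(regressors, y):
    """OLS coefficients from the normal equations (X'X) b = X'y."""
    rows = [[1.0] + list(r) for r in regressors]  # prepend the intercept
    k = len(rows[0])
    xtx = [[sum(row[i] * row[j] for row in rows) for j in range(k)]
           for i in range(k)]
    xty = [sum(row[i] * yi for row, yi in zip(rows, y)) for i in range(k)]
    return solve_linear_system(xtx, xty)


# Synthetic data: the outcome lies exactly on a made-up quadratic.
income = [5.0, 8.0, 10.0, 15.0, 20.0, 30.0, 40.0, 55.0]
test_score = [600.0 + 4.0 * x - 0.04 * x ** 2 for x in income]

# The second regressor is just a new column holding the square of Income.
income_sq = [x ** 2 for x in income]

b0, b1, b2 = ols(list(zip(income, income_sq)), test_score)
print(b0, b1, b2)
```

Because the synthetic outcome has no error term, OLS recovers the coefficients used to generate it (up to rounding), which makes the sketch easy to check.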
Example: Estimating the coefficients using OLS for the 420 observations in Figure 8.2 yields:
The quadratic function captures the curvature in the scatterplot:
- It is steep for low values of district income
- but flattens out when district income is high
We can test the null hypothesis that the population regression function is linear against the
alternative that it is quadratic by testing the null hypothesis that $\beta_2 = 0$ against the alternative that
$\beta_2 \neq 0$, using the t-statistic $t = (\hat{\beta}_2 - 0) / SE(\hat{\beta}_2)$.
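As a small illustration of this test, the sketch below computes the t-statistic and compares it with the large-sample 5% two-sided critical value of 1.96. The estimate and standard error are hypothetical numbers chosen for the example, not results from the text.

```python
def t_statistic(estimate, std_error, hypothesized=0.0):
    """t = (estimate - hypothesized value) / standard error."""
    return (estimate - hypothesized) / std_error


# Hypothetical values for the coefficient on Income^2 and its standard error.
beta2_hat = -0.04
se_beta2 = 0.005

t = t_statistic(beta2_hat, se_beta2)

# Two-sided test at the 5% level using the standard normal critical value.
reject_linear = abs(t) > 1.96

print("t =", t, "reject linearity:", reject_linear)
```

With these made-up numbers the null hypothesis of linearity would be rejected in favour of the quadratic specification.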
The Effect on Y of a Change in X in Nonlinear Specifications
You want to know how the dependent variable Y is expected to change when the independent
variable X1 changes by the amount ΔX1, holding constant other independent variables X2, … , Xk.
- A general formula for a nonlinear population regression function:
$Y_i = f(X_{1i}, X_{2i}, \ldots, X_{ki}) + u_i, \quad i = 1, \ldots, n$
where f(X1i, X2i, … , Xki) is the population nonlinear regression function.
The expected change in Y, ΔY, associated with the change in X1, ΔX1, holding X2, … , Xk constant, is the
difference between the value of the population regression function before and after changing X1,
holding X2, … , Xk constant. That is, the expected change in Y is the difference:
$\Delta Y = f(X_1 + \Delta X_1, X_2, \ldots, X_k) - f(X_1, X_2, \ldots, X_k)$
The estimator of this unknown population difference is the difference between the predicted values
for these two cases. Let $\hat{f}(X_1, X_2, \ldots, X_k)$ be the predicted value of Y based on the estimator $\hat{f}$ of the
population regression function. Then the predicted change in Y is
$\Delta\hat{Y} = \hat{f}(X_1 + \Delta X_1, X_2, \ldots, X_k) - \hat{f}(X_1, X_2, \ldots, X_k)$
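The predicted change can be sketched for the quadratic model as follows. The coefficients are hypothetical placeholders (not estimates from the text), and `f_hat` and `predicted_change` are names invented for this illustration.

```python
# Hypothetical estimated coefficients for a quadratic in Income.
B0, B1, B2 = 600.0, 4.0, -0.04


def f_hat(income):
    """Predicted value from the (hypothetical) estimated quadratic regression."""
    return B0 + B1 * income + B2 * income ** 2


def predicted_change(x, dx):
    """Predicted change in Y when X moves from x to x + dx."""
    return f_hat(x + dx) - f_hat(x)


# The same one-unit change in Income has a different predicted effect
# depending on the starting level of Income.
change_low = predicted_change(10.0, 1.0)    # Income from 10 to 11
change_high = predicted_change(40.0, 1.0)   # Income from 40 to 41

print(change_low, change_high)
```

With a negative coefficient on the squared term, the predicted effect of one extra unit of income is smaller at high income than at low income, which is the flattening described above.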
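Standard errors: Example in book, where income changes from 10 to 11
$SE(\Delta\hat{Y}) = SE((11 - 10)\hat{\beta}_1 + (11^2 - 10^2)\hat{\beta}_2) = SE(\hat{\beta}_1 + 21\hat{\beta}_2)$
General formula: $SE(\Delta\hat{Y}) = |\Delta\hat{Y}| / \sqrt{F}$
where F is the F-statistic testing the hypothesis that the population change is zero (in the example, $\beta_1 + 21\beta_2 = 0$).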
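A sketch of the two routes to this standard error, under stated assumptions: the covariance matrix of the two coefficient estimates and the coefficients themselves are hypothetical numbers, and `se_linear_combination` is a helper invented here. With a single restriction the F-statistic equals the square of the t-statistic, which is why dividing the absolute predicted change by the square root of F gives back the standard error.

```python
import math


def se_linear_combination(weights, cov):
    """Standard error of sum_i weights[i] * b_i given the covariance matrix of b."""
    n = len(weights)
    variance = sum(weights[i] * weights[j] * cov[i][j]
                   for i in range(n) for j in range(n))
    return math.sqrt(variance)


# Hypothetical covariance matrix of (beta1_hat, beta2_hat).
cov_b1_b2 = [[0.09, -0.0014],
             [-0.0014, 0.000025]]

# Income changes from 10 to 11: weights are (11 - 10) and (11^2 - 10^2).
weights = [11 - 10, 11 ** 2 - 10 ** 2]

se_direct = se_linear_combination(weights, cov_b1_b2)

# Hypothetical coefficient estimates give the predicted change.
beta1_hat, beta2_hat = 4.0, -0.04
delta_y_hat = weights[0] * beta1_hat + weights[1] * beta2_hat

# In practice F would be reported by regression software; here it is
# constructed from the t-statistic to show that the two formulas agree.
f_stat = (delta_y_hat / se_direct) ** 2
se_from_f = abs(delta_y_hat) / math.sqrt(f_stat)

print(se_direct, se_from_f)
```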
A General Approach to Modelling Nonlinearities Using Multiple Regression
1. Identify a possible nonlinear relationship.
   Ask yourself whether the slope of the regression function relating Y and X might
   reasonably depend on the value of X or on another independent variable.
2. Specify a nonlinear function and estimate its parameters by OLS.
3. Determine whether the nonlinear model improves upon a linear model.
   Just because you think a regression function is nonlinear does not mean it really is!
   You must determine empirically whether your nonlinear model is appropriate.
4. Plot the estimated nonlinear regression function.
   Does the estimated regression function describe the data well?
5. Estimate the effect on Y of a change in X.
   Use the estimated regression to calculate the effect on Y of a change in one or more
   regressors X.
Non-linear functions of a single independent variable
Polynomials
One way to specify a nonlinear regression function is to use a polynomial in X. In general, let r
denote the highest power of X that is included in the regression. The polynomial regression model
of degree r is:
$Y_i = \beta_0 + \beta_1 X_i + \beta_2 X_i^2 + \cdots + \beta_r X_i^r + u_i$
When r = 2, the equation is the quadratic regression model. When r = 3 so that the highest power of
X included is $X^3$, the equation is called the cubic regression model.
The unknown coefficients $\beta_0, \beta_1, \ldots, \beta_r$ can be estimated by OLS regression of $Y_i$ against
$X_i, X_i^2, \ldots, X_i^r$.
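Building the regressors for a polynomial of degree r is mechanical: each observation gets one column per power of X. A minimal sketch, where `polynomial_regressors` is a hypothetical helper name and the input values are made up:

```python
def polynomial_regressors(x_values, r):
    """Return one row per observation containing X, X^2, ..., X^r."""
    return [[x ** power for power in range(1, r + 1)] for x in x_values]


# Made-up observations of X, expanded for a cubic (r = 3) regression.
rows = polynomial_regressors([2.0, 3.0], 3)
print(rows)
```

These rows, together with an intercept, are the regressors of an ordinary multiple regression.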
- Testing the null hypothesis that the population regression function is linear
If the population regression function is linear, then the quadratic and higher-degree terms
do not enter the population regression function. Accordingly, the null hypothesis that the
regression is linear and the alternative that it is a polynomial of degree r correspond to:
$H_0: \beta_2 = 0, \beta_3 = 0, \ldots, \beta_r = 0$ vs. $H_1$: at least one $\beta_j \neq 0$, $j = 2, \ldots, r$
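Because this null hypothesis involves several coefficients at once, it is a joint hypothesis and is tested with an F-statistic rather than a single t-statistic. The sketch below uses the homoskedasticity-only F formula based on sums of squared residuals; this is an assumption made to keep the example short (a heteroskedasticity-robust F-statistic would normally come from regression software), and all numbers are hypothetical.

```python
def homoskedastic_f_statistic(ssr_restricted, ssr_unrestricted, q, n, k_unrestricted):
    """Homoskedasticity-only F-statistic for q restrictions.

    ssr_restricted:   sum of squared residuals of the linear model
    ssr_unrestricted: sum of squared residuals of the degree-r polynomial
    q:                number of restrictions (r - 1 when testing linearity)
    n:                number of observations
    k_unrestricted:   number of regressors in the unrestricted model (r)
    """
    numerator = (ssr_restricted - ssr_unrestricted) / q
    denominator = ssr_unrestricted / (n - k_unrestricted - 1)
    return numerator / denominator


# Hypothetical sums of squared residuals for a cubic (r = 3) with n = 420.
f_stat = homoskedastic_f_statistic(70000.0, 65000.0, q=2, n=420, k_unrestricted=3)

# Large-sample 5% critical value for an F-statistic with q = 2 restrictions.
reject_linearity = f_stat > 3.00

print(f_stat, reject_linearity)
```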
- How many powers of X should be included in a polynomial regression (that is, which degree of
polynomial)?
Include enough to model the nonlinear regression function adequately, but no more. One way to decide is sequential hypothesis testing:
1. Pick a maximum value of r and estimate the polynomial regression for that r.
2. Use the t-statistic to test the hypothesis that the coefficient on $X^r$ ($\beta_r$ in the equation) is
zero. If you reject this hypothesis, then $X^r$ belongs in the regression, so use the
polynomial of degree r.
3. If you do not reject $\beta_r = 0$ in step 2, eliminate $X^r$ from the regression and estimate a
polynomial regression of degree r - 1. Test whether the coefficient on $X^{r-1}$ is zero. If you
reject, use the polynomial of degree r - 1.
4. If you do not reject $\beta_{r-1} = 0$ in step 3, continue this procedure until the coefficient on
the highest power in your polynomial is statistically significant.
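The steps above can be sketched as a short loop. This is an illustration under assumptions: `t_stat_for_degree` is a hypothetical callable that would re-estimate the degree-r polynomial and return the t-statistic on its highest power, the t-statistics in the example are made up, and falling back to the linear model when no higher power is significant is an added assumption.

```python
def choose_polynomial_degree(t_stat_for_degree, max_degree, critical_value=1.96):
    """Sequentially test the highest power, dropping it while it is insignificant.

    t_stat_for_degree(r) must return the t-statistic on X^r from the
    polynomial regression of degree r.
    """
    for r in range(max_degree, 1, -1):
        if abs(t_stat_for_degree(r)) > critical_value:
            return r  # the coefficient on X^r is significant: keep degree r
    # Assumption: if no higher power is significant, use the linear model.
    return 1


# Made-up t-statistics on the highest power for each candidate degree.
example_t_stats = {4: 0.8, 3: 2.5, 2: 6.0}

chosen_degree = choose_polynomial_degree(lambda r: example_t_stats[r], max_degree=4)
print(chosen_degree)
```

In this made-up example the quartic term is not significant but the cubic term is, so the procedure stops at degree 3 without ever testing the quadratic.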
Logarithms
Another way to specify a nonlinear regression function is to use the natural logarithm of Y and/or X.
Logarithms convert changes in variables into percentage changes, and many relationships are
naturally expressed in terms of percentages.
The exponential function and the natural logarithm (its inverse) play an important role in modelling
nonlinear regression functions.
The exponential function of x is $e^x$, or exp(x), where e is the constant 2.71828…
The natural logarithm (ln) is the inverse of the exponential function; that is, the natural logarithm is
the function for which $x = \ln(e^x)$ or, equivalently, $x = \ln[\exp(x)]$. The base of the natural
logarithm is e.
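Both properties can be checked with Python's standard `math` module. The values of x below are arbitrary choices for illustration.

```python
import math

# The natural logarithm undoes the exponential function.
x = 2.5
round_trip = math.log(math.exp(x))

# A small change in x changes ln(x) by approximately the proportional change
# in x: ln(x + dx) - ln(x) is close to dx / x when dx / x is small.
base, change = 100.0, 1.0
log_difference = math.log(base + change) - math.log(base)
proportional_change = change / base

print(round_trip)
print(log_difference, proportional_change)
```

The second comparison is the sense in which logarithms convert changes in variables into percentage changes: a 1% increase in x raises ln(x) by roughly 0.01.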