Chapter 1: An Overview of Regression Analysis
1.1 What is econometrics?
Econometrics – literally, ‘economic measurement’ – is the quantitative measurement and analysis of
actual economic and business phenomena. It attempts to quantify economic reality and bridge the
gap between the abstract world of economic theory and the real world of human activity.
Uses of Econometrics
1. Describing economic reality
2. Testing hypotheses about economic theory and policy
3. Forecasting future economic activity.
The simplest use of econometrics is description. We can use econometrics to quantify economic
activity and measure marginal effects because econometrics allows us to estimate numbers and put
them in equations that previously contained only abstract symbols. Example: Consumer demand for
a particular product often can be thought of as a relationship between the quantity demanded (Q)
and the product’s price (P), the price of a substitute (Ps), and disposable income (Yd). Econometrics
allows us to estimate that relationship based upon past consumption, income and prices. In
other words, a general and purely theoretical functional relationship like:
Q = β0 + β1P + β2Ps + β3Yd (1.1)
can become explicit:
Q = 27.7 − 0.11P + 0.03Ps + 0.23Yd (1.2)
Instead of expecting consumption merely to “increase” if there is an increase in disposable income,
Equation 1.2 allows us to expect an increase of a specific amount (0.23 units for each unit of
increased disposable income). The number 0.23 is called an estimated regression coefficient, and it
is the ability to estimate these coefficients that makes econometrics valuable.
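As a minimal sketch of how Equation 1.2 is read, the snippet below evaluates it at two income levels that differ by one unit. The coefficients come from Equation 1.2; the input values for P, Ps and Yd and the function name predicted_quantity are hypothetical and chosen only for illustration.

```python
# Coefficients copied from Equation 1.2 in the text.
B0, B_P, B_PS, B_YD = 27.7, -0.11, 0.03, 0.23


def predicted_quantity(p, ps, yd):
    """Quantity demanded implied by Equation 1.2."""
    return B0 + B_P * p + B_PS * ps + B_YD * yd


# Hypothetical inputs (not from the text): only Yd changes, by one unit.
base = predicted_quantity(p=10.0, ps=12.0, yd=100.0)
bumped = predicted_quantity(p=10.0, ps=12.0, yd=101.0)
income_effect = bumped - base

print(round(base, 2))
print(round(income_effect, 2))  # equals the coefficient on Yd
```

Holding P and Ps fixed, the difference between the two predictions is the estimated coefficient on Yd, which is the sense in which 0.23 measures a marginal effect.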
The second use of econometrics is hypothesis testing, the evaluation of alternative theories
with quantitative evidence. For example, you could test the hypothesis that the product in Equation 1.1
is what economists call a normal good. You could do this by applying various statistical tests to the
estimated coefficient (0.23) of disposable income (Yd) in Equation 1.2.
The third and most difficult use of econometrics is to forecast or predict what is likely to
happen next quarter or next year, based on what has happened in the past. The accuracy of forecasts
depends in large measure on the degree to which the past is a good guide to the future.
Alternative Econometric Approaches
There are many different approaches to quantitative work. Different approaches also make sense
within the field of economics. A model built solely for descriptive purposes might be different from a
forecasting model, for example.
To get a better picture of these approaches, let’s look at the steps used in nonexperimental
quantitative research:
1. Specifying the models or relationships to be studied.
2. Collecting the data needed to quantify the models.
3. Quantifying the models with the data.
The specifications used in step 1 and the techniques used in step 3 differ widely between and within
disciplines. Choosing the best specification for a given model is a theory-based skill that is often
referred to as the ‘art’ of econometrics. The choice of approach is left to the individual
econometrician, but each researcher should be able to justify that choice.
1.2 What Is Regression Analysis?
Econometricians use regression analysis to make quantitative estimates of economic relationships
that previously have been completely theoretical in nature. To predict the direction of a change
(for example, whether quantity demanded rises or falls when price increases), you need a knowledge
of economic theory and the general characteristics of the product in question.
To predict the amount of the change, though, you need a sample of data, and you need a way to
estimate the relationship.
Dependent Variables, Independent Variables, and Causality
Regression analysis is a statistical technique that attempts to “explain” movements in one variable,
the dependent variable, as a function of movements in a set of other variables, called the
independent (or explanatory) variables, through the quantification of one or more equations.
Q = β0 + β1P + β2Ps + β3Yd (1.1)
In Equation 1.1, Q is the dependent variable and P, Ps and Yd are the independent variables. Equations
like this express propositions that pose an if-then, or causal, relationship: they logically postulate
that a dependent variable’s movements are determined by movements in a number of specific
independent variables.
Regression analysis can’t confirm causality; it can only test the strength and direction of the
quantitative relationships involved.
Single-Equation Linear Models
The simplest single-equation regression model is:
Y = β0 + β1X (1.3)
The equation states that Y, the dependent variable, is a single-equation linear function of X, the
independent variable. The model is a single-equation model because it’s the only equation specified.
The model is linear because if you were to plot the equation it would be a straight line rather than a
curve.
The βs are the coefficients that determine the coordinates of the straight line at any point. β0
is the constant or intercept term; it indicates the value of Y when X equals zero. β1 is the slope
coefficient, and it indicates the amount that Y will change when X increases by one unit.
For a linear model, the slope is constant over the entire function. An equation is linear if plotting the
function in terms of X and Y generates a straight line.
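The roles of the intercept and the slope can be checked numerically. The sketch below assumes hypothetical coefficient values (3.0 and 2.0, not taken from the text) and evaluates the line of Equation 1.3 at several points.

```python
# Hypothetical coefficients, chosen only for illustration.
BETA0, BETA1 = 3.0, 2.0


def line(x):
    """Deterministic linear model Y = beta0 + beta1 * X (Equation 1.3)."""
    return BETA0 + BETA1 * x


# The intercept is the value of Y when X equals zero.
intercept = line(0)

# The change in Y for a one-unit increase in X, measured at different X values.
changes = [line(x + 1) - line(x) for x in (0, 5, 50)]

print(intercept)  # 3.0
print(changes)    # [2.0, 2.0, 2.0]
```

The one-unit change in Y is the same wherever it is measured, which is what a constant slope over the entire function means.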
The Stochastic Error Term
Besides the variation in the dependent variable (Y) that is caused by the independent variable (X),
there is almost always variation that comes from other sources as well. This additional variation
comes in part from omitted explanatory variables. Even if these extra variables are added to the
equation, there still is going to be some variation in Y that simply can’t be explained by the model.
This variation probably comes from sources such as omitted influences, measurement error,
incorrect functional form, or purely random and totally unpredictable occurrences. By random we
mean something that has its value determined entirely by chance.
A stochastic error term is a term that is added to a regression equation to introduce all of the
variation in Y that can’t be explained by the included Xs. The error term (sometimes called the
disturbance term) usually is referred to with the symbol epsilon (ε), although other symbols (like u or
v) sometimes are used. The typical regression equation is:
Y = β0 + β1X + ε (1.4)
Equation 1.4 can be thought of as having two components, the deterministic component and the
stochastic, or random, component. The expression β0 + β1X is called the deterministic component of
the regression equation because it indicates the value of Y that is determined by a given value of X,
which is assumed to be nonstochastic. This deterministic component can also be thought of as the
expected value of Y given X, the mean value of the Ys associated with a particular value of X.
For example, if the average height of all 13-year-old girls is 5 feet, then 5 feet is the expected value of
a girl’s height given that she is 13. The deterministic part of the equation may be written:
E(Y|X) = β0 + β1X (1.5)
which states that the expected value of Y given X, denoted as E(Y|X), is a linear function of the
independent variable.
The value of Y observed in the real world is unlikely to be exactly equal to the deterministic
expected value E(Y|X). As a result, the stochastic element (ε) must be added.
Y = E(Y|X) + ε = β0 + β1X + ε (1.6)
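The split between the deterministic and stochastic components can be illustrated with a small simulation. This is only a sketch under stated assumptions: the coefficient values, the chosen X, and the error distribution (normal with mean zero and standard deviation 1) are all hypothetical and are not specified in the text.

```python
import random

random.seed(42)  # fixed seed so the sketch is reproducible

# Hypothetical values, not taken from the text.
BETA0, BETA1 = 2.0, 0.5
X = 10.0
SIGMA = 1.0  # assumed standard deviation of the error term


def draw_y(x):
    """One observed Y: deterministic part plus a random error term."""
    epsilon = random.gauss(0.0, SIGMA)  # assumed mean-zero normal error
    return BETA0 + BETA1 * x + epsilon


expected_y = BETA0 + BETA1 * X  # deterministic component, E(Y|X)
draws = [draw_y(X) for _ in range(10_000)]
sample_mean = sum(draws) / len(draws)

print(expected_y)
print(round(sample_mean, 2))
```

Individual draws of Y differ from the deterministic value because of ε, but under the mean-zero assumption their average at a given X settles close to E(Y|X).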
The stochastic error term must be present in a regression equation because there are at least four
sources of variation in Y other than the variation in the included Xs:
1. Many minor influences on Y are omitted from the equation (for example, because data are
unavailable).
2. It is virtually impossible to avoid some sort of measurement error in the dependent variable.
3. The underlying theoretical equation might have a different functional form (or shape) than
the one chosen for the regression. For example, the underlying equation might be nonlinear.
4. All attempts to generalize human behavior must contain at least some amount of
unpredictable or purely random variation.
To get a better feeling for these components of the stochastic error term, let’s think about a
consumption function.
First, consumption in a particular year may have been less than it would have been because
of uncertainty over the future course of the economy.
Second, the observed amount of consumption may have been different from the actual level
of consumption in a particular year due to an error in the measurement of consumption in the
National Income Accounts.
Third, the underlying consumption function may be nonlinear, but a linear consumption
function might be estimated.
Fourth, the consumption function attempts to portray the behavior of people, and there is
always an element of unpredictability in human behavior.
These possibilities explain the existence of a difference between the observed values of Y and
the values expected from the deterministic component of the equation, E(Y|X).
Extending the Notation
Our regression notation needs to be extended to allow the possibility of more than one independent
variable and to include reference to the number of observations. If we include a specific reference to
the observations, the single-equation linear regression model may be written as:
Yi = β0 + β1Xi + εi (i = 1, 2, …, N) (1.7)
where: Yi = the ith observation of the dependent variable
Xi = the ith observation of the independent variable
εi = the ith observation of the stochastic error term
β0, β1 = the regression coefficients
N = the number of observations
For the last observation (i = N), for example:
YN = β0 + β1XN + εN
That is, the regression model is assumed to hold for each observation. The coefficients do not change
from observation to observation, but the values of Y, X, and ε do.
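The observation-by-observation notation can be sketched in code. Everything numeric here is an assumption made for illustration: the "true" coefficients, the simulated Xs and errors, and the use of the least-squares formulas for a single regressor, an estimator that this section has not introduced.

```python
import random

random.seed(0)  # fixed seed so the sketch is reproducible

BETA0, BETA1 = 1.0, 2.0  # hypothetical "true" coefficients
N = 200                  # number of simulated observations

# Each observation i has its own X_i and epsilon_i but shares the same betas.
X = [random.uniform(0, 10) for _ in range(N)]
eps = [random.gauss(0, 1) for _ in range(N)]
Y = [BETA0 + BETA1 * x_i + e_i for x_i, e_i in zip(X, eps)]

# Least-squares formulas for one regressor (an assumed choice of estimator).
x_bar = sum(X) / N
y_bar = sum(Y) / N
sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(X, Y))
sxx = sum((x - x_bar) ** 2 for x in X)
slope = sxy / sxx
intercept = y_bar - slope * x_bar

print(round(intercept, 2), round(slope, 2))
```

Because one pair of coefficients generates all N observations, a single intercept and slope estimated from the whole sample should land near the values used in the simulation.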
A second notational addition allows for more than one independent variable. Our notation should
allow the additional explanatory Xs to be added. If we define:
X1i = the ith observation of the first independent variable