UNIVERSITY COLLEGE LONDON
DEPARTMENT OF ECONOMICS
Economics BSc (Econ)
First Year – Term 2
APPLIED ECONOMICS
ECON0004
Rodrigo Antón García
rodrigo.garcia.20@ucl.ac.uk
London, 2021
ECON0004 – Applied Economics Rodrigo Antón García
Contents
Week 1: Introduction to data, economic modelling and econometrics.
Week 2: The regression model and OLS estimator.
Week 3: Properties of OLS estimator and hypothesis testing.
Week 4: Multiple regression and functional form issues.
Week 5: Causality, experimental and quasi-experimental evidence.
Week 6: Consumer demand.
Week 7: Consumption and saving.
Week 8: Employment and Minimum Wages.
Week 9: Labour supply.
Week 10: Review.
ECON0004: APPLIED ECONOMICS
Week 1: Introduction to data, economic modelling and econometrics.
Datasets are collections of realisations of random variables. They usually include several
variables X, Y, Z, etc. There is a value for these variables for each of the N observations
in the dataset. Each observation is a realization of the random variable: (X1, X2, ..., XN),
(Y1, Y2, ..., YN), (Z1, Z2, ..., ZN), etc.
Data differ by their unit of observation (or level): individual persons, households, firms, etc.,
or aggregates over geographical areas, e.g., countries.
• Main types of data used in applied economics.
- Time series data.
A time series data set consists of observations on a variable or several variables over
time. The same unit is observed at different points in time. The data frequency may be
daily, weekly, monthly, quarterly, annual, etc. These data are good for investigating
effects of variables which vary mainly over time.
Example: UK Economic Growth between 1979 and 2012; UK daily new confirmed
COVID-19 cases from 1 March 2020 to 7 Jan 2021.
- Cross sectional data.
A cross-sectional data set consists of a sample of individuals, households, firms, cities,
states, countries, or a variety of other units of interest, taken at a given point in time.
These data are good for investigating relationships between variables which vary across
units at a given point in time (e.g., incomes, commodity demands).
An important feature of cross-sectional data is that we can often assume that they have
been obtained by random sampling from the underlying population.
Example: productivity and output of firms in the UK at one point in time, wages and
education of workers, etc.
- Combination of cross-sectional and time series data:
- Panel or longitudinal data.
A panel data (or longitudinal data) set consists of a time series for each cross-sectional
member in the data set. Here, the same cross-sectional units are followed over time.
Therefore, panel data have a cross-sectional and a time series dimension. These data
are good for investigating life-cycle phenomena.
Example: longitudinal surveys that follow the same cohort of individuals from youth
through old age.
- Pooled cross sections.
A pooled cross section data set combines two or more cross sections in one data set. It
therefore has both cross-sectional and time series features. Here, cross sections are
drawn independently of each other. These data are well suited to analyzing the effects of
a new government policy, using data collected in the years before and after a key policy
change.
Example: evaluating the effect of a change in property taxes on house prices.
- Random sample of house prices for the year 1993.
- A new random sample of house prices for the year 1995.
- Compare before and after (1993: before reform, 1995: after reform).
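The before-and-after comparison can be sketched in Python as follows. This is only an illustration with simulated data: the price levels, sample sizes and the size of the change are invented, and a simple difference in sample means is a rough first look, since it ignores anything else that changed between the two years.

```python
import random
import statistics

rng = random.Random(0)

# Hypothetical pooled cross sections: two independent random samples of
# house prices (in thousands), one drawn in each year. All numbers are invented.
prices_1993 = [rng.gauss(100.0, 15.0) for _ in range(500)]  # before the reform
prices_1995 = [rng.gauss(95.0, 15.0) for _ in range(400)]   # after the reform


def before_after_difference(before, after):
    """Difference in sample means: mean(after) - mean(before)."""
    return statistics.fmean(after) - statistics.fmean(before)


diff = before_after_difference(prices_1993, prices_1995)
print(f"Mean price 1993: {statistics.fmean(prices_1993):.1f}")
print(f"Mean price 1995: {statistics.fmean(prices_1995):.1f}")
print(f"Before/after difference: {diff:.1f}")
```

Note that the two samples need not contain the same houses, which is what distinguishes pooled cross sections from panel data.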
• Econometrics. The use of statistical methods to analyze economic data.
More fully: the application of statistical and mathematical methods to the analysis of
economic data, with the purpose of giving empirical content to economic theories and
verifying or refuting them. Econometricians typically analyze nonexperimental data.
Typical goals of econometric analysis include:
- Estimating relationships between economic variables.
- Testing economic theories and hypotheses.
- Forecasting economic variables.
- Evaluating and implementing government and business policy.
• Economic Theoretical Model. Example.
Aggregate production function:
$Y = A K^{\alpha} (hL)^{1-\alpha}$
- $Y$ is output.
- $K$ is physical capital.
- $h$ is the amount of labour per worker and is related to schooling.
- $L$ is the number of workers, and total labour input is $hL$.
- $A$ is a measure of productivity.
- $\alpha$ is capital's share of income, $0 < \alpha < 1$.
- Per-worker production function:
$y = A k^{\alpha} h^{1-\alpha}$
output = productivity × factors of production
Where $y = Y/L$, $k = K/L$, and $k^{\alpha} h^{1-\alpha}$ is a composite of the two factors of production.
- Estimating the income ratio:
$\frac{y_1}{y_2} = \frac{A_1}{A_2} \times \frac{k_1^{\alpha} h_1^{1-\alpha}}{k_2^{\alpha} h_2^{1-\alpha}}$
ratio of output = ratio of productivity × ratio of factors of production
- $y$: GDP per worker. GDP is a measure of the value of all of the goods and services
produced in a country in a year. It is also known as output or national income.
- $k$: physical capital per worker.
- $h$: number of years of schooling per worker.
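The income-ratio decomposition can be sketched in Python as follows. The function names, the dictionary layout and all numerical values (including a capital share of 1/3) are assumptions made for illustration, not figures from the lecture.

```python
def output_per_worker(A, k, h, alpha):
    """Per-worker production function: y = A * k^alpha * h^(1 - alpha)."""
    return A * k**alpha * h ** (1 - alpha)


def income_ratio_decomposition(c1, c2, alpha):
    """Split y1/y2 into a productivity ratio and a factors-of-production ratio.

    c1 and c2 are dicts with keys "A", "k", "h" (a hypothetical layout).
    """
    productivity_ratio = c1["A"] / c2["A"]
    factor_ratio = (c1["k"] ** alpha * c1["h"] ** (1 - alpha)) / (
        c2["k"] ** alpha * c2["h"] ** (1 - alpha)
    )
    return productivity_ratio, factor_ratio


alpha = 1 / 3  # assumed capital share, for illustration only
country_1 = {"A": 1.0, "k": 8.0, "h": 2.0}  # invented values
country_2 = {"A": 0.5, "k": 1.0, "h": 1.0}  # invented values

prod_ratio, fact_ratio = income_ratio_decomposition(country_1, country_2, alpha)
y_ratio = output_per_worker(**country_1, alpha=alpha) / output_per_worker(
    **country_2, alpha=alpha
)

print(f"Ratio of output:       {y_ratio:.3f}")
print(f"Ratio of productivity: {prod_ratio:.3f}")
print(f"Ratio of factors:      {fact_ratio:.3f}")
```

The product of the two component ratios reproduces the ratio of output per worker, so any gap in income not explained by factors is attributed to productivity.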
Transforming $y = A k^{\alpha} h^{1-\alpha}$ into an equation relating growth rates (by taking
logarithms and then differentiating with respect to time) gives:
$\hat{y} = \hat{A} + \alpha \hat{k} + (1-\alpha) \hat{h}$
growth rate of output = growth rate of productivity
+ growth rate of factors of production
where a hat (^) denotes the growth rate of that variable.
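The step from levels to growth rates can be written out as follows. This is a sketch of the standard log-differentiation argument, assuming that $\alpha$ is constant over time and that all variables are differentiable functions of time $t$; the dot notation for time derivatives is introduced here for the derivation only.

```latex
\begin{align*}
\ln y(t) &= \ln A(t) + \alpha \ln k(t) + (1-\alpha)\ln h(t) \\
\frac{\dot{y}}{y} &= \frac{\dot{A}}{A} + \alpha\,\frac{\dot{k}}{k} + (1-\alpha)\,\frac{\dot{h}}{h}
  \qquad \text{(differentiating with respect to } t\text{)} \\
\hat{y} &= \hat{A} + \alpha\,\hat{k} + (1-\alpha)\,\hat{h}
  \qquad \text{with } \hat{z} \equiv \dot{z}/z
\end{align*}
```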
Since output and factors of production are measurable, we can alternatively write
$\hat{A} = \hat{y} - \alpha \hat{k} - (1-\alpha) \hat{h}$
This is called growth accounting. $\hat{A}$ is called the Solow residual.
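As a worked illustration of growth accounting, the residual can be computed directly from the three growth rates. The function name and the numbers below (growth rates as decimals, capital share of 1/3) are hypothetical and chosen only to show the arithmetic.

```python
def solow_residual(g_y, g_k, g_h, alpha):
    """Growth accounting: A_hat = y_hat - alpha * k_hat - (1 - alpha) * h_hat."""
    return g_y - alpha * g_k - (1 - alpha) * g_h


# Hypothetical annual growth rates; alpha = 1/3 is an assumed capital share.
g_A = solow_residual(g_y=0.03, g_k=0.03, g_h=0.01, alpha=1 / 3)
print(f"Implied productivity growth: {g_A:.4f}")  # prints 0.0133
```

Here output per worker grows at 3%, of which 1 percentage point is attributed to capital and about 0.67 to schooling, leaving roughly 1.33 points as the residual.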
• Correlation and Causality.
Correlation describes the degree to which two variables tend to move together.
For example: education and earnings; age and earnings; health and income, etc.
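Sample correlation between x and y:
$r_{xy} = \frac{\sum_{i=1}^{N} (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{N} (x_i - \bar{x})^2}\,\sqrt{\sum_{i=1}^{N} (y_i - \bar{y})^2}}$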
Correlation is only a statement of numerical fact; it says nothing about cause and effect.
Be careful! Correlation is not the same as causality.
Causality describes cause and effect: other things equal (ceteris paribus), a change in
one thing will tend to result in a change in another.
Most economic questions are ceteris paribus questions. Therefore, it is important to
define which causal effect one is interested in.
- Consider a positive correlation between X and Y. There are three possible explanations:
- X causes Y: causation runs from X to Y. E.g., rain and umbrella
sales.
- Causation can run either way (reverse causality). E.g., health and income.
- There is no direct causal relationship between X and Y. But some third variable,
Z, causes both X and Y. In this case, Z is called an omitted variable. E.g., ice-cream
consumption and shark attacks. The omitted variable here is hot weather.
Week 2: The regression model and OLS estimator.
• Introduction to the regression model. Example: the wage equation.
Economic theory suggests that more productive workers should earn higher wages.
Among other attributes, a worker possesses skills, which can be acquired through
education. Better education therefore raises productivity and is a signal to employers
about workers' productivity. The conclusion is that more educated individuals should earn
higher wages. But does education really raise wages? And if so, by how much?
We can investigate this issue with a sample of N observations of the wages and
schooling level of working men in the UK.
- $\ln w_i$: log hourly wage for the $i$-th individual.
- $\mathrm{edu}_i$: number of years of education of the $i$-th individual.
Suppose there is a linear equation relating ln wage to years of education:
$\ln w_i = \alpha + \beta \, \mathrm{edu}_i$ (1)
- $\alpha$ = the log hourly wage of someone with no education.
- $\beta$ = the effect of an extra year of education on log hourly wage.
The main objective of regression analysis is to obtain numerical estimates of the
coefficients $\alpha$ and $\beta$ by using data, in this case, on log wages and schooling.
- Since the data we have (See Lecture 2 – Part I pg. 4) show a positive correlation,
we can make the sensible assumption that $\beta > 0$.
- The intercept on the y-axis gives the log wage for a worker with zero years of
schooling.
- The slope gives the change in the log wage associated with a one-year change
in years of schooling.
$\beta = \frac{\Delta \ln w_i}{\Delta \mathrm{edu}_i} = \frac{\text{change in log wage}}{\text{change in years of schooling}}$ (2)
- Why do we model $\ln w_i$ instead of $w_i$? See live lecture 2.
We typically model $\ln w_i$ instead of $w_i$ because it is a mathematical fact that a small
change in the log wage approximates the proportional change in the wage:
$\Delta \ln w \approx \Delta w / w$. By using ln wage instead of wage, we can then interpret
changes in this quantity (multiplied by 100) as percentage changes in the wage. So $\beta$
(multiplied by 100) can be interpreted as giving the approximate percentage change in
earnings resulting from a one-year increase in schooling.
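The quality of the log approximation can be checked numerically. The short sketch below compares the change in the log wage with the exact proportional change for a few hypothetical wage increases; the starting wage of 10 is arbitrary.

```python
import math


def log_change(w_old, w_new):
    """Change in the log wage: ln(w_new) - ln(w_old)."""
    return math.log(w_new) - math.log(w_old)


def proportional_change(w_old, w_new):
    """Exact proportional change: (w_new - w_old) / w_old."""
    return (w_new - w_old) / w_old


w_old = 10.0  # arbitrary starting wage
for rise in (0.01, 0.05, 0.10, 0.50):
    w_new = w_old * (1 + rise)
    print(
        f"exact change = {proportional_change(w_old, w_new):.4f}, "
        f"log change = {log_change(w_old, w_new):.4f}"
    )
```

The two measures are very close for small changes and drift apart as the change grows, which is why the percentage interpretation of the slope is an approximation.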
Remember:
- y: dependent / explained / response / predicted variable, regressand.
- x: independent / explanatory / control / predictor variable, regressor.
- Analyzing the regression model with data. Scatter graph.
If we plot the data on a scatter diagram, we can see that the relation between log wage
and education (years of schooling) does not look like the regression line we hypothesized
in equation (1).
It can be observed that for the same number of years of schooling (e.g., 10) we observe a
range of values of log wage (approximately 0.9 to 3).
However, the points are not randomly scattered on the page: they do have an
upward-sloping drift.
Overall, this relationship is not as simple or exact as the deterministic linear relationship
we hypothesized. We have to recognize that education is not the only factor that determines
individual wages: there is a discrepancy between the model and the data (a range of
values for the log wage).
- The disturbance (or error) term.
To reflect the discrepancy between the model and the data, we introduce a disturbance
or error term $\varepsilon_i$ (also written $u_i$):
$\ln w_i = \alpha + \beta \, \mathrm{edu}_i + \varepsilon_i$ (3)
With the disturbance term, the relationship between wages and education is no longer
deterministic. It is stochastic.
• The general statistical model.
Re-writing the model with the disturbance term now we get the simple linear regression
model. It is also called the two-variable or bivariate linear regression model.
$y_i = \alpha + \beta x_i + \varepsilon_i$ or, equivalently, $y_i = \beta_0 + \beta_1 x_i + u_i$ (4)
- $y_i$ = dependent variable (explained variable, regressand, LHS variable).
- $x_i$ = independent variable (explanatory variable, regressor, RHS variable).
$\alpha$ and $\beta$ are the unknown parameters which we seek to recover.
- $\alpha$ = constant or intercept.
- $\beta$ = coefficient of $x_i$.
The existence of $\varepsilon_i$ and the fact that its magnitude is unknown make exact
calculation of the parameters impossible. Therefore, they must be estimated.