Empirical Exercises
Chapter 4:
1. Data file Growth: Contains data on average growth rates from 1960 through 1995 for 65 countries, along
with variables that are potentially related to growth. In this exercise, you will investigate the relationship
between growth and trade.
a. Construct a scatterplot of average annual growth rate (Growth) on the average trade share (TradeShare).
Does there appear to be a relationship between the variables?
summ
twoway (scatter growth tradeshare)
Yes, there appears to be a weak positive relationship.
b. One country, Malta, has a trade share much larger than the other countries. Find Malta on the
scatterplot. Does Malta look like an outlier?
Yes. Malta is the outlying observation, with a trade share of about 2.
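To confirm which observation this is, the high-trade-share row can be listed directly. This is a minimal sketch that assumes the Growth data are already loaded and borrows the 1.5 cutoff used in part (d); it lists every variable for the matching row, so no country-name variable has to be assumed.

```stata
* Stata treats missing values as larger than any number, so guard the comparison
list if tradeshare > 1.5 & !missing(tradeshare)
```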
c. Using all observations, run a regression of Growth on TradeShare. What is the estimated slope? What is
the estimated intercept? Use the regression to predict the growth rate for a country with a trade share
of 0.5 and for another with a trade share equal to 1.0.
reg growth tradeshare, r
The estimated slope is 2.31 and the estimated intercept is 0.64.
Predicted growth (Trade Share = 1) = 0.64 + 2.31 × 1 = 2.95
Predicted growth (Trade Share = 0.5) = 0.64 + 2.31 × 0.50 = 1.80
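The predictions can also be computed from the stored coefficients rather than from rounded values. The sketch below assumes the Growth data are loaded; `vce(robust)` is the long form of the `r` option used above.

```stata
regress growth tradeshare, vce(robust)

* _b[] holds the coefficients from the most recent estimation command
display "TradeShare = 0.5: " _b[_cons] + _b[tradeshare]*0.5
display "TradeShare = 1.0: " _b[_cons] + _b[tradeshare]*1.0

* lincom gives the same prediction together with a standard error and confidence interval
lincom _cons + 0.5*tradeshare
```

Because these use unrounded coefficients, the results may differ from the hand calculations in the last digit.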
d. Estimate the same regression, excluding the data from Malta. Answer the same questions as in (c).
reg growth tradeshare if (tradeshare < 1.5), r
The estimated slope is 1.68 and the estimated intercept is 0.96.
Predicted growth (Trade Share = 1) = 0.96 + 1.68 × 1 = 2.64
Predicted growth (Trade Share = 0.5) = 0.96 + 1.68 × 0.50 = 1.80
e. Plot the estimated regression functions from (c) and (d). Using the scatterplot in (a), explain why the
regression function that includes Malta is steeper than the regression function that excludes Malta.
graph twoway (lfit growth tradeshare) (scatter growth tradeshare)
graph twoway (lfit growth tradeshare if (tradeshare < 1.5)) (lfit growth tradeshare) (scatter growth tradeshare)
f. Where is Malta? Why is the Malta trade share so large? Should Malta be included or excluded from the
analysis?
Malta is an island nation in the Mediterranean Sea, south of Sicily. Malta is a freight transport site,
which explains its large “trade share.” Many goods coming into Malta (imports into Malta) are
immediately transported to other countries (as exports from Malta). Thus, Malta’s imports and
exports are unlike the imports and exports of most other countries. Malta should not be included in
the analysis.
2. Data file Earnings_and_Height: Contains data on earnings, height, and other characteristics of a random
sample of U.S. workers. In this exercise, you will investigate the relationship between earnings and height.
a. What is the median value of height in the sample?
summarize height, detail
The median height in the sample is 67 inches.
b.
i. Estimate average earnings for workers whose height is at most 67 inches.
ii. Estimate average earnings for workers whose height is greater than 67 inches.
iii. On average, do taller workers earn more than shorter workers? How much more? What is a 95%
confidence interval for the difference in average earnings?
gen split = height > 67
ttest earnings, by(split) unequal unpaired
Estimated average annual earnings are $44,488 for shorter workers and $49,988 for taller workers,
a difference of $5,499. The 95% confidence interval for the difference is $4,706 to $6,293.
The difference is large (more than 10% of average earnings), precisely estimated (a standard
error of $404), and statistically significantly different from zero.
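The same comparison can be run as a regression on a binary indicator, which connects this part to the regressions that follow. This is a sketch, not the handout's method: the variable name tall is hypothetical and is chosen so that it does not clash with the split variable created above.

```stata
* tall = 1 for workers taller than 67 inches; left missing when height is missing
generate tall = height > 67 if !missing(height)

* The coefficient on tall is the difference in mean earnings between the two groups
regress earnings tall, vce(robust)
```

The coefficient on tall equals the difference in group means. Its robust standard error should be close to the unequal-variance t-test's, though not necessarily identical, because the two procedures apply slightly different degrees-of-freedom adjustments.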
c. Construct a scatterplot of annual earnings (Earnings) on height (Height). Notice that the points on the plot
fall along horizontal lines. (There are only 23 distinct values of Earnings). Why? (Hint: Carefully read the
detailed data description.)
graph twoway (lfit earnings height) (scatter earnings height)
The data documentation reports that individual earnings were reported in 23 brackets, and a single
average value is reported for earnings in the same bracket. Thus, the dataset contains 23 distinct
values of earnings.
d. Run a regression of Earnings on Height.
reg earnings height, r
The estimated regression is
i. What is the estimated slope?
The estimated slope is 707.7 (dollars per year per inch of height).
ii. Use the estimated regression to predict earnings for a worker who is 67 inches tall, for a worker
who is 70 inches tall, and for a worker who is 65 inches tall.
The estimated earnings are
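One way to obtain these predictions directly from the fitted regression is Stata's margins command. A minimal sketch, assuming the Earnings_and_Height data are loaded:

```stata
regress earnings height, vce(robust)

* Predicted earnings at 65, 67, and 70 inches, each with a standard error and confidence interval
margins, at(height=(65 67 70))
```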
e. Suppose height were measured in centimeters instead of inches. Answer the following questions about the
Earnings on Height (in cm) regression.
Recall that 1 cm = 0.394 inches. The estimated regression in (d), with units shown, is
Earnings($) = -512.7($) + 707.7($/inch) × Height(inches),
R² (unit free) = 0.011, and SER = 26777($).
Note that
707.7($/inch) × Height(inches) = 707.7($/inch) × (0.394 inch/cm) × Height(cm)
= 278.8($/cm) × Height(cm)
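i. What is the estimated slope of the regression?
The estimated slope is 278.8 ($/cm).
ii. What is the estimated intercept?
The estimated intercept is unchanged, because rescaling Height does not change the prediction at Height = 0.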
iii. What is the R²?
R² (unit free) = 0.011
iv. What is the standard error of the regression?
SER = 26777($)
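These answers can be checked by rescaling the regressor and re-estimating. A minimal sketch, in which the variable name height_cm is hypothetical and the exact conversion 1 inch = 2.54 cm is used:

```stata
* Convert height from inches to centimeters
generate height_cm = height*2.54

* Slope is now in dollars per centimeter; intercept, R-squared, and root MSE should be unchanged
regress earnings height_cm, vce(robust)
```

The slope from this regression is the inch-based slope divided by 2.54, so it may differ slightly from 278.8, which uses the rounded factor 0.394.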
f. Run a regression of Earnings on Height, using data for female workers only.
reg earnings height if sex==0, r
In the regression for females, the estimated slope is 511.2. A woman who is one inch taller than
average is predicted to have earnings that are $511.2 per year higher than average.
g. Repeat (f) for male workers.
reg earnings height if sex==1, r
In the regression for males, the estimated slope is 1306.9. A man who is one inch taller than
average is predicted to have earnings that are $1306.9 per year higher than average.
h. Do you think that height is uncorrelated with other factors that cause earnings? That is, do you think that
the regression error term, u_i, has a conditional mean of 0 given Height (X_i)? (You will investigate this
more in the Earnings and Height exercises in later chapters.)
Height may be correlated with other factors that cause earnings. For example, height may be
correlated with “strength,” and in some occupations, stronger workers may be more productive.
There are many other potential factors that may be correlated with height and cause earnings, and
you will investigate some of these in future exercises.
Chapter 5:
1. Use the data set Earnings_and_Height to carry out the following exercises.
a. Run a regression of Earnings on Height.
reg earnings height, r
The estimated slope is 707.7, with a standard error of 50.4.
i. Is the estimated slope statistically significant?
ii. Construct a 95% confidence interval for the slope coefficient.
The 95% confidence interval for the slope coefficient is 707.7 ± 1.96 × 50.4, or 608.9 ≤ β_1 ≤
806.5. This interval does not include β_1 = 0, so the estimated slope is significantly different from
0 at the 5% level. Alternatively, the t-statistic is 707.7/50.4 ≈ 14.0, which is greater in absolute
value than the 5% critical value of 1.96. Finally, the p-value for the t-statistic is approximately
0.000, which is smaller than 0.05.
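The interval and t-statistic can be reproduced from the stored estimates. A minimal sketch, assuming the Earnings_and_Height data are loaded:

```stata
regress earnings height, vce(robust)

* 95% confidence interval using the large-sample critical value 1.96
display _b[height] - 1.96*_se[height]
display _b[height] + 1.96*_se[height]

* t-statistic for the null hypothesis that the slope is zero
display _b[height]/_se[height]
```

The interval printed in Stata's regression table uses a t-distribution critical value rather than 1.96, so it can differ from these numbers in the last digits.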
b. Repeat (a) for women.
reg earnings height if sex==0, r
For women, the estimated slope is 511.2, with a standard error of 97.6.
The 95% confidence interval for the slope coefficient is 511.2 ± 1.96 × 97.6, or 319.9 ≤ β_1,Female ≤
702.5. This interval does not include β_1,Female = 0, so the estimated slope is significantly different
from 0 at the 5% level.
c. Repeat (a) for men.
reg earnings height if sex==1, r
For men, the estimated slope is 1306.9, with a standard error of 98.9.
The 95% confidence interval for the slope coefficient is 1306.9 ± 1.96 × 98.9, or 1113.1 ≤ β_1,Male ≤
1500.6. This interval does not include β_1,Male = 0, so the estimated slope is significantly different
from 0 at the 5% level.
d. Test the null hypothesis that the effect of height on earnings is the same for men and women. (Hint: See
Exercise 5.15.)
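One way to carry out this test, assuming the male and female samples are independent, is to divide the difference between the two slope estimates from (b) and (c) by the square root of the sum of their squared standard errors. The sketch below uses the rounded estimates reported above. It then shows a closely related check in a single regression with an interaction term. The variable name male_height is hypothetical, and sex is taken to equal 1 for men, as in the commands above.

```stata
* t-statistic for equal slopes, from the reported estimates for men and women
display (1306.9 - 511.2)/sqrt(98.9^2 + 97.6^2)

* Related check: the coefficient on the interaction is the male-female slope difference
generate male_height = sex*height
regress earnings height sex male_height, vce(robust)
test male_height
```

The first line gives a t-statistic of about 5.7, which exceeds the 5% critical value of 1.96, so the null hypothesis of equal slopes is rejected at the 5% level. The interaction regression should lead to the same conclusion, although its standard error can differ slightly because of degrees-of-freedom adjustments.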