Assignment 2: Linear Regression
1. Suppose you want to analyze how the production volume affects energy
consumption in a plant. You record daily data for one work week and
observe the following.
Obs.  Volume  Consumption
1     5       34
2     8       66
3     4       45
4     7       67
5     6       63
You perform a simple linear regression; your estimate for the intercept
is 10 and your estimate for the slope is 7.5. (Note: Estimating a
statistical model on such a small data set is rarely meaningful.
Here, however, it allows us to perform the calculations by hand and
makes the concepts tangible.)
(a) Draw a picture including the observations and the regression line
similar to Figure 3.1. Add residuals and variation. (Hint: You
can draw the regression line and the observations with Excel or
Python and then add residuals and variation by hand.)
(b) Calculate the residual sum of squares (RSS), the total sum of
squares (TSS), $R^2$, and the residual standard error (RSE). Briefly
interpret each measure.
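A hand calculation for part (b) can be cross-checked with a short script. The following is a minimal sketch, assuming the standard definitions for simple linear regression: RSS is the sum of squared residuals, TSS is the sum of squared deviations of consumption from its mean, $R^2 = 1 - RSS/TSS$, and RSE is the square root of RSS / (n - 2). The variable names are illustrative and not part of the assignment.

```python
import math

# Data from the table in Question 1.
volume = [5, 8, 4, 7, 6]
consumption = [34, 66, 45, 67, 63]

# Coefficient estimates as stated in the question.
intercept, slope = 10.0, 7.5

n = len(volume)
fitted = [intercept + slope * x for x in volume]
residuals = [y - y_hat for y, y_hat in zip(consumption, fitted)]
mean_y = sum(consumption) / n

# Assumed standard definitions for simple linear regression.
rss = sum(e ** 2 for e in residuals)
tss = sum((y - mean_y) ** 2 for y in consumption)
r_squared = 1 - rss / tss
rse = math.sqrt(rss / (n - 2))  # n - 2 degrees of freedom (two coefficients)

print(f"RSS = {rss:.2f}, TSS = {tss:.2f}, R^2 = {r_squared:.4f}, RSE = {rse:.4f}")
```

The script only reproduces the arithmetic; the interpretation of each measure still has to be written out separately.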
(c) Suppose you duplicate your data set so that the new data set
contains each observation twice.
i. What would happen to your coefficient estimates?
ii. What would happen to RSS, TSS, $R^2$, and RSE?
(Hint: No calculations needed.)
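Although part (c) needs no calculations, the reasoning can be verified numerically afterwards. The sketch below assumes ordinary least squares with the usual closed-form estimates and an RSE with n - 2 degrees of freedom; `fit_and_score` is a hypothetical helper written for this illustration, not a library function.

```python
import math

def fit_and_score(x, y):
    """Fit simple OLS by the closed-form formulas and return fit measures."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    rss = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    tss = sum((yi - mean_y) ** 2 for yi in y)
    return {
        "intercept": intercept,
        "slope": slope,
        "RSS": rss,
        "TSS": tss,
        "R2": 1 - rss / tss,
        "RSE": math.sqrt(rss / (n - 2)),  # assumes n - 2 degrees of freedom
    }

# Data from the table in Question 1.
volume = [5, 8, 4, 7, 6]
consumption = [34, 66, 45, 67, 63]

original = fit_and_score(volume, consumption)
duplicated = fit_and_score(volume * 2, consumption * 2)  # every observation twice

for name in original:
    print(f"{name:>9}: original = {original[name]:.4f}, duplicated = {duplicated[name]:.4f}")
```

Comparing the two columns shows which quantities depend on the number of observations and which do not.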
2. You perform a multiple linear regression to predict the profit associated
with different products depending on a product's quality ($X_1$), a
dummy variable indicating whether a product is on sale, with no sale as
baseline ($X_2 = 1$ if it is on sale and $X_2 = 0$ otherwise), and the
interaction between quality and sale ($X_3 = X_1 \cdot X_2$). The estimated
coefficients are $\hat{\beta}_0 = 10$, $\hat{\beta}_1 = 5$, $\hat{\beta}_2 = 3$, and $\hat{\beta}_3 = -7$.
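To make the role of the interaction term concrete, the fitted model can be written as a small function. This is a minimal sketch that assumes the model has the form profit = b0 + b1 * X1 + b2 * X2 + b3 * X1 * X2 with the coefficients stated above; the function name `predicted_profit` and the quality value 2 are hypothetical choices for illustration only.

```python
# Estimated coefficients as stated in the question.
B0, B1, B2, B3 = 10.0, 5.0, 3.0, -7.0

def predicted_profit(quality, on_sale):
    """Predicted profit under the assumed model with a quality-sale interaction."""
    x1 = quality
    x2 = 1 if on_sale else 0  # dummy variable, baseline is "not on sale"
    return B0 + B1 * x1 + B2 * x2 + B3 * x1 * x2

# Illustrative quality value; the assignment does not specify a scale.
q = 2
print("not on sale:", predicted_profit(q, on_sale=False))
print("on sale:    ", predicted_profit(q, on_sale=True))

# Effect of one additional unit of quality in each group.
print("slope, not on sale:", predicted_profit(q + 1, False) - predicted_profit(q, False))
print("slope, on sale:    ", predicted_profit(q + 1, True) - predicted_profit(q, True))
```

Evaluating the function for both values of the dummy shows that the interaction changes the slope on quality, not just the intercept.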