Kim van Leussen (6745555) & Yara Langeveld (6733506)
Statistics (GEO2-2217) - Assignment 3
Group number 99
Assignment 3
1. Assumptions linear models
In the first assignment, we have created an overall success variable which we used in the
first and second assignment. Now we will have a look at the different dimensions of
success. We will continue with a focus on technological success (variable Techsuccess)
and the ecological success (variable Ecolsuccess). We are going to use two multiple linear
regression analyses to see how different variables affect each of these success dimensions
of projects.
Your dataset contains variables that deal with the characteristics of the project itself, as
well as a number of variables that deal with the project partners and the interaction
among them. In this question we are going to look at the following project characteristics:
- Variable PrevR (Was there a pre-research (of max 1 year)?)
- Variable Duration (Number of months the project lasted)
- Variable Size (The total number of participating organizations. We created this
variable in assignment 2)
And at the following variable that relates to the project partners and their interaction:
- Variable Distance (Geographical distance between partners in KM)
Based on literature, researchers expect that PrevR and Size have a positive relationship
with the two dimensions of success. For Duration and Distance they expect a negative
relationship with the two dimensions of success.
Perform the two multiple linear regression analyses to test this. In addition to the
regression coefficients and confidence intervals, also obtain the collinearity diagnosis and
a residual plot with ZRESID on the Y-axis and ZPRED on the X-axis. Also determine the
Cook distances.
Figure 1. Multiple Linear Regression model with dependent variable Techsuccess
Figure 2. Multiple Linear Regression model with dependent variable Ecolsuccess
a. First, explain if the two regression models are statistically significant. (Use α = 5%)
Please provide the appropriate table(s).
The hypotheses are assumed to be directed, because the introduction states that the variables PrevR
and Size are expected to have a positive relationship with the two dimensions of success, while the
variables Duration and Distance are expected to have a negative relationship with them. To
investigate whether this is true, two multiple linear regression models are constructed. The first
has Techsuccess as dependent variable and PrevR, Size, Duration and Distance as independent
variables (Figure 1). The second has Ecolsuccess as dependent variable and the same four
independent variables (Figure 2).
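Written out, the two models described above could take the following form. This is only a notational sketch: the symbols β, γ and ε and the index i (one project per observation) are assumptions for illustration and do not come from the SPSS output.

```latex
\begin{align}
\text{Techsuccess}_i &= \beta_0 + \beta_1\,\text{PrevR}_i + \beta_2\,\text{Size}_i
  + \beta_3\,\text{Duration}_i + \beta_4\,\text{Distance}_i + \varepsilon_i \\
\text{Ecolsuccess}_i &= \gamma_0 + \gamma_1\,\text{PrevR}_i + \gamma_2\,\text{Size}_i
  + \gamma_3\,\text{Duration}_i + \gamma_4\,\text{Distance}_i + \varepsilon_i'
\end{align}
```

In this notation, the expectations from the introduction correspond to positive coefficients for PrevR and Size (β1, β2, γ1, γ2 > 0) and negative coefficients for Duration and Distance (β3, β4, γ3, γ4 < 0).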
In the first model, the two-tailed p-value equals p = 0.037, as can be seen in the row ‘Regression’ of
the ANOVA table in Figure 1. Because the hypothesis is directed, this value is divided by two, which
gives p/2 = 0.0185. At a significance level of α = 5%, the first model is significant, because
1.9% is less than 5%.
In the second model, the two-tailed p-value equals p = 0.017, as can be seen in the row ‘Regression’
of the ANOVA table in Figure 2. Just like with the first model, this has to be divided by two, resulting in
p = 0.0085. As 0.9% is less than the significance level of α = 5%, the second model is also
significant.
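The arithmetic above can be sketched in a few lines of Python. This is only an illustration of the comparison made in the text: the function name `one_sided_p` and the dictionary layout are assumptions, and the p-values are those read from the ANOVA tables in Figures 1 and 2. The sketch also compares the unhalved p-values with α.

```python
ALPHA = 0.05  # significance level used in the assignment


def one_sided_p(two_sided_p: float) -> float:
    """Halve a two-tailed p-value, as done in the text for a directed hypothesis."""
    return two_sided_p / 2


# Two-tailed p-values from the 'Regression' rows of the ANOVA tables.
two_sided = {"Techsuccess": 0.037, "Ecolsuccess": 0.017}

results = {}
for model, p_two in two_sided.items():
    p_one = one_sided_p(p_two)
    results[model] = {
        "p_one_sided": p_one,
        "significant_halved": p_one < ALPHA,
        "significant_unhalved": p_two < ALPHA,
    }
    print(f"{model}: p/2 = {p_one:.4f}, significant at alpha = {ALPHA}: {p_one < ALPHA}")
```

Both models stay below α = 5% with or without the halving step, so the conclusion does not depend on it.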
b. Second, discuss for each variable 1) if the effects have the expected direction and 2)
if they are statistically significant based on the confidence interval (use the SPSS
output for this). Do this for both regression models. Use α= 5%.
If the unstandardized coefficient is negative, it means that there is a negative relationship with the
dependent variable and if it is positive, it indicates a positive relationship with the dependent variable.
This will be used to determine if the expected direction is applicable. For the variables Size and
PrevR, this direction is expected to be positive and for the variables Distance and Duration, it is
expected to be negative. These expected directions are tested below.
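The direction check described above could be sketched as follows. The coefficient values in `example_b` are hypothetical placeholders chosen for illustration and are NOT the values from the SPSS output; the names `EXPECTED_SIGN` and `has_expected_direction` are likewise assumptions.

```python
# Expected sign of each unstandardized coefficient, taken from the introduction:
# +1 for an expected positive relationship, -1 for an expected negative one.
EXPECTED_SIGN = {"PrevR": +1, "Size": +1, "Duration": -1, "Distance": -1}


def has_expected_direction(variable: str, coefficient: float) -> bool:
    """Return True if the sign of the coefficient matches the expected sign."""
    return coefficient * EXPECTED_SIGN[variable] > 0


# Hypothetical coefficients for illustration only (not the SPSS results).
example_b = {"PrevR": 0.40, "Size": -0.02, "Duration": -0.01, "Distance": 0.003}

checks = {var: has_expected_direction(var, b) for var, b in example_b.items()}
for var, ok in checks.items():
    print(f"{var}: expected direction = {ok}")
```

A coefficient of exactly zero is treated as not having the expected direction, since it indicates no relationship in either direction.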
Moreover, the confidence interval is used to assess the statistical significance of each
independent variable. If the coefficient β = 0, the model states that the independent variable
and the dependent variable are not linearly related to each other. In other words, there