Quantitative Methods weekly assignments
Group 28: Mitchell Burink, Gies Tesselaar, Nora Geelen
Assignment Theme Block 1
We started with a linear regression analysis using ordinary least squares (OLS) to explain
how international knowledge migrants value the region, based on their migration background
and personal characteristics. We made a scatterplot of the valuation of the region against
the respondents' birth year. The scatterplot is shown in the figure below.
Does the scatterplot suggest the existence of a relationship between these two
variables?
The scatterplot suggests that there is little or no relationship between the two variables.
There also appears to be heteroscedasticity: the spread of the data points (and thus the
variance of the residuals) is not constant across birth years. The data points are widely
scattered without a clear upward or downward pattern, which means there is no clear positive
or negative correlation: the two variables do not change together at a constant rate.
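Whether two variables change together linearly can be quantified with Pearson's correlation coefficient r, which ranges from -1 to 1 and is close to 0 when there is no linear relationship. The sketch below is a minimal illustration, assuming two equally long lists of numbers; the function name `pearson_r` and the example values are hypothetical and are not taken from our data set.

```python
from math import sqrt


def pearson_r(x: list[float], y: list[float]) -> float:
    """Pearson correlation coefficient of two equally long samples."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two samples of equal length (n >= 2)")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of cross-products of deviations from the means (covariance numerator)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    # Sums of squared deviations (variance numerators)
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    return sxy / sqrt(sxx * syy)


# Hypothetical birth years and valuations, for illustration only
birth_year = [1975, 1978, 1980, 1980, 1983, 1986, 1990]
valuation = [7, 8, 6, 9, 7, 8, 7]
print(round(pearson_r(birth_year, valuation), 3))
```

A value of r near 0 would match what the scatterplot suggests: no clear positive or negative correlation.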
Draw a linear line in the scatterplot. Does the relationship look linear? Do you
see outliers?
The relationship can be described by a straight line, but the line rises only slightly
(y = 15.65 + 0.01x). Looking at the line in the scatterplot, many data points lie far from it.
These so-called outliers are not real outliers, however: they reflect the many observations
per birth year, whose valuations vary widely. Among respondents born in 1980, for example,
you see a lot of diversity. There is no general trend visible. Because of this large
variation of the data points, the significance of the relationship is questionable. To
explore this further, a regression analysis is needed.
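The straight line drawn in a scatterplot is normally the OLS line: its slope is b = Sxy / Sxx (the sum of cross-products of deviations divided by the sum of squared deviations of x) and its intercept is a = mean(y) - b * mean(x). A minimal sketch for a single predictor; the function name `ols_line` and the example data are hypothetical.

```python
def ols_line(x, y):
    """Return (intercept, slope) of the least-squares line y = a + b*x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Slope: covariation of x and y relative to the variation of x
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    # The OLS line always passes through the point of means
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Hypothetical data lying exactly on y = 2 + 3x
a, b = ols_line([0, 1, 2, 3], [2, 5, 8, 11])
print(a, b)  # 2.0 3.0
```

Such a line can always be computed; with points spread as widely as in our scatterplot, the resulting slope can be close to zero.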
Conduct the bivariate linear regression analysis for these two variables (V149: the
valuation of the region is the dependent variable) and describe the linear relationship.
The table ‘Variables Entered/Removed’ gives information about the independent variable.
Under ‘Method’ it states ‘Enter’, which means that the independent variable was entered
into the model. The table ‘Model Summary’ gives a summary of the model, including the
‘R square’ and the ‘Adjusted R square’. The ‘R square’ indicates how well the model fits the
data: it is the share of the variance in the dependent variable that is explained by the
model, and it always lies between 0 and 1.
What can you say about the ‘model-fit’ on the basis of the ANOVA table and the
R-squared?
The higher the ‘R square’, the better the model fits the data. The ‘Adjusted R square’ is
always slightly lower than the ‘R square’, because it corrects for the number of independent
variables in the model. In our case the ‘R square’ is very low: 0.004 (0.4%), which means
that the independent variable explains hardly any of the variation in the dependent variable.
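R square can be computed as 1 - SS_residual / SS_total, and the adjusted R square as 1 - (1 - R²)(n - 1) / (n - k - 1), where n is the number of observations and k the number of independent variables. A minimal sketch of both formulas; the function names and example values are hypothetical and do not come from our output.

```python
def r_squared(y, y_hat):
    """Share of the variance in y explained by the fitted values y_hat."""
    mean_y = sum(y) / len(y)
    ss_total = sum((yi - mean_y) ** 2 for yi in y)  # total variation
    ss_resid = sum((yi - fi) ** 2 for yi, fi in zip(y, y_hat))  # unexplained
    return 1 - ss_resid / ss_total


def adjusted_r_squared(r2, n, k):
    """Correct R square for k independent variables and n observations."""
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)


# Hypothetical example: R square of 0.5 with 11 observations and 1 predictor
print(round(adjusted_r_squared(0.5, 11, 1), 4))  # 0.4444
```

Because of the correction term, a very small R square can even lead to a negative adjusted R square.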
Which share of the variance in the dependent variable is explained by the
model?
The F-test in the ‘ANOVA’ table tests whether the model as a whole explains a significant
part of the variance of the dependent variable, compared with a model without the
independent variable. In the ‘ANOVA’ table you can see that the F value is 1.586. This is
low, which means that the model does not significantly explain the variance of the
dependent variable.
The significance value in the ‘Coefficients’ table is 0.209. This is much higher than 0.05,
which means that birth year has no significant effect on the grade the respondents give
their region.
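For a regression with k independent variables and n observations, the F value can be derived from R square: F = (R²/k) / ((1 - R²)/(n - k - 1)). In a bivariate regression (k = 1) the F-test is equivalent to the t-test of the slope, with F = t². As a consistency check on our own output, squaring the t value of 1.259 from the coefficients table gives about 1.585, which matches the F value of 1.586 up to rounding. The function name `f_from_r2` and the first example are hypothetical.

```python
def f_from_r2(r2, n, k):
    """Overall F statistic of a regression with k predictors and n observations."""
    explained = r2 / k  # explained variance per predictor
    unexplained = (1 - r2) / (n - k - 1)  # residual variance per degree of freedom
    return explained / unexplained


# Hypothetical example: R square 0.5, 11 observations, 1 predictor
print(round(f_from_r2(0.5, 11, 1), 6))  # 9.0

# In a bivariate regression F equals t squared; t = 1.259 is from our output
print(round(1.259 ** 2, 3))  # 1.585
```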
How do you interpret the regression coefficients?
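The table ‘Coefficients’ gives information about the coefficient of the independent
variable. Under ‘B’ you can find the unstandardized regression coefficient, which is 0.12.
Taken at face value, this means that a one-unit increase in the independent variable is
associated with an average increase of 0.12 in the valuation of the region. A regression
coefficient of zero would mean that there is no linear relationship between the two
variables. The regression coefficient of the independent variable is not significantly
different from zero.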
What can you say about the reliability of the coefficients?
The t value is 1.259 and the significance level is 0.209. This indicates that the
coefficient is not reliable, because the significance value is higher than 0.05 (5%). As a
result, we cannot be sure that the regression coefficient of 0.12 differs from 0.
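The t value is the coefficient divided by its standard error, t = B / SE(B), and the significance level is the two-sided probability of finding a t value at least this extreme if the true coefficient were 0. The sketch below uses the standard normal distribution from Python's `statistics` module as a large-sample approximation of the t distribution (an assumption: SPSS uses the exact t distribution). For t = 1.259 it gives about 0.208, close to the 0.209 in our output. The function names are hypothetical.

```python
from statistics import NormalDist


def two_sided_p(t: float) -> float:
    """Approximate two-sided p-value of a t statistic (normal approximation)."""
    return 2 * (1 - NormalDist().cdf(abs(t)))


def is_significant(p: float, alpha: float = 0.05) -> bool:
    """Reject H0 (coefficient = 0) only when p is below the significance level."""
    return p < alpha


p = two_sided_p(1.259)  # t value from our coefficients table
print(round(p, 3), is_significant(p))
```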
Are these findings in line with your own expectations, why or why not?
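Yes. The very low R square of 0.004 had already shown that the model explains hardly any of
the variation in the valuation of the region, so we did not expect a reliable (significant)
regression coefficient either.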
What are plausible explanations for these findings?
● The number of respondents differs per birth year, which can give a one-sided picture of
the data.
● The dependent variable is not categorical, which means participants are not
divided into separate groups or regions.