About correlation
Correlation is a standardized measure of the strength of a linear relationship between
two variables. Because it is standardized, the strengths of different relationships can be compared.
A correlation of 0 means that an increase in one variable is not accompanied by any linear
change in the other variable. It does not mean that there is no relationship between the two
variables: a relationship can also be non-linear.
Correlation does not say anything about the causal effects of the variables.
You can only calculate Pearson’s r for variables measured at interval or ratio level.
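As a minimal sketch of the points above, the following computes Pearson’s r by hand for a perfectly linear relationship and for a perfect but non-linear one. The data and the function name pearson_r are made up for illustration.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of cross-products and sums of squares of the deviations from the means
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

# Made-up illustration data
x = [-2, -1, 0, 1, 2]
y_linear = [2 * v + 1 for v in x]   # perfectly linear relationship
y_curved = [v ** 2 for v in x]      # perfect, but non-linear, relationship

r_linear = pearson_r(x, y_linear)
r_curved = pearson_r(x, y_curved)
print(r_linear, r_curved)
```

In this example y_curved is completely determined by x, yet its correlation with x is 0, because the relationship is not linear.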
Linear regression equation
Linear regression is used to make predictions based on a linear relationship.
Slope: how much y changes when x increases by 1 (change in Y : change in X)
Intercept: the point where the regression line crosses the y-axis (the predicted y when x = 0)
Y-value = intercept + slope × X-value, or Ŷ = b0 + b1X
The hat on Y means that it is the predicted y-score and not the observed y-score.
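The equation can be tried out with a small sketch. The coefficients below are hypothetical values chosen only for illustration, and predict is not a function from any particular package.

```python
def predict(b0, b1, x):
    """Predicted score Y-hat for a given X, using Y-hat = b0 + b1 * X."""
    return b0 + b1 * x

# Hypothetical coefficients, chosen only for illustration
b0 = 2.0   # intercept: predicted Y when X = 0
b1 = 0.5   # slope: change in predicted Y per one-unit increase in X

y_hat_at_4 = predict(b0, b1, 4)
y_hat_at_5 = predict(b0, b1, 5)
print(y_hat_at_4, y_hat_at_5)  # the two predictions differ by exactly b1
```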
Least squares method
Error/residual: the difference between the observed value Y and the predicted value Ŷ (Y - Ŷ)
You want to draw a line in such a way that you minimize the errors (smallest possible
sum of squared errors).
The formula for the slope of the line with the smallest sum of squared
errors contains the following ingredients: the correlation coefficient, the standard deviation of
y and the standard deviation of x: b1 = r × (sy / sx).
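A small sketch, using made-up data, of how the slope follows from the correlation coefficient and the two standard deviations. The intercept is obtained from the fact that the least-squares line passes through the point (mean of x, mean of y). The helper names pearson_r and sse are illustrative only.

```python
from math import sqrt
from statistics import mean, stdev

# Made-up illustration data (not from the source)
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

def pearson_r(x, y):
    """Pearson correlation computed from deviations from the means."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def sse(b0, b1):
    """Sum of squared errors (Y - Y-hat)^2 for the line Y-hat = b0 + b1 * X."""
    return sum((b - (b0 + b1 * a)) ** 2 for a, b in zip(x, y))

r = pearson_r(x, y)
b1 = r * stdev(y) / stdev(x)   # slope from r and the two standard deviations
b0 = mean(y) - b1 * mean(x)    # intercept: the line passes through (mean x, mean y)

print(round(b0, 3), round(b1, 3), round(sse(b0, b1), 3))
# Nudging the slope or intercept in either direction makes the sum of squared errors larger
print(sse(b0, b1) < sse(b0, b1 + 0.1), sse(b0, b1) < sse(b0 - 0.1, b1))
```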
R-squared
R-squared: a goodness-of-fit measure for a linear regression.
It gives the proportion of the variance of the response variable that is ‘explained’
by the predictor variable(s). R-squared is a proportion between 0 and 1.
If R-squared is very small, this does not mean that there is no meaningful relationship
between the two variables. The relationship can be statistically significant without
explaining much variation.
If R-squared is very large, this does not mean that the model is useful for predicting
new observations. A very large R-squared could be due to the specific sample, and the
model might not predict well in a different sample.
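As a sketch with made-up data, R-squared can be computed as 1 minus the ratio of the unexplained variation (sum of squared errors) to the total variation in Y. The variable names are illustrative only.

```python
# Made-up illustration data (not from the source)
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

n = len(x)
mean_x = sum(x) / n
mean_y = sum(y) / n

# Least-squares line for a single predictor
sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
sxx = sum((a - mean_x) ** 2 for a in x)
b1 = sxy / sxx
b0 = mean_y - b1 * mean_x

y_hat = [b0 + b1 * a for a in x]
ss_total = sum((b - mean_y) ** 2 for b in y)             # total variation in Y
ss_error = sum((b - p) ** 2 for b, p in zip(y, y_hat))   # unexplained variation
r_squared = 1 - ss_error / ss_total
print(round(r_squared, 3))
```

With a single predictor, this value equals the square of Pearson’s r between x and y.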
Multiple linear regression
Assumptions
1. Measurement levels. The dependent variable is a continuous measure (interval or ratio
level). The independent variables are continuous or dichotomous (nominal with 2
categories).
2. Linearity. There is a linear relationship between the dependent variable and all
continuous independent variables. Make a scatterplot:
SPSS: Graphs > Legacy Dialogs > Scatter/Dot
3. Absence of outliers. This can be assessed by examining the scatterplots.
Conducting a multiple linear regression
SPSS: Analyze > Regression > Linear
Identify the dependent and independent variables and put them in the corresponding boxes.
Check the various assumptions by ticking the following boxes:
o Absence of outliers. Click on Save and check: Standardized residuals,
Mahalanobis distance and Cook’s distance.
o Absence of multicollinearity. Click on Statistics and check: Collinearity
diagnostics.
o Homoscedasticity. Click on Plots, place the variable *ZPRED (standardized
predicted values) on the X-axis and place the variable *ZRESID (standardized
residuals) on the Y-axis.
o Normally distributed residuals. Click on Plots and check Histogram.
Checking assumptions
Absence of outliers. Use a scatterplot or boxplot to determine whether there are
outliers in the data, or look at the Residuals Statistics table and inspect the
minimum and maximum values of the standardized residuals, the Mahalanobis
distance and Cook’s distance. On the basis of these values, it is possible to assess
whether there are outliers in the Y-space, X-space and XY-space.
o Standardized residuals (outliers in the Y-space). Values must be between -3.3
and +3.3; a case outside this range can be seen as an outlier.
o Mahalanobis distance (outliers in the X-space). Values must be lower than 10
+ 2 × (number of independent variables). Values higher than this critical value indicate outliers.
o Cook’s distance (outliers in the XY-space). Indicates the overall influence of a
respondent on the model. Values must be lower than 1. Values higher than 1
indicate influential respondents (influential cases).
o Removing an outlier? Exclude the participant if the value is theoretically
impossible or if the participant does not belong to the group you want to make
inferences about. Otherwise, run the analysis with and without
the participant, report the results of both analyses and discuss any differences.
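The three cut-offs above can be applied to saved diagnostic values with a short sketch. The function name outlier_flags and the example cases are hypothetical; in practice the values would come from the variables that SPSS saves to the data file.

```python
def outlier_flags(std_residual, mahalanobis, cooks, n_iv):
    """Apply the three cut-offs from these notes to a single case."""
    mahalanobis_cutoff = 10 + 2 * n_iv
    return {
        "y_space": abs(std_residual) > 3.3,        # standardized residual outside -3.3..+3.3
        "x_space": mahalanobis > mahalanobis_cutoff,
        "xy_space": cooks > 1,                     # influential case
    }

# Hypothetical cases: (standardized residual, Mahalanobis distance, Cook's distance)
cases = [(0.4, 2.1, 0.02), (-3.6, 5.0, 0.30), (1.2, 17.5, 1.40)]
n_iv = 3  # number of independent variables, so the Mahalanobis cut-off is 10 + 2 * 3 = 16

for case in cases:
    print(case, outlier_flags(*case, n_iv))
```

A flagged case is not removed automatically: the decision rule for removal described above still applies.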
Absence of multicollinearity. Multicollinearity refers to a situation in which two or
more explanatory variables in a multiple regression model are highly correlated (r
> .8). Including such strongly related variables in your model has three
consequences:
o The regression coefficients (B) are unreliable.
o It limits the magnitude of R (the correlation between Y and Ŷ).
o The importance of individual independent variables can hardly be determined,
if at all.
Information on multicollinearity can be found in the last columns (Collinearity
Statistics) of the Coefficients table.
o Tolerance values smaller than .2 indicate a potential problem.
o Tolerance values smaller than .1 indicate a problem.
o The variance inflation factor (VIF) is equal to 1/Tolerance, so VIF values
greater than 5 indicate a potential problem and values greater than 10 indicate a problem.
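A minimal sketch of how Tolerance and VIF relate. It assumes the special case of a model with exactly two predictors, where the Tolerance of each predictor is 1 minus the squared correlation between the two predictors; with more predictors, Tolerance is 1 minus the R-squared from regressing that predictor on all the others. The function names and the correlations are illustrative only.

```python
def tolerance_two_predictors(r):
    """Tolerance when the model has exactly two predictors that correlate r."""
    return 1 - r ** 2

def classify(tolerance):
    """Apply the Tolerance cut-offs from these notes."""
    if tolerance < 0.1:
        return "problem"
    if tolerance < 0.2:
        return "potential problem"
    return "ok"

# Hypothetical correlations between the two predictors
for r in (0.50, 0.92, 0.96):
    tol = tolerance_two_predictors(r)
    vif = 1 / tol   # VIF is the reciprocal of Tolerance
    print(r, round(tol, 3), round(vif, 2), classify(tol))
```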
Homoscedasticity. Homoscedasticity means that the spread of the residuals must be
approximately the same at every predicted value. You can assess this by
plotting the standardized residuals against the standardized predicted values. If for
every predicted value (X-axis) the residuals (Y-axis) show approximately the same
amount of spread around zero, then the condition is met.
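The notes assess homoscedasticity visually. As a rough numerical companion, not a replacement for the plot and not a formal test, the sketch below compares the spread of the residuals in the lower and upper half of the predicted values. The function name spread_by_half and both data sets are hypothetical.

```python
from statistics import stdev

def spread_by_half(predicted, residuals):
    """Standard deviation of the residuals in the lower and upper half of the predicted values."""
    pairs = sorted(zip(predicted, residuals))   # order the cases by predicted value
    half = len(pairs) // 2
    low = [res for _, res in pairs[:half]]
    high = [res for _, res in pairs[half:]]
    return stdev(low), stdev(high)

# Hypothetical standardized predicted values and two sets of standardized residuals
predicted = [-1.5, -1.0, -0.5, 0.5, 1.0, 1.5]
even_residuals = [0.5, -0.5, 0.4, -0.4, 0.5, -0.5]   # similar spread everywhere
fan_residuals = [0.1, -0.1, 0.2, -1.0, 1.5, -2.0]    # spread grows with the predicted value

low_even, high_even = spread_by_half(predicted, even_residuals)
low_fan, high_fan = spread_by_half(predicted, fan_residuals)
ratio_even = high_even / low_even
ratio_fan = high_fan / low_fan
print(round(ratio_even, 2), round(ratio_fan, 2))
```

A ratio close to 1 is in line with roughly equal spread, while a ratio far from 1 suggests a fan shape that is worth checking in the plot.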