ARMS Advanced Research Methods And Statistics (201900104)
GRASPLE
Refreshing linear regression.
About correlation
Correlation is a standardized measure of the strength of a linear relationship between
two variables. Because it is standardized, the strengths of different relationships can
be compared.
A correlation of 0 means that there is no linear relationship: an increase in one
variable is not accompanied by a systematic linear increase or decrease in the other.
It does not mean that the two variables are unrelated, because a relationship can also
be non-linear.
Correlation says nothing about causality between the variables.
You can only calculate Pearson’s r correlation for variables on interval or ratio level.
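A minimal sketch of the point about non-linear relationships, assuming two small made-up data sets (the function name pearson_r and the numbers are illustrative, not from the course material). It computes Pearson's r by hand for a perfectly linear relation and for a perfectly curved one.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson's r for two equal-length lists of interval/ratio scores."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-products and sums of squares of the deviations from the means.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

x = [-2, -1, 0, 1, 2]               # made-up scores
y_linear = [2 * v + 1 for v in x]   # perfectly linear relation
y_curved = [v ** 2 for v in x]      # perfectly predictable, but non-linear

r_linear = pearson_r(x, y_linear)   # maximal positive correlation
r_curved = pearson_r(x, y_curved)   # no linear relation, although Y depends fully on X
```

In the curved case Y is completely determined by X, yet r comes out at zero, which is why a scatterplot is worth checking alongside the coefficient.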
Linear regression equation
Linear regression is used to make predictions based on a linear relationship.
Slope: how much the predicted Y changes when X increases by 1 (change in Y divided by
change in X).
Intercept: the point where the regression line crosses the y-axis (the predicted Y at X = 0).
Predicted Y-value = intercept + slope × X-value, or Ŷ = b0 + b1X
The hat on Y means that it is the predicted y-score and not the observed y-score.
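As an illustration of the equation, here is a small sketch with hypothetical coefficients (b0 = 2.0 and b1 = 0.5 are invented for the example). It shows that the prediction at X = 0 is the intercept and that moving one unit along X changes the prediction by the slope.

```python
def predict(x, b0, b1):
    """Predicted score: Y-hat = b0 + b1 * X."""
    return b0 + b1 * x

b0 = 2.0   # intercept (hypothetical value)
b1 = 0.5   # slope (hypothetical value)

y_hat_at_0 = predict(0, b0, b1)     # equals the intercept
y_hat_at_10 = predict(10, b0, b1)
y_hat_at_11 = predict(11, b0, b1)   # one unit further along X
increase = y_hat_at_11 - y_hat_at_10  # equals the slope
```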
Least squares method
Error/residual: the difference between the observed value Y and the predicted value Ŷ (Y - Ŷ).
You want to draw the line in such a way that the errors are as small as possible
(smallest possible sum of squared errors).
The formula for the slope of the line with the smallest sum of squared errors contains
the following ingredients: the correlation coefficient, the standard deviation of y and
the standard deviation of x (b1 = r × SDy / SDx).
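The slope formula can be sketched as follows, assuming a small made-up data set. The intercept formula b0 = mean(Y) - b1 × mean(X) is not stated in these notes but is the standard companion of the slope formula. The function names are illustrative.

```python
from math import sqrt

def least_squares_line(xs, ys):
    """Return (b0, b1, r) for the line with the smallest sum of squared errors."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sd_x = sqrt(sum((x - mean_x) ** 2 for x in xs) / (n - 1))
    sd_y = sqrt(sum((y - mean_y) ** 2 for y in ys) / (n - 1))
    r = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / ((n - 1) * sd_x * sd_y)
    b1 = r * sd_y / sd_x          # slope from r and the two standard deviations
    b0 = mean_y - b1 * mean_x     # the line passes through (mean_x, mean_y)
    return b0, b1, r

def sum_squared_errors(xs, ys, b0, b1):
    """Sum of squared residuals (Y - Y-hat) for a given line."""
    return sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))

x = [1, 2, 3, 4, 5]   # made-up scores
y = [2, 4, 5, 4, 5]

b0, b1, r = least_squares_line(x, y)
sse_best = sum_squared_errors(x, y, b0, b1)
sse_other = sum_squared_errors(x, y, b0, b1 + 0.1)  # any other slope does worse
```

Changing the slope (or the intercept) away from the fitted values always increases the sum of squared errors, which is what "least squares" refers to.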
R-squared
R-squared: a goodness-of-fit measure for a linear regression.
It gives the proportion of the variance of the response variable that is ‘explained’
by the predictor variable(s). The R-squared is a proportion between 0 and 1.
If R-squared is very small, this does not mean that there is no meaningful relationship
between the two variables. A relationship can be statistically significant without
explaining much variation.
If R-squared is very large, this does not mean that the model is useful for predicting
new observations. A very large R-squared could be due to the specific sample and
might not predict well in a different sample.
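A small sketch of how R-squared can be computed as a proportion of explained variance, assuming made-up data and the usual definition R² = 1 - SS_residual / SS_total (the notes give the interpretation but not this formula). Names are illustrative.

```python
def fit_line(xs, ys):
    """Least-squares intercept and slope for one predictor."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    b1 = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
          / sum((x - mean_x) ** 2 for x in xs))
    b0 = mean_y - b1 * mean_x
    return b0, b1

def r_squared(ys, y_hats):
    """Proportion of the variance in Y that the predictions account for."""
    mean_y = sum(ys) / len(ys)
    ss_residual = sum((y - y_hat) ** 2 for y, y_hat in zip(ys, y_hats))
    ss_total = sum((y - mean_y) ** 2 for y in ys)
    return 1 - ss_residual / ss_total

x = [1, 2, 3, 4, 5]   # made-up scores
y = [2, 4, 5, 4, 5]

b0, b1 = fit_line(x, y)
y_hat = [b0 + b1 * v for v in x]
r2 = r_squared(y, y_hat)
```

Because this value is computed on the same sample the line was fitted to, it says nothing by itself about how well the line predicts in a new sample.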
Multiple linear regression
Assumptions
1. Measurement levels. Dependent variable is a continuous measure (interval or ratio
level). Independent variables are continuous or dichotomous (nominal with 2
categories)
2. Linearity. There is a linear relationship between the dependent variable and all
continuous independent variables. Make a scatterplot:
SPSS: Graphs > Legacy Dialogs > Scatter/Dot
3. Absence of outliers. This can be assessed through examining the scatter plots.
Conducting a multiple linear regression
SPSS: Analyze > Regression > Linear
Decide which variable is the dependent variable and which are the independent
variables, and place them in the corresponding boxes.
Check the various assumptions by ticking the following boxes:
o Absence of outliers. Click on Save and check: Standardized residuals,
Mahalanobis Distance and Cook’s Distance
o Absence of multicollinearity. Click on Statistics and check: Collinearity
Diagnostics
o Homoscedasticity. Click on Plots, place the variable *ZPRED (standardized
predicted values) on the X-axis and the variable *ZRESID (standardized
residuals) on the Y-axis.
o Normally distributed residuals. Click on Plots and check Histogram
Checking assumptions
Absence of outliers. Determine through a scatterplot or boxplot whether there are
outliers in the data, or look at the Residuals Statistics table and view the
minimum and maximum values of the standardized residuals, the Mahalanobis
distance and Cook’s distance. On the basis of these values, it is possible to assess
whether there are outliers in the Y-space, X-space and XY-space.
o Standardized residuals (outliers in the Y-space). Values must be between -3.3
and +3.3; a case with a value outside this range can be seen as an outlier.
o Mahalanobis distance (outliers in the X-space). Values must be lower than 10
+ 2 × (number of independent variables). Values higher than this critical value
indicate outliers.
o Cook’s distance (outliers on the XY-space). Indicates the overall influence of a
respondent on the model. Value must be lower than 1. Values higher than 1
indicate influential respondents (influential cases).
o Removing an outlier? Don’t include the participant if the value is theoretically
not possible or the participant does not belong to the group you want to make
inferences about. When this is not the case, run the analysis with and without
the participant, report the results of both analyses and discuss any differences.
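The three cut-offs above can be sketched for the simplest case of one independent variable. This is an illustration under assumptions, not what SPSS does internally: the formulas used (leverage, Mahalanobis distance as the squared z-score of X, Cook's distance from residual and leverage) are the standard textbook ones for a single predictor, and the data are made up so that the last case is far from the others.

```python
from math import sqrt

def mahalanobis_critical(n_iv):
    """Critical value from the rule of thumb above: 10 + 2 x (number of IVs)."""
    return 10 + 2 * n_iv

def outlier_diagnostics(xs, ys):
    """Per-case diagnostics for a regression with ONE independent variable."""
    n = len(xs)
    p = 2  # number of estimated coefficients: intercept + one slope
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b1 = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sxx
    b0 = mean_y - b1 * mean_x
    residuals = [y - (b0 + b1 * x) for x, y in zip(xs, ys)]
    mse = sum(e ** 2 for e in residuals) / (n - p)
    rows = []
    for x, e in zip(xs, residuals):
        leverage = 1 / n + (x - mean_x) ** 2 / sxx
        z_resid = e / sqrt(mse)                          # Y-space
        mahalanobis = (n - 1) * (x - mean_x) ** 2 / sxx  # X-space
        cooks = (e ** 2 / (p * mse)) * leverage / (1 - leverage) ** 2  # XY-space
        rows.append({
            "z_resid": z_resid, "mahalanobis": mahalanobis, "cooks": cooks,
            "flag_y": abs(z_resid) > 3.3,
            "flag_x": mahalanobis > mahalanobis_critical(1),
            "flag_xy": cooks > 1,
        })
    return rows

# Made-up data: ten cases on a straight line plus one case far away from it.
x = list(range(1, 11)) + [30]
y = list(range(1, 11)) + [0]
rows = outlier_diagnostics(x, y)
influential = [i for i, row in enumerate(rows) if row["flag_xy"]]
```

In this example only the last case is flagged as influential by Cook's distance, while its standardized residual stays within bounds because the case pulls the line towards itself. That is one reason to look at all three measures rather than the residuals alone.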
Absence of multicollinearity. Multicollinearity refers to a situation in which two
or more explanatory variables in a multiple regression model are highly correlated (r
> .8). Including such strongly related variables in your model has three
consequences:
o The regression coefficients (B) are unreliable.
o It limits the magnitude of R (the correlation between Y and Ŷ).
o The importance of individual independent variables can hardly be determined,
if at all.
Information on multicollinearity can be found in the last columns of the Coefficients
table.
o Values for the Tolerance smaller than .2 indicate a potential problem.
o Values for the Tolerance smaller than .1 indicate a problem.
o The variance inflation factor (VIF) is equal to 1/Tolerance. So for the VIF,
values greater than 5 indicate a potential problem and values greater than 10
indicate a problem.
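A minimal sketch of the Tolerance and VIF cut-offs, assuming a model with exactly two independent variables. In that special case the Tolerance of each predictor equals 1 - r², where r is the correlation between the two predictors; with more predictors, r² is replaced by the R-squared from regressing one predictor on all the others. The function names and the example correlations are illustrative.

```python
def tolerance_two_predictors(r12):
    """Tolerance when the model has exactly two predictors correlated r12."""
    return 1 - r12 ** 2

def vif(tolerance):
    """Variance inflation factor: VIF = 1 / Tolerance."""
    return 1 / tolerance

def judge(tolerance):
    """Apply the rules of thumb listed above."""
    if tolerance < 0.1:
        return "problem"
    if tolerance < 0.2:
        return "potential problem"
    return "ok"

# Made-up correlations between the two predictors.
results = {}
for r12 in (0.5, 0.9, 0.95):
    tol = tolerance_two_predictors(r12)
    results[r12] = (tol, vif(tol), judge(tol))
```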
Homoscedasticity. Homoscedasticity means that the spread of the residuals must be
approximately the same at every level of the predicted values. You can assess this by
plotting the standardized residuals (Y-axis) against the standardized predicted values
(X-axis). If for every predicted value there is approximately the same amount of
vertical spread in the residuals, then the condition is met.
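The check described above is visual. As a rough numeric companion, and purely as an assumption of this sketch rather than part of the course procedure, one can compare the spread of the residuals in the lower and upper half of the predicted values. A ratio close to 1 is consistent with homoscedasticity, while a large ratio suggests a fan shape. The data are made up, and the function assumes both halves have non-zero spread.

```python
from statistics import pstdev

def spread_ratio(predicted, residuals):
    """Ratio of the larger to the smaller residual SD across two halves of Y-hat."""
    pairs = sorted(zip(predicted, residuals))   # order cases by predicted value
    half = len(pairs) // 2
    low = [e for _, e in pairs[:half]]          # residuals at low predictions
    high = [e for _, e in pairs[half:]]         # residuals at high predictions
    sd_low, sd_high = pstdev(low), pstdev(high)
    return max(sd_low, sd_high) / min(sd_low, sd_high)

predicted = [1, 2, 3, 4, 5, 6, 7, 8]                      # made-up predictions
residuals_even = [1, -1, 1, -1, 1, -1, 1, -1]             # same spread everywhere
residuals_fan = [0.1, -0.1, 0.1, -0.1, 2, -2, 2, -2]      # spread grows with Y-hat

ratio_even = spread_ratio(predicted, residuals_even)
ratio_fan = spread_ratio(predicted, residuals_fan)
```

Splitting into two halves is a crude choice; the plot remains the primary tool because it also reveals patterns (such as curvature) that a single ratio cannot show.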