This is a summary/overview of the most important topics that were discussed during the course 'Applied Multivariate Data Analysis'. It is relevant for all Psychology Masters, since the information/literature is the same. It is 26 pages.
Important Analyses
Linear Regression – a linear regression is a way of predicting values of one variable from
another based on a model that describes a straight line. This line summarizes the pattern of
the data best.
- R² – explained variance of the model, proportion of variance in the outcome variable
that is shared by the predictor variable
- F – ratio of how much variability the model can explain relative to how much it can’t
explain
- b-value – the gradient of the line and the strength of the relationship between a
predictor and the outcome variable
b0 = intercept, the value of the outcome variable we would predict if the
predictor value were 0
b-coefficients vs. beta-coefficients
- b = the change in the outcome associated with a one-unit change in the predictor
- beta = the same as the b-value, but expressed in standard deviations. Because
these values are standardized, we can compare them across studies or across multiple
predictors in a multiple regression
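The quantities above (b0, b, beta, R² and F) can be computed by hand for a simple regression. The following is a minimal sketch using NumPy, not the SPSS procedure used in the course; the function name `simple_regression` and the example data are made up for illustration.

```python
import numpy as np

def simple_regression(x, y):
    """Fit y = b0 + b1 * x by least squares and return the key statistics."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = len(x)

    # b1: gradient of the line; b0: predicted outcome when the predictor is 0
    b1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
    b0 = y.mean() - b1 * x.mean()

    y_hat = b0 + b1 * x
    ss_total = np.sum((y - y.mean()) ** 2)   # variability around the mean (flat line)
    ss_resid = np.sum((y - y_hat) ** 2)      # variability the model cannot explain
    ss_model = ss_total - ss_resid           # variability the model explains

    r2 = ss_model / ss_total
    # F compares explained and unexplained variability per degree of freedom
    # (1 df for the single predictor, n - 2 df for the residuals)
    f = (ss_model / 1) / (ss_resid / (n - 2))

    # beta: b1 rescaled to standard-deviation units
    beta = b1 * x.std(ddof=1) / y.std(ddof=1)
    return {"b0": b0, "b1": b1, "beta": beta, "r2": r2, "f": f}

# Hypothetical example data
x = [1, 2, 3, 4, 5, 6]
y = [2.1, 3.9, 6.2, 7.8, 10.1, 12.2]
result = simple_regression(x, y)
print(result)
```

With a single predictor, beta equals the correlation between predictor and outcome, so beta squared equals R².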
How good is the model?
- If the regression model can predict something, its line will be steeper than the flat line
at the mean of all people on the dependent variable
- If the F-value is greater than 1, the model explains more variance than it leaves unexplained
F = 100: the model explains 100 times more variance (per degree of freedom) than it leaves unexplained
F = 1: explained and unexplained variance are the same
- In order to check how well the model fits the data, we check multiple things:
Standardized residuals/residual distance – for cases with a large prediction error
Distance from the individual points to the regression line (the model)
Influential cases that might bias the regression model do not have large
residuals per se > why we also check for other distances
Mahalanobis distance – for outlying cases on the predictor
Distance that the individual point is removed from the other points in the
space of the independent variables (thus, on the x-axis)
Cook’s distance – for influential cases, measures the influence of a single case on
the model as a whole
How much does the regression slope shift due to inclusion of this outlier
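The three distances above can be sketched with NumPy as follows. This is an illustration under assumptions, not the SPSS output itself: the function name `regression_diagnostics` and the data are made up, residuals are standardized by dividing by the standard error of the estimate, and Cook's distance uses the common formula based on leverage (hat values).

```python
import numpy as np

def regression_diagnostics(X, y):
    """Return standardized residuals, Mahalanobis distances and Cook's distances."""
    X = np.asarray(X, dtype=float)
    if X.ndim == 1:
        X = X[:, None]                      # a single predictor becomes one column
    y = np.asarray(y, dtype=float)
    n, k = X.shape
    p = k + 1                               # number of parameters incl. intercept

    design = np.column_stack([np.ones(n), X])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ coef
    mse = resid @ resid / (n - p)

    # Standardized residuals: prediction error in standard-error units
    z_resid = resid / np.sqrt(mse)

    # Leverage (hat values): diagonal of design (design'design)^-1 design'
    xtx_inv = np.linalg.inv(design.T @ design)
    hat = np.einsum("ij,jk,ik->i", design, xtx_inv, design)

    # Cook's distance: combines the size of the residual with the leverage
    cooks = (resid ** 2 / (p * mse)) * hat / (1 - hat) ** 2

    # Mahalanobis distance (squared): distance from the centre of the predictors
    centered = X - X.mean(axis=0)
    cov_inv = np.linalg.inv(np.atleast_2d(np.cov(X, rowvar=False)))
    mahal = np.einsum("ij,jk,ik->i", centered, cov_inv, centered)
    return z_resid, mahal, cooks

# Hypothetical data: the last case lies far out on the predictor
x = [1, 2, 3, 4, 5, 6, 20]
y = [2, 4, 5, 4, 6, 7, 8]
z_resid, mahal, cooks = regression_diagnostics(x, y)
print(np.round(z_resid, 2))
print(np.round(mahal, 2))
print(np.round(cooks, 2))
```

In this example the last case has the largest Mahalanobis and Cook's distance while its standardized residual stays small, because the case pulls the line towards itself. This is why residuals alone are not enough to find influential cases.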
General rules to see if there is an outlier based on standardized residuals:
1. A standardized residual with an absolute value greater than 3.29 (approximately 3) is
cause for concern
2. If more than 1% of the sample cases have a standardized residual with an absolute value
above 2.58 (approximately 2.5), it is cause for concern
3. If more than 5% of the sample cases have a standardized residual with an absolute value
above 1.96 (approximately 2), it is cause for concern
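The three rules above can be checked mechanically once the standardized residuals are available. A minimal sketch, assuming the residuals are already standardized; the function name `residual_outlier_flags` and the example values are made up for illustration.

```python
import numpy as np

def residual_outlier_flags(z_resid):
    """Apply the three rules of thumb to an array of standardized residuals."""
    z = np.abs(np.asarray(z_resid, dtype=float))
    return {
        # Rule 1: any case beyond 3.29
        "any_above_3.29": bool(np.any(z > 3.29)),
        # Rule 2: more than 1% of cases beyond 2.58
        "more_than_1pct_above_2.58": bool(np.mean(z > 2.58) > 0.01),
        # Rule 3: more than 5% of cases beyond 1.96
        "more_than_5pct_above_1.96": bool(np.mean(z > 1.96) > 0.05),
    }

# Hypothetical sample of 100 cases, 6 of which have a residual of 2.0
z_example = np.zeros(100)
z_example[:6] = 2.0
flags = residual_outlier_flags(z_example)
print(flags)
```

Here only the third rule is triggered: 6% of the cases exceed 1.96, but none exceed 2.58 or 3.29.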
General rules to see if there is an outlier based on the Mahalanobis distance:
1. Influential cases have values above 25 in large samples (500 or more)
2. Influential cases have values above 15 in smaller samples (100)
3. Influential cases have values above 11 in small samples (30 or less)
Multiple regression – this is the same as a simple linear regression, but with multiple
predictors.
- Ideally, all predictors have a high correlation with the outcome variable but the
correlations among the predictors are low. The higher the correlation among
predictors, the less information each predictor adds uniquely
- When the correlation among predictors is high, it causes multicollinearity: this
means that the variables basically explain the same variance (at least for a large
part). The regression (e.g. in SPSS) estimates each coefficient while controlling for the
other predictors, which can cause differences between the regression coefficient and the
correlation (e.g. there is a positive correlation yet the regression coefficient is
negative). This is called bouncing betas
- Ways to detect multicollinearity:
1. Correlation between predictors higher than .80
2. VIF of a predictor > 10
3. Tolerance of a predictor < .10
- Apart from bouncing betas, multicollinearity also causes other problems, namely a
limited size of R given the number of predictors (a predictor that overlaps with the others
adds little unique contribution) and difficulties with determining the importance of
predictors (this refers to bouncing betas)
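VIF and tolerance can be computed directly from the correlations among the predictors. The sketch below relies on the standard result that the VIF of each predictor equals the corresponding diagonal element of the inverse correlation matrix, and that tolerance is 1 / VIF. The function name `vif_and_tolerance` and the simulated data are made up for illustration.

```python
import numpy as np

def vif_and_tolerance(X):
    """Return VIF and tolerance per predictor (columns of X are predictors)."""
    X = np.asarray(X, dtype=float)
    corr = np.corrcoef(X, rowvar=False)     # correlations among the predictors
    vif = np.diag(np.linalg.inv(corr))      # VIF_j = 1 / (1 - R²_j)
    tolerance = 1.0 / vif                   # tolerance_j = 1 - R²_j
    return vif, tolerance

# Simulated predictors: x2 is almost a copy of x1, x3 is unrelated
rng = np.random.default_rng(0)
x1 = rng.normal(size=200)
x2 = x1 + rng.normal(scale=0.1, size=200)
x3 = rng.normal(size=200)
vif, tolerance = vif_and_tolerance(np.column_stack([x1, x2, x3]))
print(np.round(vif, 2))
print(np.round(tolerance, 3))
```

The two nearly identical predictors should end up with a VIF far above 10 and a tolerance far below .10, while the unrelated predictor stays close to 1 on both.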
Assumptions Regression Analysis
1. Linearity – the relationship between the predictor and the outcome variable must be
linear
Check 1) residual plot with ZPRED on the x-axis vs. ZRESID on the y-axis or 2) scatterplot
with the predictor on the x-axis vs. the dependent variable on the y-axis
If the residuals show a curved pattern, the regression model is not optimal >
assumption is not met
2. Homoscedasticity / homogeneity of variance – for each value of the predictors, the
variance of the residuals should be equal (or: spread of outcome scores is roughly
equal at different points in the predictor variable)
Check the residual plot with ZPRED on the x-axis vs. ZRESID on the y-axis
The residuals should all be equally centered around 0, with generally an equal
amount of residuals on all sides (left, right, under and above). If this is not the
case, we call it heteroscedasticity
If the residuals increase with the predicted values, the heteroscedasticity may be
explained with another predictor
3. Normally distributed errors – if the errors are not normally distributed, we cannot
trust the p-values of the significance tests (with small N)
Check 1) histogram of the residuals for multiple peaks or outliers or 2) scatterplot
with ZPRED (x-axis) and ZRESID (y-axis) for the normal curve or 3) Q-Q plots
4. Independence of errors – all values of the outcome variable should come from a
different person
Error terms of observations should be uncorrelated
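The values used in the ZPRED vs. ZRESID plot can be computed as follows. This is a minimal sketch with made-up names and data: the outcome is deliberately curved (y = x²), so the straight-line model violates the linearity assumption. Plotting is left out; the two returned arrays are what would go on the x-axis and y-axis.

```python
import numpy as np

def zpred_zresid(x, y):
    """Return standardized predicted values and standardized residuals."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = len(x)
    design = np.column_stack([np.ones(n), x])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    pred = design @ coef
    resid = y - pred

    # ZPRED: predicted values as z-scores
    zpred = (pred - pred.mean()) / pred.std(ddof=1)
    # ZRESID: residuals divided by the standard error of the estimate
    zresid = resid / np.sqrt(resid @ resid / (n - 2))
    return zpred, zresid

# Hypothetical curved data fitted with a straight line
x = np.arange(11)
y = x ** 2
zpred, zresid = zpred_zresid(x, y)
for zp, zr in zip(zpred, zresid):
    print(f"{zp:6.2f} {zr:6.2f}")
```

The residuals are positive at both ends and negative in the middle, which is the curved pattern that signals a violation of linearity.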