Manual Analysis of Variance Exam
Nick A.F. Israel, ANR 162898, U1267315
Tilburg University
Communication and Information Sciences
Analysis of Variance 2014-2015
Manual Statistics Exam
WHEN TO CHOOSE WHICH TEST
Reliability
- A way of measuring the reliability of a scale consisting of multiple items
o Cronbach's alpha should be larger than .7 for the reliability to be considered "good" (a computational sketch follows below).
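As a rough illustration of what SPSS computes here (not a replacement for the Reliability Analysis output), Cronbach's alpha can be calculated by hand from its formula; the item names and data below are hypothetical:

```python
import numpy as np
import pandas as pd

def cronbach_alpha(items: pd.DataFrame) -> float:
    """alpha = (k / (k - 1)) * (1 - sum of item variances / variance of total score)."""
    k = items.shape[1]
    item_vars = items.var(axis=0, ddof=1)
    total_var = items.sum(axis=1).var(ddof=1)
    return (k / (k - 1)) * (1 - item_vars.sum() / total_var)

# Hypothetical 5-item scale answered by 4 respondents
items = pd.DataFrame({
    "q1": [4, 5, 3, 4], "q2": [4, 4, 3, 5], "q3": [5, 5, 2, 4],
    "q4": [3, 4, 3, 4], "q5": [4, 5, 3, 5],
})
print(cronbach_alpha(items))  # values above .7 count as "good"
```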
Multiple regression
- A way of predicting an outcome variable with several predictor variables
o E.g., “Similarity, social attraction and proximity predict online friendship quality.”
Mediation
- When the relationship between a predictor variable and an outcome variable can be
explained by their relationship to a third variable (the mediator).
o E.g., the effect of violent video game playing on aggression can be explained by arousal (see the sketch below).
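As a rough sketch of the idea (not the full mediation procedure from the course), the indirect effect can be estimated as the product of two regression paths; all variable names and data below are made up:

```python
import numpy as np
import statsmodels.api as sm

# Hypothetical data: x = game play, m = arousal (mediator), y = aggression
rng = np.random.default_rng(0)
x = rng.normal(size=200)
m = 0.5 * x + rng.normal(size=200)
y = 0.4 * m + rng.normal(size=200)

# Path a: predictor -> mediator
a = sm.OLS(m, sm.add_constant(x)).fit().params[1]
# Path b: mediator -> outcome, controlling for the predictor
b = sm.OLS(y, sm.add_constant(np.column_stack([x, m]))).fit().params[2]
print("indirect (mediated) effect a*b:", a * b)
```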
Moderation
- When the influence of one predictor variable on an outcome variable is dependent on
the influence of another variable
o E.g., the effect of age on emotion recognition is different for men than for women (see the sketch below)
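A moderation hypothesis is usually tested as an interaction term in a regression; a minimal sketch with statsmodels, using hypothetical variables and data:

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(1)
df = pd.DataFrame({"age": rng.normal(40, 10, 200),
                   "female": rng.integers(0, 2, 200)})
df["recognition"] = (0.10 * df["age"]
                     - 0.05 * df["age"] * df["female"]
                     + rng.normal(size=200))

# "age * female" expands to age + female + age:female;
# a significant age:female coefficient indicates moderation
fit = smf.ols("recognition ~ age * female", data=df).fit()
print(fit.params)
print(fit.pvalues)
```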
Repeated-measures ANOVA
- One-way
o Compares several means when those means have come from the same entities (e.g., measuring people's statistical ability each month over a year-long course; see the sketch after this list)
- Two-way
o Compares several means when there are two independent variables, and the same
entities have been used in all conditions (e.g., measuring people’s recall scores over
four trials AND for two different conditions)
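A minimal sketch of a one-way repeated-measures ANOVA with statsmodels' AnovaRM, assuming hypothetical long-format data (one row per person per trial):

```python
import numpy as np
import pandas as pd
from statsmodels.stats.anova import AnovaRM

# Hypothetical recall data: 20 people measured on 4 trials each
rng = np.random.default_rng(2)
df = pd.DataFrame({"person": np.repeat(np.arange(20), 4),
                   "trial": np.tile([1, 2, 3, 4], 20)})
df["recall"] = df["trial"] + rng.normal(10, 2, size=len(df))

print(AnovaRM(data=df, depvar="recall", subject="person",
              within=["trial"]).fit())
```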
Multiple Regression
A way of predicting an outcome variable with several predictor variables (e.g., "Similarity, social attraction and proximity (do not) predict online friendship quality.")
Relevant assumptions
Normally distributed RESIDUALS
Checked by calculating z-scores of skewness and kurtosis
- If the absolute z-value is larger than 1.96 (i.e., above 1.96 or below -1.96), the residuals are not normally distributed
Checked by doing a Kolmogorov-Smirnov test
- If significant, then the residuals are not normally distributed (both checks are sketched below)
Checked by histogram
- Bell shape
If not: report bootstrapped confidence intervals of the regression coefficients.
Do not forget to unselect any 'Save' options (SPSS cannot bootstrap while these are selected).
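A sketch of both normality checks outside SPSS, run here on stand-in residuals (in practice you would save and use the residuals from the regression itself):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
residuals = rng.normal(size=200)  # stand-in for the saved regression residuals
n = len(residuals)

# z-scores of skewness and (excess) kurtosis, using the approximate
# standard errors sqrt(6/n) and sqrt(24/n)
z_skew = stats.skew(residuals) / np.sqrt(6 / n)
z_kurt = stats.kurtosis(residuals) / np.sqrt(24 / n)
print("z-skewness:", z_skew, "z-kurtosis:", z_kurt)  # |z| > 1.96 flags a problem

# Kolmogorov-Smirnov test against a normal distribution with the sample's
# mean and standard deviation; a significant p means non-normal residuals
ks = stats.kstest(residuals, "norm",
                  args=(residuals.mean(), residuals.std(ddof=1)))
print("K-S p-value:", ks.pvalue)
```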
Homoscedasticity: ZRESID vs. ZPRED
Checked by observing scatterplot
- A random, evenly spread cloud of dots (no funnel or curve) means no problems with homoscedasticity and that the model generalizes well to the population; it can be used to make predictions (see the sketch below)
If not: report that the model does not generalize well to the population.
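The same ZRESID-versus-ZPRED plot can be reproduced outside SPSS; a sketch with a hypothetical model and data:

```python
import matplotlib.pyplot as plt
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(4)
X = sm.add_constant(rng.normal(size=(200, 2)))
y = X @ np.array([1.0, 0.5, -0.3]) + rng.normal(size=200)
fit = sm.OLS(y, X).fit()

# Standardized predicted values (ZPRED) against standardized residuals (ZRESID)
zpred = (fit.fittedvalues - fit.fittedvalues.mean()) / fit.fittedvalues.std()
zresid = fit.resid / fit.resid.std()

plt.scatter(zpred, zresid)
plt.axhline(0, linestyle="--")
plt.xlabel("ZPRED")
plt.ylabel("ZRESID")
plt.show()  # a shapeless random cloud = no heteroscedasticity problem
```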
Multicollinearity
When two (independent/predictor) variables are highly correlated, they are basically
measuring the same thing.
- VIF has to be less than 10 and Tolerance (1/VIF) has to be larger than 0.2 (see the sketch below)
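A sketch of the VIF and Tolerance check with statsmodels, using deliberately collinear hypothetical predictors:

```python
import numpy as np
import statsmodels.api as sm
from statsmodels.stats.outliers_influence import variance_inflation_factor

rng = np.random.default_rng(5)
x1 = rng.normal(size=200)
x2 = x1 + 0.1 * rng.normal(size=200)  # deliberately almost identical to x1
X = sm.add_constant(np.column_stack([x1, x2]))

for i in (1, 2):  # skip the constant at index 0
    vif = variance_inflation_factor(X, i)
    print(f"predictor {i}: VIF = {vif:.1f}, tolerance = {1 / vif:.3f}")
# Rule of thumb: VIF < 10 and tolerance > 0.2
```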
Standardized residuals outside the range of -2 to 2 and -3 to 3
- Outside -2 to 2: fewer than 5% of cases
- Outside -3 to 3: fewer than 1% of cases
o Divide the number of such cases by N (which can be found in the first table: Descriptive Statistics); see the sketch below
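A sketch of this count, assuming a hypothetical fitted model whose standardized residuals stand in for the ones SPSS would save:

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(6)
X = sm.add_constant(rng.normal(size=(200, 2)))
y = X @ np.array([1.0, 0.5, -0.3]) + rng.normal(size=200)
fit = sm.OLS(y, X).fit()

z = fit.get_influence().resid_studentized_internal  # standardized residuals
print("proportion |z| > 2:", np.mean(np.abs(z) > 2))  # should stay under .05
print("proportion |z| > 3:", np.mean(np.abs(z) > 3))  # should stay under .01
```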
Durbin-Watson test
Durbin-Watson value should be around 2 (meaning that the errors are independent)
- Values below 1 or above 3 should raise alarm bells: report that errors are not independent and that the model does not generalize well to the population (see the sketch below).
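A sketch of the Durbin-Watson check with statsmodels, again on a hypothetical model:

```python
import numpy as np
import statsmodels.api as sm
from statsmodels.stats.stattools import durbin_watson

rng = np.random.default_rng(7)
X = sm.add_constant(rng.normal(size=(200, 2)))
y = X @ np.array([1.0, 0.5, -0.3]) + rng.normal(size=200)
fit = sm.OLS(y, X).fit()

dw = durbin_watson(fit.resid)
print(dw)  # around 2 = independent errors; below 1 or above 3 is cause for concern
```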
Cook's distance / Mahalanobis distance
Largest Cook's distance should be smaller than 1
Largest Mahalanobis distance should be below 25 for large samples (N = 500), below 15 for smaller samples (N = 100), and below 11 for very small samples (N = 30) with only two predictors (see the sketch below).
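A sketch of both distance checks on a hypothetical two-predictor model; note that the Mahalanobis values computed here (like the cut-offs above) are the squared distances, which is the form SPSS saves:

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(8)
preds = rng.normal(size=(100, 2))
y = preds @ np.array([0.5, -0.3]) + rng.normal(size=100)
fit = sm.OLS(y, sm.add_constant(preds)).fit()

cooks = fit.get_influence().cooks_distance[0]
print("max Cook's distance:", cooks.max())  # should stay below 1

# Squared Mahalanobis distance of each case from the predictor centroid
centered = preds - preds.mean(axis=0)
inv_cov = np.linalg.inv(np.cov(preds, rowvar=False))
mahal = np.einsum("ij,jk,ik->i", centered, inv_cov, centered)
print("max Mahalanobis distance:", mahal.max())  # e.g. below ~15 for N = 100
```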
Things to look out for
The individual contribution of variables to the regression model can be found in the
Coefficients table from SPSS (if you have done a hierarchical regression then look at the
values for the final model)
For each predictor variable, you can see if it has made a significant contribution to predicting
the outcome by looking at the column labelled Sig. (values less than .05 are significant).
The standardized beta values (β) tell you the importance of each predictor (bigger absolute value = more important).
Reporting (in a nutshell)
Report the means and correlations among predictors (in a table)
Report the type of analysis (e.g., hierarchical regression)
Describe step by step which variables you entered and how good the model is (proportion of variance explained, comparison with the previous model)
Indicate per model which individual predictors are significant and how (the direction of the
correlation)
If assumptions are violated, mention this
Dummy variables
Regression does not afford the use of categorical predictors when there are more than 2 categories. When this is the case, these can be transformed into dummy variables that can then be entered into the regression.
Dummy variables consist of ones and zeros
You always need a comparison group (it does not matter which one)
- The number of dummy variables in the regression is therefore the number of categories - 1
Imagine that a construct 'proximity' consists of 3 categories: low, medium and high proximity. We have decided that low proximity will be the control group and it will, therefore, receive a value of 0 on both of the 2 resulting dummy variables.
For our first dummy variable we assign the value 1 to the first group (i.e., high proximity) that
we want to compare against the control group (i.e., low proximity). All other groups will
receive 0 for this dummy variable.
For the second dummy variable, we assign the value 1 to the second group (i.e., medium
proximity) we want to compare against the control group (i.e., low proximity). All other
groups will receive 0 for this dummy variable.
This process is done through Transform > Recode into Different Variables and by selecting the
categorical variable (e.g., proximity) and editing its Old and New Values.
Finally, all dummy variables are to be placed into the regression analysis in the same block.
In the Coefficients table of the regression output, it can then be seen whether each group is significantly different from the control group; for this, a t-statistic and its significance are reported. A sketch of the same dummy coding outside SPSS follows below.
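Outside SPSS, the same recoding (with low proximity as the comparison group, as in the example above) can be sketched with pandas:

```python
import pandas as pd

df = pd.DataFrame({"proximity": ["low", "medium", "high", "low", "high"]})

# One dummy per category, then drop the comparison group ("low"),
# leaving categories - 1 = 2 dummy variables of ones and zeros
dummies = pd.get_dummies(df["proximity"], prefix="prox", dtype=int)
dummies = dummies.drop(columns="prox_low")
print(pd.concat([df, dummies], axis=1))
```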
Steps of the analysis
1. Get your data ready
- Check reversed items
- Check for (impossible) outliers (sort ascending in Data View)
- Check reliability of scales (α)
- Compute mean values or other necessary variables (explore)
2. Computing the multiple regression
- First select: Analyze > Regression > Linear, and transfer the outcome and predictor
variables appropriately.
o Hierarchical regression: The researcher decides the order in which the predictors get entered into the equation (based on past work of others). Known predictors (from other research) should be entered into the model first, in order of their importance in predicting the outcome. Then, click on Next to enter new predictors into the model. (This was used in the example results section below; a code sketch follows at the end of this section.)
o Forced entry is a method in which all predictors are forced into the model
simultaneously. Like hierarchical, this method relies on theoretical reasons, but
unlike hierarchical the experimenter makes no decision about the order in which
variables are entered.
o Stepwise: Just don’t. Let theory, previous research and common sense be your
guides (not SPSS).
- These are the options you should select
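Finally, as a rough companion to the SPSS steps above, a hierarchical regression can be sketched in statsmodels by fitting the blocks as separate models and testing the R² change; the variables and data below are hypothetical:

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(9)
df = pd.DataFrame({"similarity": rng.normal(size=150),
                   "attraction": rng.normal(size=150),
                   "proximity": rng.normal(size=150)})
df["quality"] = (0.4 * df["similarity"] + 0.3 * df["attraction"]
                 + 0.2 * df["proximity"] + rng.normal(size=150))

# Block 1: the known predictor goes in first
m1 = smf.ols("quality ~ similarity", data=df).fit()
# Block 2: the new predictors are added
m2 = smf.ols("quality ~ similarity + attraction + proximity", data=df).fit()

print("R2 block 1:", m1.rsquared, "R2 block 2:", m2.rsquared)
# F-test for the R2 change from block 1 to block 2
f_value, p_value, df_diff = m2.compare_f_test(m1)
print("R2 change F-test: F =", f_value, ", p =", p_value)
```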