AMDA Spring Lectures
Topic 1 – Moderation & Mediation
  Topic 1 – Lecture 1
  Topic 1 – Lecture 2
Topic 2 – Predictive Regression
  Topic 2 – Lecture 1
  Topic 2 – Lecture 2
Topic 3 – Multilevel Modelling
  Topic 3 – Lecture 1
  Topic 3 – Lecture 2
  Topic 3 – Lecture 3
  Topic 3 – Lecture 4
Topic 4 – Missing Data
  Topic 4 – Lecture 1
  Topic 4 – Lecture 2
Topic 1 – Moderation & Mediation
Topic 1 – Lecture 1
Moderation
With moderation and mediation, there is an independent variable (X) that predicts a
dependent variable (Y), and a third variable (Z) which qualifies (moderator) or explains
(mediator) the relationship between X and Y.
• Moderator: the X-Y relationship is different (stronger, weaker, different direction) for
different values of Z;
• Mediator: the X-Y relationship is mediated by Z, which is caused by X and in turn
causes Y. Z is the reason that the relationship exists. The effect of X on Y is an
indirect effect that goes via Z: X causes Z, Z causes Y.
Moderation is often pictured as an arrow pointing at an arrow (from the
moderator to the relationship between X and Y). Although not shown
explicitly in that picture, the arrow from Z to Y (the main effect of Z) is
absolutely necessary: the model does not work without it. ‘Z moderates the
X-Y relationship’ is equivalent to ‘X moderates the Z-Y relationship’, so moderation is
symmetric. Because it is symmetric, you can choose the easiest interpretation.
Mediation (indirect effect) is pictured as a chain of arrows: from X to Z
and from Z to Y. So, the indirect effect is the part of the relationship
between X and Y that is explained by Z. However, estimation of the
direct effect is also important (X to Y).
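Although the rest of this lecture focuses on moderation, it may help to see the standard decomposition for the linear mediation model (conventional a/b/c notation, not introduced explicitly in the lecture):

```latex
\begin{align}
Z &= a_0 + aX + e_Z   && \text{(X predicts the mediator Z)}\\
Y &= b_0 + c'X + bZ + e_Y && \text{(direct effect } c'\text{, mediator effect } b\text{)}\\
c &= c' + ab          && \text{(total effect = direct + indirect)}
\end{align}
```

The indirect effect ab is the part of the X-Y relationship that runs via Z; c' is the direct effect that remains.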
Moderation is interaction
Example. The relationship between stress and psychological complaints, buffered by social
support. The idea is, the more stressed (X) you are, the more likely you are to experience
complaints (Y). Here, the question is, can this relationship be moderated by social support
(Z)? So, for people with low support, there is a strong relationship between stress and
complaints, but with higher levels of social support this stress-complaints relationship
becomes weaker.
The moderator is the variable that modifies the relationship between two other
variables.
Interaction in ANOVA means a difference between differences, which shows up as
nonparallel lines. The model always contains all three terms: X, Z and the product X*Z predicting Y.
There are four cases for moderation based on the measurement levels of X, Y and Z. The
dependent variable (Y) is always interval in this course. The independent (X) and moderator
(Z) are either nominal or interval.
Case 1. Regression analysis → X and Z both interval
In case 1, you start with a standard regression model for two (or more) predictors → the main
effects model (the main effect of X on Y, controlling for the other variable in the model). The
effects of X and Z do not depend on one another; they are simply additive. Different
constants, but identical regression weights (parallel lines): the effect of X on Y is the same for
all levels of Z.
Then, an interaction term is added (X*Z). We test the null hypothesis b3 = 0 to test the
interaction: if b3 is not significant, we cannot say there is an interaction; if it is
significant, there is an interaction. This interaction does not only give different constants (b0
+ b2Z), but also different regression weights (b1 + b3Z) for each possible value of Z. The
relationship between X and Y changes as a function of the value of Z for a given individual.
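Rearranging the regression equation shows where these constants and weights come from:

```latex
\hat{Y} = b_0 + b_1 X + b_2 Z + b_3 XZ
        = (b_0 + b_2 Z) + (b_1 + b_3 Z)\,X
```

Each value of Z thus gives its own intercept (b0 + b2Z) and its own slope for X (b1 + b3Z).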
If Z is an interval variable, like here, there are (in principle) infinitely many possible values
of Z, and we find a different effect of X for every one of them. This is
visualised by non-parallel regression lines → which is one of the ways we identify there
is an interaction. For plotting, you choose meaningful Z-values that suit your research purposes
best (most of the time the mean and ±1 standard deviation) → so, if you have no a priori
preferences, the conventional Z-values are MZ – SDZ, MZ and MZ + SDZ.
It is really important that you do not start plotting where Z-values don’t exist! E.g., if Z is
measured on a 0-10 scale, don’t plot regression lines for Z = −10 or Z = 20. You can compute
an effect there, but it is meaningless.
Choose only regression lines within the observed range of Z-values. All regression lines
go through one common point P (the crossing point of the simple regression lines, at X = −b2/b3).
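A minimal Python sketch of this procedure, using statsmodels; the file name and variable names (stress, support, complaints) are hypothetical stand-ins for your own data:

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

df = pd.read_csv("stress.csv")  # hypothetical file with stress, support, complaints

# Moderation model: main effects plus the interaction
# (the formula expands to stress + support + stress:support).
model = smf.ols("complaints ~ stress * support", data=df).fit()
print(model.summary())

# Conventional moderator values: M - SD, M, M + SD
# (all within the observed range of Z).
m, sd = df["support"].mean(), df["support"].std()

# One regression line of Y on X per chosen Z value.
x_grid = np.linspace(df["stress"].min(), df["stress"].max(), 50)
for z in (m - sd, m, m + sd):
    pred = model.predict(pd.DataFrame({"stress": x_grid, "support": z}))
    print(f"at support = {z:.2f}: fitted complaints run from "
          f"{pred.iloc[0]:.2f} to {pred.iloc[-1]:.2f}")
```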
Linear-by-linear interaction
Interaction via the XZ term is called a linear-by-linear interaction, because it describes a
linear relationship between Z and the regression weight for the X-Y relationship. Nonlinear
interactions are possible, but not in this course.
Meaning of XZ
The XZ product on its own is not the interaction. It becomes the interaction only when the
lower-order effects X and Z are also in the regression equation (and therefore are partialled
out). When computing interactions, always include all relevant lower-order effects (also in
AN(C)OVA).
Centering X and Z
It is generally preferable to center X and Z (i.e. transform them into deviations from their
own mean) prior to fitting the model, for two reasons:
1. Preventing multicollinearity: the XZ product is usually highly correlated with X
and Z. Centering lowers the correlations of XZ with X and Z;
2. Interpretation of the main effects: after centering, b1 gives the effect of X on Y at
the mean of Z, which is approximately the average effect of X on Y. The same
reasoning applies to b2, which gives the effect of Z on Y at the mean of X.
So, centering makes the interpretation of the interaction and the main effects possible and
helps with multicollinearity by decreasing the correlations.
Note that you center the predictors first, and then create the interaction from the centered
predictors. You do not form the interaction from the original variables and then center that
product!
Now, X is 0 at its mean → we interpret the effect of Z at the mean of X and the effect of
X at the mean of Z (see the sketch below).
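A short sketch of the right order of operations (center first, then multiply), continuing the hypothetical variable names from the sketch above:

```python
import pandas as pd

df = pd.read_csv("stress.csv")  # hypothetical file with stress, support, complaints

# Center the predictors first...
df["stress_c"] = df["stress"] - df["stress"].mean()
df["support_c"] = df["support"] - df["support"].mean()

# ...then build the product term from the centered variables.
df["product_c"] = df["stress_c"] * df["support_c"]

# Wrong order, for contrast: centering the raw product afterwards
# is NOT the same variable as the product of the centered predictors.
raw_product = df["stress"] * df["support"]
not_the_same = raw_product - raw_product.mean()
```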
The interaction is tested hierarchically. You first test the effect of X and Z → Y, and then
you test X, Z, and XZ → Y. If b3 is not significant, you stick to the simpler model. (The
lecturer does not agree with this rule → you should test moderation whenever you have a
reason to expect it.)
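The hierarchical comparison can be sketched with two nested OLS fits compared by an F-test (statsmodels’ anova_lm; file and variable names hypothetical, as above):

```python
import pandas as pd
import statsmodels.formula.api as smf
from statsmodels.stats.anova import anova_lm

df = pd.read_csv("stress.csv")  # hypothetical file with stress, support, complaints

# Step 1: main effects only; step 2: add the interaction.
main = smf.ols("complaints ~ stress + support", data=df).fit()
full = smf.ols("complaints ~ stress * support", data=df).fit()

# F-test on the R-squared change between the nested models.
print(anova_lm(main, full))
print("R2 change:", round(full.rsquared - main.rsquared, 3))
```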
Interpretation
In the example output, the first model (main effects) explains 15.5% of the variance, and the
second model (main effects + interaction) explains 20.5%. Both models are significant,
and adding the third predictor (the interaction) significantly improves the model.
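The significance of this R² increase is the usual F-change test (with k added predictors, here k = 1, N observations, and p2 predictors in the larger model):

```latex
F_{\text{change}} = \frac{(R^2_{2} - R^2_{1})/k}{(1 - R^2_{2})/(N - p_2 - 1)}
```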