COMBINED SUMMARY:
MAIN CONCEPTS, LECTURE CONCEPTS, AND THEORY FROM PRACTICAL EXERCISES
Q&A: Last Tutorial Meeting
DESIGNS
Multivariate (MANOVA) => refers to the number of dependent variables; includes several
DVs
Mixed ANOVA => two types of factors; within- and between-subjects
Factorial designs => variables with distinct categories; different nominal levels
3x2 factorial design => e.g., three different time points (within), and two different
categories (between)
FACTORIAL DESIGNS
Effects in a 2x2x2 mixed ANOVA:
3x main effects
3x two-way interactions
1x three-way interaction
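A quick way to see where these seven effects come from: every non-empty combination of the three factors is one testable effect. A minimal Python sketch (the factor names are only illustrative):

from itertools import combinations

# Hypothetical factor names for a 2x2x2 mixed design
factors = ["Group", "Task", "Condition"]

# Every non-empty subset of the factors is one testable effect:
# size 1 = main effects, size 2 = two-way interactions, size 3 = the three-way interaction
for k in range(1, len(factors) + 1):
    for effect in combinations(factors, k):
        label = "main effect" if k == 1 else f"{k}-way interaction"
        print(" x ".join(effect), "->", label)
# Prints 3 main effects, 3 two-way interactions, and 1 three-way interaction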
MAIN EFFECTS:
Ho: there is no difference between the groups
Ha: there is a difference between the groups
TWO-WAY INTERACTIONS
(i.e., Ho: use the words Different and Same)
Ho: the mean differences between the groups are the same (= 0)
E.g., the mean RT difference between the control and experimental group for
condition 1 is the same as the mean RT difference between the control and
experimental group on condition 2
OR
(µ_group1,condition1 − µ_group1,condition2) = (µ_group2,condition1 − µ_group2,condition2)
The difference between condition 1 and condition 2 is the same for the experimental group
and the control group
- Regardless of the level of Group (or Condition) => the effect of the other factor on the
outcome will be the same (i.e., the slopes are the same)
(i.e., Ha: use the words Different and Different)
Ha: the differences between the groups will be different (≠ 0)
E.g., the difference between condition 1 and condition 2 is different for the
experimental group and the control group
E.g., the mean RT difference between the control and experimental groups in
condition 1 is different than the mean RT difference between the control and
experimental groups on condition 2
There is a difference in effect between the groups
THREE-WAY INTERACTIONS
Ho: the two-way interaction is the same at every level of the third factor; e.g., the Group ×
Condition difference of differences in mean RT is the same on Experimental task 1 as on
Experimental task 2
Ha: the difference of differences for … is different from the difference of differences for …
NO INTERACTION
There are two factors influencing an outcome – but each factor's influence is independent of the
other (i.e., they do not affect each other's effect)
E.g., what you (1) eat has nothing to do with your genes – and (2) your genes have
nothing to do with what you eat => but they still influence your weight independently
If they both influenced each other => interaction effect
- And their influence on each other also influences outcome (e.g., weight)
E.g., a lamp going on and off
(1) Power => predictor 1
(2) Switch (on or off) => predictor 2
Whether the lamp is on or off => depends on their combined mechanism
You have power and the switch is on => their effect on the lamp depends on both
working together
LEVENE’S TEST VS SPHERICITY
Levene’s test => between subjects only
Sphericity => within-subjects only
DUMMY VARIABLES AND B-COEFFICIENTS
Dummy variables => use unstandardized coefficients to interpret
With dummies => the b coefficient reflects the difference between a specific group and
the reference group
Continuous variables => use standardized coefficients to interpret
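A minimal sketch of this interpretation, assuming a made-up data set with a three-level group factor and a reaction-time outcome; statsmodels dummy-codes the factor against a reference category, so each unstandardized b is that group's mean difference from the reference group:

import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(1)
# Made-up data: three groups with different mean reaction times
df = pd.DataFrame({
    "group": np.repeat(["control", "treatA", "treatB"], 30),
    "rt": np.concatenate([rng.normal(500, 50, 30),
                          rng.normal(470, 50, 30),
                          rng.normal(450, 50, 30)]),
})

# C(group) dummy-codes the factor with 'control' as the reference category,
# so each unstandardized b is that group's mean difference from the control group
model = smf.ols("rt ~ C(group)", data=df).fit()
print(model.params)  # intercept = control-group mean; dummy b's = differences from control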
T-VALUES AND DEGREES OF FREEDOM
A t-value of about 2 (in absolute value) => roughly enough to reject Ho at .05 (provided the df is
reasonably large)
If t-value > 2 or t-value < -2 then it is (approximately) significant at .05
The negative/positive sign of the t-value => only tells us the direction of the
relationship; nothing about its significance
The larger the df => the larger the sample size => the smaller the p-value (for a given effect)
The larger the t-value => the smaller the p-value
The larger the sample size => the smaller the t-value necessary for reaching significance
The larger the df => the lower the critical value, so a given F-value (or t-value) reaches
significance more easily
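A short check of the |t| ≈ 2 rule of thumb using scipy's t distribution (the df values are arbitrary); the two-tailed .05 critical value shrinks towards 1.96 as df grows:

from scipy import stats

# Two-tailed critical t at alpha = .05 for increasing degrees of freedom
for df in (5, 10, 30, 60, 120, 1000):
    t_crit = stats.t.ppf(0.975, df)
    print(f"df = {df:4d}: reject Ho at .05 if |t| > {t_crit:.3f}")
# The critical value approaches 1.96, so |t| > 2 works as a rule of thumb
# once the sample (and hence the df) is reasonably large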
MODERATION
Moderation (interaction effect) => the effect differs between the levels of the moderator
The main effect => the effect on average over both levels
Week 1
SAMPLES
Data is collected from a small subset of the population (i.e., a sample) and used to infer
something about the population as a whole
Samples are used as one is interested in populations – but cannot collect data from every
human being in a given population
MEAN
The mean is a simple statistical model of the center of a distribution of scores – it is a
hypothetical estimate of the typical score
Variance – or standard deviation – is used to infer how accurately the mean represents the
given data
The standard deviation – is a measure of how much error there is associated with the mean
The smaller the SD – the more accurately the mean represents the data
The mean = (∑ of observed scores) / (total number of scores), i.e., x̄ = Σxᵢ / N
STANDARD DEVIATION VS STANDARD ERROR
The standard deviation => how much the observations in a given sample differ from the
mean value within that sample; how well the sample mean represents the sample
The standard error => how well the sample mean represents the population mean
It is the SD of the sampling distribution of a statistic
For a given statistic (e.g., the mean) => the standard error represents how much variability
there is in this statistic across samples from the same population
Large values => indicate that a statistic from a given sample may not be an accurate
reflection of the population from which the sample came
GOODNESS OF FIT
The SS, variance, and SD – all reflect how well the mean fits the observed sample data
Large values (relative to the scale of measurement) => suggest the mean is a poor fit
of the observed scores
Small values => suggest a good fit
They are all measures of dispersion – with large values indicating a spread-out distribution of
scores – and small values showing a more tightly packed distribution
These measures all represent the same thing – but differ in how they express it:
The SS => is a total and therefore, it is affected by the number of data points
SS = Σ(xᵢ − x̄)²
The variance => is the average variability – but in squared units
s² = Σ(xᵢ − x̄)² / (N − 1) = SS / df
The SD => is the average variation – but converted back into the original units of measurement
s = √s² = √(SS / (N − 1))
- The size of the SD can be compared to the mean – as they are in the same units of measurement
The SE of the mean: SE_mean = s / √N
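A small numeric sketch of these quantities, using made-up scores:

import numpy as np

scores = np.array([4, 6, 7, 5, 8, 6])  # made-up sample scores
n = len(scores)
mean = scores.mean()

ss = np.sum((scores - mean) ** 2)   # SS: total squared deviation from the mean
variance = ss / (n - 1)             # s^2 = SS / df
sd = np.sqrt(variance)              # s: back in the original units of measurement
se_mean = sd / np.sqrt(n)           # SE of the mean = s / sqrt(N)

print(mean, ss, variance, sd, se_mean)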
CONFIDENCE INTERVAL
A 95% confidence interval => an interval constructed such that 95% of samples will contain
the true population value within the CI limits
Large samples => smaller SE => less variability => more reliable
CI = x̄ ± t(N−1) × SE
The relationship between CIs and null hypothesis testing:
95% CIs that just about touch end-to-end represent a p-value for testing Ho: µ1 = µ2
of approximately .01
If there is a gap between the upper limit of one 95% CI and the lower limit of another
95% CI then p < .01
A p-value of .05 is represented by moderate overlap between the bars (approx. half
the value of the margin of error)
As the sample gets smaller => the SE (= s/√N) gets larger => the margin of error of the sample mean
gets larger
- The CIs would widen and could potentially overlap
- When two CIs overlap more than half the average margin of error (i.e., distance
from the mean to the upper or lower limit) – do not reject Ho
MARGIN OF ERROR
The margin of error => t(df) x SE
It is the distance from the mean to the upper or lower CI limit
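A sketch of the CI and margin-of-error calculation, reusing the made-up scores from the earlier sketch:

import numpy as np
from scipy import stats

scores = np.array([4, 6, 7, 5, 8, 6])  # same made-up scores as in the earlier sketch
n = len(scores)
mean = scores.mean()
se = scores.std(ddof=1) / np.sqrt(n)    # SE of the mean

t_crit = stats.t.ppf(0.975, df=n - 1)   # two-tailed critical t with N - 1 df
margin_of_error = t_crit * se           # distance from the mean to either CI limit
ci_lower, ci_upper = mean - margin_of_error, mean + margin_of_error
print(f"95% CI: {ci_lower:.2f} to {ci_upper:.2f} (margin of error = {margin_of_error:.2f})")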
TEST STATISTIC
A test statistic => is a statistic for which the frequency of particular values is known
The observed value of such a statistic – is used to (1) test hypotheses, or (2) establish whether
a model is a reasonable representation of what is happening in the population
A test statistic is the ratio of variance explained by the model (effect; df = k) to variance not
explained by the model (error; df = N − k − 1)
TYPE I AND TYPE II ERRORS
A Type I error occurs when one concludes there is a genuine effect in the population – when
there really isn’t one (Ho is true)
A Type II error occurs when one concludes that there is no effect in the population – when
there really is one (Ha is true)
There is a trade-off between both errors:
Lowering the Type I error risk (=> alpha) – lowers the probability of detecting a
genuine effect => increasing Type II error risk
                     Ho is True              Ho is False
Reject Ho            Type I Error            Correct
                     (false positive)        (true positive)
                     Probability = α         Probability = 1 − β
Accept Ho            Correct                 Type II Error
                     (true negative)         (false negative)
                     Probability = 1 − α     Probability = β
In general – type I errors (false positives) are considered more undesirable than Type II errors
(false negatives) => because the real and ethical costs of implementing a new treatment or
changing policy based on false effects are higher than the costs of incorrectly accepting the
current treatment or policy
EFFECT SIZE
An effect size => an objective and standardized measure of the magnitude of an observed
effect
Measures include Cohen's d, Pearson's correlation coefficient r, and η²
An important advantage of effect sizes is that they are not directly affected by sample size
- In contrast, p-values get smaller (for a given ES) as the sample size increases
NB: Effect sizes are standardized based on the SD (e.g., Cohen's d expresses the difference
between two group means in SD units) – whereas test statistics divide the raw effect by the
SE
Small effects can be statistically significant – as long as the sample size is large
Statistically significant effects are not always practically relevant
It is recommended to report p-values, CI’s and effect size => the three measures
provide complementary information
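A minimal sketch computing Cohen's d (the mean difference divided by the pooled SD) next to the t-test p-value; the groups, means and sample sizes are made up:

import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
# Made-up groups with a true mean difference of 5 and SD of 15 (a smallish effect)
group1 = rng.normal(100, 15, 200)
group2 = rng.normal(105, 15, 200)

# Cohen's d: mean difference divided by the pooled SD
n1, n2 = len(group1), len(group2)
pooled_var = ((n1 - 1) * group1.var(ddof=1) + (n2 - 1) * group2.var(ddof=1)) / (n1 + n2 - 2)
d = (group2.mean() - group1.mean()) / np.sqrt(pooled_var)

t_stat, p_value = stats.ttest_ind(group1, group2)
print(f"Cohen's d = {d:.2f}, t = {t_stat:.2f}, p = {p_value:.4f}")
# With 200 cases per group this modest d is typically significant;
# with 20 per group d stays about the same, but p usually does not reach .05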
POWER
Power => the probability that a test will detect an effect of a particular size (a value of 0.8 is
a good level to aim for)
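A sketch of a power calculation with statsmodels, solving for the sample size per group needed to detect a medium effect (d = 0.5) with 80% power at α = .05 in an independent-samples t-test:

from statsmodels.stats.power import TTestIndPower

# Sample size per group needed for 80% power to detect d = 0.5 at alpha = .05
analysis = TTestIndPower()
n_per_group = analysis.solve_power(effect_size=0.5, alpha=0.05, power=0.8)
print(round(n_per_group))  # roughly 64 participants per group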
RESEARCHER DEGREES OF FREEDOM
Researcher dfs => the flexibility of researchers in various aspects of data-collection, data-
analysis and reporting results
The false-positive rates exceed the fixed level of 5% in case of flexibility in:
(1) Choosing among dependent variables
(2) Choosing sample size
(3) Using covariates
(4) Reporting subsets of experimental conditions
Multiple testing => results in an inflated Type I error risk
E.g., carrying out 5 significance tests (each at α = .05) => results in an overall false-positive
risk of 1 − (.95)⁵ ≈ .23
The overall risk becomes 23% instead of 5%
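The familywise risk calculation, spelled out for a few numbers of tests:

# Familywise false-positive risk when running k independent tests, each at alpha = .05
alpha = 0.05
for k in (1, 5, 10, 20):
    risk = 1 - (1 - alpha) ** k
    print(f"{k:2d} tests: overall Type I error risk = {risk:.2f}")
# 5 tests -> 0.23, i.e., the 23% mentioned above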
STANDARDIZED AND UNSTANDARDIZED REGRESSION COEFFICIENTS
Unstandardized regression coefficients (b-values) => refer to the unstandardized variables
Standardized regression coefficients (beta-values) => refer to the standardized variables (i.e.,
z-scores)
The number of SDs by which the outcome will change as a result of one SD change in
the predictor
They are all measured on the same scale => they are comparable and can be used to
judge the relative contribution of each predictor in explaining the DV, given the
predictors that are included in the regression equation
When a new predictor is added to the regression model => all weights may change,
thus the relative contributions may change too
Need to know the SDs of all variables in order to interpret beta-values literally
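A sketch with a single made-up predictor, showing that fitting the model on z-scores gives the beta-weight and that beta = b × (SD_x / SD_y):

import numpy as np

rng = np.random.default_rng(7)
x = rng.normal(50, 10, 200)            # made-up predictor
y = 2.0 * x + rng.normal(0, 20, 200)   # made-up outcome

# Unstandardized slope b from a simple regression
b = np.cov(x, y, ddof=1)[0, 1] / np.var(x, ddof=1)

# Standardized slope (beta): fit the same model on z-scores ...
zx = (x - x.mean()) / x.std(ddof=1)
zy = (y - y.mean()) / y.std(ddof=1)
beta = np.cov(zx, zy, ddof=1)[0, 1] / np.var(zx, ddof=1)

# ... which equals b rescaled by the ratio of standard deviations
print(b, beta, b * x.std(ddof=1) / y.std(ddof=1))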
R-SQUARED
R-squared can be derived from the degree to which the points (in a scatter plot depicting
observed vs predicted values) lie on a straight line
R-squared = SS_model / SS_total
F-STATISTIC
F = MS_model / MS_residual
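A sketch deriving R-squared and F from the sums of squares of a simple regression with one made-up predictor (so k = 1):

import numpy as np

rng = np.random.default_rng(3)
n, k = 100, 1                          # N cases, k predictors
x = rng.normal(0, 1, n)
y = 0.6 * x + rng.normal(0, 1, n)      # made-up outcome

# Least-squares fit and predicted values
b1 = np.cov(x, y, ddof=1)[0, 1] / np.var(x, ddof=1)
b0 = y.mean() - b1 * x.mean()
y_hat = b0 + b1 * x

ss_total = np.sum((y - y.mean()) ** 2)
ss_model = np.sum((y_hat - y.mean()) ** 2)
ss_residual = np.sum((y - y_hat) ** 2)

r_squared = ss_model / ss_total
f_value = (ss_model / k) / (ss_residual / (n - k - 1))  # MS_model / MS_residual
print(r_squared, f_value)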
OUTLIER
An outlier => a score very different from the rest of the data; it can bias the model being
fitted to the data
Check for outliers by looking at (1) residual statistics and (2) influence statistics
- Need both types of statistics, as an outlier does not necessarily show both a large
residual and a deviant influential value
RESIDUAL STATISTICS
These statistics provide information about the impact each case has on the model’s ability to
predict all cases => i.e., how each case influences the model as a whole
Rules of thumb for identifying outliers that may be cause for concern:
Cook’s distance => a general measure of influence of a point on the values of the
regression coefficients; Cook’s distances > 1 may be cause for concern
Leverage => an observation with an outlying value on a predictor variable is called a
point with high leverage; points with high leverage can have a large effect on the
estimate of regression coefficients
Mahalanobis distance => closely related to the leverage statistic, but has a different
scale; it indicates the distance of cases from the means of the predictor variables;
influential cases have values > 25 in large samples, values > 15 in smaller samples,
and values > 11 in small samples
INFLUENCE STATISTICS
More specific measures of influence => for each case, assess how the regression coefficient is
changed by including that case
DFB0 and DFB1 => indicate the difference in the regression coefficients b0 and b1
between the model for the complete sample and the model when a particular case is
deleted
- They are dependent on the unit of measurement of all variables
DFF => indicates the difference between the current predicted value for a particular
case and the predicted value for this case based on the model estimated from the rest of
the cases
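A sketch of how comparable diagnostics can be pulled from a fitted OLS model in statsmodels (the data are made up; note that statsmodels reports the standardized versions, dfbetas and dffits, rather than the raw DFB/DFF values described above):

import numpy as np
import pandas as pd
import statsmodels.api as sm

rng = np.random.default_rng(0)
x = rng.normal(0, 1, 50)
y = 1.5 * x + rng.normal(0, 1, 50)
y[0] += 8                               # plant one deviant case for illustration

model = sm.OLS(y, sm.add_constant(x)).fit()
influence = model.get_influence()

diagnostics = pd.DataFrame({
    "cooks_d": influence.cooks_distance[0],  # overall influence on the regression coefficients
    "leverage": influence.hat_matrix_diag,   # outlyingness on the predictor
    "dfb_b0": influence.dfbetas[:, 0],       # standardized change in b0 when the case is dropped
    "dfb_b1": influence.dfbetas[:, 1],       # standardized change in b1 when the case is dropped
    "dffits": influence.dffits[0],           # standardized change in the case's own predicted value
})
print(diagnostics.sort_values("cooks_d", ascending=False).head())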
MULTICOLLINEARITY
Refers to the correlations between predictors – there are three different ways to detect it:
1. Correlations between predictors that are higher than .80 or .90
2. VIF of a predictor > 10
3. Tolerance of a predictor < .10
Multicollinearity can cause several problems:
It affects the value of b-slopes; b can become negative for predictors with positive
correlations to the outcome
It limits the size of R-squared; adding new (but correlated) predictors does not
improve the explained proportion of variance
It makes it difficult to determine the importance of predictors
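A sketch computing VIF and tolerance with statsmodels for deliberately correlated made-up predictors:

import numpy as np
import pandas as pd
import statsmodels.api as sm
from statsmodels.stats.outliers_influence import variance_inflation_factor

rng = np.random.default_rng(2)
x1 = rng.normal(0, 1, 200)
x2 = x1 + rng.normal(0, 0.2, 200)       # deliberately almost collinear with x1
x3 = rng.normal(0, 1, 200)              # independent predictor

X = sm.add_constant(pd.DataFrame({"x1": x1, "x2": x2, "x3": x3}))
for i, name in enumerate(X.columns):
    if name == "const":
        continue
    vif = variance_inflation_factor(X.values, i)
    print(f"{name}: VIF = {vif:.2f}, tolerance = {1 / vif:.2f}")
# VIF > 10 (tolerance < .10) flags problematic multicollinearity; x1 and x2 should exceed it here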
Lecture 1
STATISTICAL MODELS
In statistics, we fit models to the data => a statistical model is used to represent what is
happening in the real world
Models consist of parameters and variables:
1. Variables => measured constructs (e.g., fatigue) and vary across people in the sample
2. Parameters => estimated from the data and represent constant relations between
variables in the model
Model parameters are computed in the sample => to estimate the value in the population
THE MEAN AS SIMPLE MODEL
The mean is a model of what happens in the real world – it is the typical score