Definitions: underlined
Listing: in italics
Analysis or assumption: in bold
Summary Statistics
Null-hypothesis testing, statistical estimation, research ethics
Lecture 1 + tutorial 1 + Field 2, 3, 6 + Simmons et al. (2011)
Statistical models:
o Variables: are measured constructs and vary across people in the sample
o Parameters: are constants that we estimate from our data. They are the
components of your model that represent the invariant relationships in
your data, and they act on variables. We compute the model parameters
in the sample to estimate their values in the population.
Normal distribution: two parameters, the mean and the SD
Line: two parameters, the slope and the intercept
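A minimal sketch of estimating both kinds of parameters from a sample (assuming Python with NumPy; the data and variable names are made up for illustration):

    import numpy as np

    rng = np.random.default_rng(42)
    x = rng.normal(0, 1, size=50)                     # predictor variable
    y = 1.0 + 2.0 * x + rng.normal(0, 0.5, size=50)   # outcome variable

    # Normal distribution: two parameters, estimated by sample mean and SD
    mean_hat = np.mean(y)
    sd_hat = np.std(y, ddof=1)        # ddof=1: sample SD (divides by N - 1)

    # Line: two parameters, slope and intercept (least-squares estimates)
    slope, intercept = np.polyfit(x, y, deg=1)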
Several definitions:
o SS, s2, and SD all represent how well the mean fits the observed sample data.
Large values (relative to the scale of measurement) suggest the mean is a
poor fit of the observed scores, and small values suggest a good fit.
o Sums of Squares (SS)
Residual sum of squares (SSR) = the degree of inaccuracy that remains
when the best model is fitted to the data. The sum of squared errors is
a ‘total’ and is therefore affected by the number of data points.
Residuals are squared because they can be negative. Df = the number of
observations minus the number of estimated parameters (N – 1 for the
mean-only model).
Total sum of squares (SST) = the squared differences between the observed
values and the mean of the outcome (the baseline model)
Model sum of squares (SSM) = the improvement in accuracy when the model
is fitted to the data instead of the mean. If large, the linear model is
very different from using the mean to predict the outcome variable. Df
are the number of predictors.
Used in R2 (= SSM / SST) and F (= MSM / MSR); see the sketch below
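A short sketch of this decomposition (hypothetical numbers; Python with NumPy assumed):

    import numpy as np

    y = np.array([4.0, 6.0, 5.0, 8.0, 7.0])        # observed outcome
    y_hat = np.array([4.5, 5.5, 5.5, 7.5, 7.0])    # model predictions

    sst = np.sum((y - y.mean()) ** 2)   # total SS: observed vs. grand mean
    ssr = np.sum((y - y_hat) ** 2)      # residual SS: observed vs. predicted
    ssm = sst - ssr                     # model SS: improvement over the mean

    r_squared = ssm / sst
    n, k = len(y), 1                    # k = number of predictors
    f_stat = (ssm / k) / (ssr / (n - k - 1))   # F = MSM / MSR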
o Variance (s2)
the ‘average’ variability, but in squared units: s2 = SS / (N – 1)
o Standard deviation (SD/s)
average variation but converted back to the original
units of measurement
Tells us how much observations in our sample differ from the mean
value within our sample.
The square root of the variance (s2)
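A worked example of SS, variance, and SD on five hypothetical scores:

    import numpy as np

    scores = np.array([2.0, 4.0, 4.0, 6.0, 9.0])   # mean = 5
    ss = np.sum((scores - scores.mean()) ** 2)     # SS = 28
    var = ss / (len(scores) - 1)                   # s2 = SS / df = 7
    sd = np.sqrt(var)                              # s = 2.65, original units
    assert np.isclose(sd, np.std(scores, ddof=1))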
o Standard Error (SE) (same as (SEx̅))
For a given statistic (e.g. the mean) it tells us how much variability
there is in this statistic across samples from the same population.
the standard deviation (i.e. the width) of the sampling distribution
the standard error of b tells us how different b would be across samples
o Standard Error of the mean (SEx̅)
the SD of sample means
How well the sample mean represents the population
mean
Central limit theorem: for samples of about 30 or more, the sampling
distribution of sample means is approximately normal with mean μ and
standard deviation σ/√N (i.e. the SE).
The more variability in the population, the larger the SE.
The smaller your sample size, the larger the SE (see the sketch below).
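Continuing the same hypothetical scores, the SE of the mean is simply s / √N:

    import numpy as np

    scores = np.array([2.0, 4.0, 4.0, 6.0, 9.0])
    se = np.std(scores, ddof=1) / np.sqrt(len(scores))   # SE = s / sqrt(N)
    # larger N -> smaller SE; more population variability -> larger SE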
o Mean sum of squares or MS = SS / df
Makes SSR and SSM comparable
o P-value:
indicates the probability of finding the current sample result, or a
more extreme one, when the null hypothesis is true.
P-values depend on sample size (illustrated below)
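A quick illustration of that dependence (Python with NumPy/SciPy assumed; the effect of 0.3 is made up): the same true effect yields a much smaller p-value in the larger sample.

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(1)
    small = rng.normal(loc=0.3, scale=1.0, size=20)     # same true effect...
    large = rng.normal(loc=0.3, scale=1.0, size=2000)   # ...much bigger sample

    print(stats.ttest_1samp(small, popmean=0).pvalue)   # often > .05
    print(stats.ttest_1samp(large, popmean=0).pvalue)   # typically << .05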
Error
o Different ways to quantify this:
Sum of Squared Errors (SSE) = the total we get if we add up the squared
error of each person/data point
Mean Squared Error (MSE): when the model is the mean, the MSE is
called the variance. The more cases, the larger your summed squared
error, so you divide it by df to get an average.
MSE = SS / df.
SD/s
From sample to population
o Sample:
Mean (x̅)
SD (s)
o Population:
Mean (μ)
SD (σ)
o Using the SD and the sample size, we can determine how accurately our x̅
estimates μ.
o Sampling distribution: Frequency distribution of sample means from the same
population
one sample provides one estimate of the true population parameter
If the spread of the sampling distribution is large, we have less
confidence in that estimate
It depends on the sample size and on the variability (SD) of the trait
in the population (simulated below)
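A simulation sketch of a sampling distribution (the population and numbers are illustrative):

    import numpy as np

    rng = np.random.default_rng(7)
    population = rng.exponential(scale=2.0, size=100_000)   # skewed population

    # Draw many samples of n = 30 and record each sample mean
    sample_means = np.array([rng.choice(population, size=30).mean()
                             for _ in range(5_000)])

    # Central limit theorem: the means are ~normal with mean close to mu
    # and SD close to sigma / sqrt(n), i.e. the standard error
    print(sample_means.mean(), population.mean())
    print(sample_means.std(ddof=1), population.std() / np.sqrt(30))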
Standard Error and CI
o 95% CI: for 95% of all possible samples, the interval will contain the
population mean (μ); the other 5% of intervals will not contain it.
o Each time we conduct a significance test, we take a 5% risk of falsely
rejecting H0.
o 95% CI calculated by assuming the t-distribution as representative of the
sampling distribution.
The t-distribution looks like the standard normal distribution but has
fatter tails, depending on the df (here df = N – 1).
o Calculation of CI
First calculate SE
Calculate df (N – 1) and look up appropriate t-value
LL: CI = x̅ − (t × SE)
UL: CI = x̅ + (t × SE)
95% of z-scores fall between -1.96 and 1.96, so if sample means are
normally distributed with a mean of 0 and a SD of 1, the limits of the
CI would be -1.96 and 1.96. But for small samples the distribution is
not normal; it follows a t-distribution.
So to construct a CI for small samples, we use t instead of z (see the
sketch below).
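A sketch of those calculation steps (SciPy assumed; the scores are hypothetical):

    import numpy as np
    from scipy import stats

    scores = np.array([2.0, 4.0, 4.0, 6.0, 9.0])
    n = len(scores)
    se = np.std(scores, ddof=1) / np.sqrt(n)    # step 1: SE
    t_crit = stats.t.ppf(0.975, df=n - 1)       # step 2: t for 95%, df = N - 1

    lower = scores.mean() - t_crit * se         # LL = x - (t * SE)
    upper = scores.mean() + t_crit * se         # UL = x + (t * SE)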
o “margin of error” (t(df)×SE) of the mean is smaller in larger samples.
o When 0 is not in the CI, we can conclude that our result differs
significantly from 0: H0 is rejected.
o The logic of the CI is a way of explaining the sampling distribution, which is
exactly what we use to test things. You can also reframe this in terms of
the null hypothesis and alternative hypothesis, which follow the same logic.
Null hypothesis significance testing: NHST
o NHST evaluates the probability of obtaining a statistic at least as extreme
as the one observed, given that H0 is true.
o H0: There is no effect.
H0: μ = 0
o H1/Ha: The experimental hypothesis
H1: μ ≠ 0
o We cannot talk about the H0 or Ha being true, we can only speak in terms of
the probability of obtaining a particular result or statistic if, hypothetically
speaking, the H0 were true.
o The test statistic is significant (p < .05): we reject our null hypothesis when
our sample result would be unlikely if H0 were true.
o Any value outside the 95% CI has p < .05; any value inside the 95% CI has p > .05
o Different types of hypotheses require different test statistics:
hypotheses concerning one or two means are tested with the t-test
hypothesis concerning several means are tested with the F-test
o Look at the critical values of the t-distribution: calculate the t-statistic,
then compare the observed t-value to the critical t-value. If the observed
value is much larger than the critical t-value, the result is clearly significant.
o When testing one-sided, the p-value must be divided by 2. This is, however,
advised against because of the increased risk of a Type I error.
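A minimal one-sample t-test sketch (hypothetical data; SciPy assumed), showing both the p-value route and the critical-value route:

    import numpy as np
    from scipy import stats

    scores = np.array([0.8, 1.2, -0.3, 2.1, 1.5, 0.9, 1.8, 0.4])
    result = stats.ttest_1samp(scores, popmean=0)    # H0: mu = 0, two-sided
    print(result.statistic, result.pvalue)

    # Equivalent decision via the critical t-value:
    t_crit = stats.t.ppf(0.975, df=len(scores) - 1)
    print(abs(result.statistic) > t_crit)            # True -> reject H0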
CI and NHST
o If you want to be more confident about the location of the population mean,
you have to cover a larger area (so a larger CI, like 99%).
o When our H0 concerns one population mean, (e.g. H0: μ = 0)
NHST = one-sample t-test.
o When our H0 concerns the difference between two independent population
means (e.g. H0: μ1 – μ2 = 0),
NHST = independent-samples t-test.
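And the corresponding independent-samples case (the group scores are hypothetical):

    import numpy as np
    from scipy import stats

    group1 = np.array([5.1, 6.2, 5.8, 7.0, 6.5])
    group2 = np.array([4.0, 4.8, 5.2, 4.5, 5.0])
    # H0: mu1 - mu2 = 0; equal_var=True gives the classic Student t-test
    result = stats.ttest_ind(group1, group2, equal_var=True)
    print(result.statistic, result.pvalue)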
o Three guidelines (Cumming & Finch) for the relationship between CI and NHST:
1. 95% CIs that just about touch end-to-end represent a p-value for testing
H0: µ1 = µ2 of approximately .01
If two 95% CI’s of group means do not overlap, H0: µ1 = µ2 can
be rejected with p < .01. We say that it is highly unlikely that the
two means come from the same population.
When an experimental manipulation is successful, we expect to
find that our samples have come from different populations. If the
manipulation is unsuccessful, then we expect to find that the
samples came from the same population.
2. If there is a gap between the upper limit of one 95% CI and the lower limit
of another then p < .01.
3. A p-value of .05 is represented by moderate overlap between the bars
(approximately half the value of the margin of error).
When the two CIs would overlap more than half the (average)
margin of error (i.e. distance from the mean to the upper or lower
limit), we would not reject H0: µ1 = µ2.
Effect Size
o p-values just tell you something about probability.
o Significance ≠ importance
To quantify importance: effect size
o Effect size = magnitude of an observed effect
o An effect size is a standardized measure of the size of an effect:
Standardized = comparable across studies
Not (as) reliant on the sample size
o Cohen’s d: if we’re looking at differences between groups (and sometimes
within groups)
o Pearson’s r or R-squared: if we’re looking at continuous variables and
correlations (or at one continuous variable and a categorical variable
with 2 categories)
o (partial) eta-squared: multiple variables in our analysis. It is interpreted
much like R2
o Odds ratio: 2 or more categorical variables
o Rules of thumb for interpreting ES
r = .1, d = .2 (small effect)
the effect explains 1% of the total variance.
r = .3, d = .5 (medium effect)
the effect accounts for 9% of the total variance.
r = .5, d = .8 (large effect)
the effect accounts for 25% of the variance.
o Effect sizes are standardized based on the standard deviation, whereas test
statistics divide the raw effect by the standard error.
Thus, small effects can be statistically significant as long as the sample
is large. As a consequence, statistically significant effects are not
always practically relevant.
It is recommended to report p-values, CIs, and effect sizes, because these
three measures provide complementary information.
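A sketch of computing the two most common effect sizes by hand (hypothetical data; the pooled-SD version of Cohen's d is assumed):

    import numpy as np
    from scipy import stats

    group1 = np.array([5.1, 6.2, 5.8, 7.0, 6.5])
    group2 = np.array([4.0, 4.8, 5.2, 4.5, 5.0])

    # Cohen's d: mean difference divided by the pooled SD
    n1, n2 = len(group1), len(group2)
    s_pooled = np.sqrt(((n1 - 1) * group1.var(ddof=1) +
                        (n2 - 1) * group2.var(ddof=1)) / (n1 + n2 - 2))
    d = (group1.mean() - group2.mean()) / s_pooled

    # Pearson's r between two continuous variables; r**2 = variance explained
    x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
    y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])
    r, p = stats.pearsonr(x, y)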
Type I and type II errors