Short summary ADA (0HM120)
Week 1 Descriptive and inferential statistics
Example test: H0 = the average height of the Dutch population is 174 cm. A t-test needs an interval
dependent variable and 1 categorical predictor with 2 categories.
Things in the population are described with normal distributions. We know for example what
percentage of the population is 2 or 3 SD above the mean.
Every normally distributed variable can be transformed into a standard normal (z) distribution using
the z-score formula z = (x - mean)/SD. So you can calculate the point in the standard normal
distribution and check how big the area under the curve above or below that z-score is. If you
have the z-score you can look up in the table what percentage (p) of the population is smaller
or larger than the found z-score. Mind that the table only lists negative z-scores, but the areas
are the same for positive z-scores as the normal distribution is symmetrical.
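The table lookup can also be done in code. A minimal sketch, assuming a made-up height of 188 cm and a made-up population SD of 7 cm (neither number comes from the course); z_score and area_below are illustrative helper names:

```python
from statistics import NormalDist

def z_score(x, mean, sd):
    """Standardize x: how many SDs it lies above or below the mean."""
    return (x - mean) / sd

def area_below(z):
    """Proportion of the standard normal distribution smaller than z."""
    return NormalDist().cdf(z)

# Hypothetical example: height 188 cm, population mean 174 cm, SD 7 cm
z = z_score(188, 174, 7)
p_below = area_below(z)        # proportion of the population smaller than x
p_above = 1 - p_below          # proportion of the population larger than x

# Symmetry: the area below -z equals the area above +z
p_below_neg = area_below(-z)

print(round(z, 2), round(p_below, 4), round(p_above, 4))
```

The last check is the reason a table with only negative z-scores is enough.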
The distribution of many sample means (e.g. 1000 samples) taken from large samples (n>30) is a
normal distribution. The mean of that distribution is the population mean and you can calculate the
SE as SD population/sqrt(sample size) (central limit theorem). The standard deviation of the
sampling distribution is the standard error. A higher standard deviation/error means a bigger
spread and a less peaky graph. The population mean we have based on this is the best estimate, we
don’t know for sure if it is the real one. The SE gives the uncertainty. More uncertainty means a
less accurate procedure.
So in 95% of the cases, a single observed mean is not more than 1.96*SE away from the real
population mean (it lies between -1.96*SE and +1.96*SE of it). But you don’t know how accurate
the individual mean you found is.
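The SE and the 95% claim can be checked with a small simulation. This is only a sketch under assumptions: the population (mean 174, SD 7), the sample size of 50 and the fixed seed are all made up for illustration.

```python
import random
from math import sqrt
from statistics import mean, stdev

random.seed(1)  # fixed seed so the sketch is reproducible

POP_MEAN, POP_SD, N = 174, 7, 50   # hypothetical population and sample size

def sample_mean(n):
    """Mean of one random sample of size n from the assumed population."""
    return mean(random.gauss(POP_MEAN, POP_SD) for _ in range(n))

se = POP_SD / sqrt(N)                          # theoretical standard error
means = [sample_mean(N) for _ in range(1000)]  # 1000 sample means

# The SD of the simulated sample means should be close to the SE
empirical_se = stdev(means)

# Share of sample means within 1.96*SE of the population mean (about 95%)
within = sum(abs(m - POP_MEAN) <= 1.96 * se for m in means) / len(means)

print(round(se, 3), round(empirical_se, 3), round(within, 3))
```

A bigger N gives a smaller SE, so the sample means cluster more tightly around the population mean.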
All of the above assumes that you know the real population SD, but often you don’t know this.
Hypothesis test:
- Comparing the predicted mean to the observed mean of the sample
- If H0 is true we have a 95% probability (p=95%) that any sample mean is within the interval,
these are the likely values. The unlikely values are in the 5% (p=2.5% at each side) below or
above the critical values. If you find a sample mean in this area, you reject H0, otherwise you
fail to reject H0
- Type I error: given our procedure and sample size, in the long run there is a 5% probability of
rejecting H0 when H0 is in fact true, but you don’t know whether this happened in your study
Z-test:
- In the standard normal distribution the critical z-values for an alpha of 5% are Z=-1.96 and
Z=1.96 (2.5% in each tail), for a one-sided test Z=1.645 or Z=-1.645 (5% in one tail).
- You can check if your observed mean is inside or outside of this z-range by transforming the
observed mean into a z-score with (observed mean-H0)/SE
- The p-value is the area under the curve beyond the found z-score of the sample mean (in both
tails for a two-sided test). It says how likely the observed value, or a more extreme one, is
assuming H0 is true (P(the data|H0 is true)). It also tells you what the long-run probability of
making incorrect decisions (incorrectly rejecting H0) would be if the observed mean were used as
the critical value/decision rule (P(type I error)). It doesn’t say anything about practical
relevance or about how big the effect is. For a two-sided test you find the p-value by looking up
the % for the observed mean in the table and multiplying it by 2 (normally this p*2 is the value
given), for a one-sided test it is the % you find in the table.
- In the Z-distribution the mean is 0 and the standard deviation (SE) is 1.
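The z-test steps above can be sketched in a few lines. The observed mean (176 cm), the population SD (7 cm) and n (49) are made-up numbers, and z_test is an illustrative name, not a library function.

```python
from math import sqrt
from statistics import NormalDist

def z_test(sample_mean, h0_mean, pop_sd, n):
    """Return the z-score and the one- and two-sided p-values."""
    se = pop_sd / sqrt(n)                      # standard error
    z = (sample_mean - h0_mean) / se           # (observed mean - H0) / SE
    one_sided = 1 - NormalDist().cdf(abs(z))   # area in one tail beyond |z|
    two_sided = 2 * one_sided                  # same area in both tails
    return z, one_sided, two_sided

# Hypothetical data: observed mean 176 cm, H0 = 174 cm, SD = 7 cm, n = 49
z, p_one, p_two = z_test(176, 174, 7, 49)

# Critical value for a two-sided test with alpha = 5% (about 1.96)
crit = NormalDist().inv_cdf(0.975)
reject_h0 = abs(z) > crit

print(round(z, 2), round(p_one, 4), round(p_two, 4), reject_h0)
```

Comparing |z| with the critical value and comparing the two-sided p-value with alpha always lead to the same decision.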
If the population standard deviation is unknown/estimated, the sampling distribution of means is a
Student t-distribution, not a normal distribution. The shape depends on the degrees of freedom
(bigger with a bigger sample size, df=n-1). The t-test is based on this t-distribution. The t-score
is calculated as (observed mean-H0)/(SD/sqrt(n)). Check if the found t-score is within or outside
the critical values for that alpha level (for a two-sided 5% look in the table at 0.025) and
degrees of freedom (vertical in table).
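A small sketch of the t-score calculation. The nine heights are made up, t_score is an illustrative name, and the critical value 2.306 is typed in by hand from a t-table (df=8, 0.025 per tail) because the Python standard library has no t-distribution.

```python
from math import sqrt
from statistics import mean, stdev

def t_score(sample, h0_mean):
    """One-sample t-score (observed mean - H0) / (SD / sqrt(n)) and its df."""
    n = len(sample)
    se = stdev(sample) / sqrt(n)   # SD estimated from the sample itself
    return (mean(sample) - h0_mean) / se, n - 1

# Hypothetical sample of 9 heights in cm, tested against H0 = 174 cm
heights = [172, 178, 181, 169, 175, 180, 177, 174, 183]
t, df = t_score(heights, 174)

# Assumption: table value for df = 8 and 0.025 in each tail
T_CRIT = 2.306
reject_h0 = abs(t) > T_CRIT

print(round(t, 2), df, reject_h0)
```

With these numbers the t-score stays inside the critical values, so H0 is not rejected.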
T-test:
- One sided: when you have good theoretical reasons to expect that you will not find a value
below (or above) H0, or when you don’t care about values in the other direction (example:
medication)
- You can again find the p-value in the table for the calculated t-score, and for two-sided tests
multiply it by 2 as there is 2.5% in each tail of the distribution. This gives how many out of 100
studies would find a difference at least this large between the groups if H0 were true. The
p-value of a one-sided test is half that of the two-sided test
Effect size: estimated difference between two groups. If the scales are not standardized use
Cohen’s d (0.20=small effect, 0.50=moderate, >0.80=large). This doesn’t say anything
about practical relevance (in the medical field a small effect can already be relevant)
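The notes give no formula for Cohen’s d, so the sketch below assumes the common pooled-SD version (difference in means divided by the pooled standard deviation). The two groups are made-up data and cohens_d is an illustrative name.

```python
from math import sqrt
from statistics import mean, variance

def cohens_d(group_a, group_b):
    """Cohen's d using the pooled SD of two independent groups (assumed variant)."""
    na, nb = len(group_a), len(group_b)
    pooled_var = ((na - 1) * variance(group_a) +
                  (nb - 1) * variance(group_b)) / (na + nb - 2)
    return (mean(group_a) - mean(group_b)) / sqrt(pooled_var)

# Hypothetical scores for two groups
treatment = [5, 6, 7, 8, 9]
control = [4, 5, 6, 7, 8]
d = cohens_d(treatment, control)

print(round(d, 2))
```

Because d is expressed in SD units, it can be compared across studies that used different scales.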
Confidence interval: in the long run, 95% of the intervals calculated this way enclose the true
population mean (or true population difference for a t-test). The width of the confidence interval
says something about the precision of your procedure. When 0 is included in the 95% confidence
interval of a difference, a two-sided hypothesis test with an alpha level of 0.05 will not be
statistically significant.
Bootstrapping can be used when the underlying sampling distribution is not known. Many bootstrap
samples are drawn (with replacement) from the original sample and the mean is calculated for each.
The CI can be found by sorting these means from low to high and taking the percentiles (2.5th and
97.5th for a 95% CI). This doesn’t give you a p-value
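A minimal sketch of a percentile bootstrap CI for the mean. The data, the number of bootstrap samples (2000) and the seed are arbitrary choices for illustration, and bootstrap_ci is a hypothetical helper name.

```python
import random
from statistics import mean

def bootstrap_ci(sample, n_boot=2000, level=0.95, seed=0):
    """Percentile bootstrap CI: resample with replacement, sort the means."""
    rng = random.Random(seed)   # own generator so results are reproducible
    n = len(sample)
    boot_means = sorted(mean(rng.choices(sample, k=n)) for _ in range(n_boot))
    lo_idx = int((1 - level) / 2 * n_boot)       # e.g. 2.5th percentile
    hi_idx = int((1 + level) / 2 * n_boot) - 1   # e.g. 97.5th percentile
    return boot_means[lo_idx], boot_means[hi_idx]

# Hypothetical sample of heights in cm
heights = [172, 178, 181, 169, 175, 180, 177, 174, 183, 171, 176, 179]
ci_low, ci_high = bootstrap_ci(heights)

print(round(ci_low, 1), round(ci_high, 1))
```

The interval comes straight from the sorted bootstrap means, so no assumption about the shape of the sampling distribution is needed.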
Counter-null hypothesis: when you find a non-significant effect, there can still be a difference
between groups, but there is not enough evidence to reject H0. You can test another hypothesis
which can also not be rejected, this tells you something about the quality of the evidence. If you
find no significant effect you cannot say there is no difference
Type II error: failing to reject H0 when it is not true. If you allow for a smaller type I error
probability (which lowers the power), the type II error probability becomes bigger
Power: the long-run probability of correctly rejecting H0 (or finding a significant effect), this
is 1-probability of type II error (β). Power increases with increased sample size, increased type I
error probability (alpha level) and bigger effect sizes. The power depends on sample size and
procedure. You can calculate the required sample size from the expected effect size and the aimed
power (power analysis). If the power is 53%, the procedure will in the long run only result in
rejecting H0 in 53 out of 100 samples. After the t-test you can also use the power table to find
how big the chance was of finding a significant difference with the estimated effect size and
sample size (n per group).
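How power depends on sample size, alpha and effect size can be sketched with a normal approximation. Note that this is a one-sample z-test approximation, not the t-test power table from the course; power_z and n_for_power are illustrative names.

```python
from math import sqrt
from statistics import NormalDist

def power_z(effect_size, n, alpha=0.05):
    """Approximate power of a two-sided one-sample z-test.

    effect_size is the standardized difference (mean difference / SD).
    """
    nd = NormalDist()
    z_crit = nd.inv_cdf(1 - alpha / 2)
    shift = effect_size * sqrt(n)   # true effect expressed in SE units
    # Probability of landing beyond either critical value when the effect is real
    return (1 - nd.cdf(z_crit - shift)) + nd.cdf(-z_crit - shift)

def n_for_power(effect_size, target=0.80, alpha=0.05):
    """Smallest n that reaches the target power (simple search)."""
    n = 2
    while power_z(effect_size, n, alpha) < target:
        n += 1
    return n

# Hypothetical planning question: moderate effect (d = 0.5), aimed power 80%
needed_n = n_for_power(0.5)
print(needed_n, round(power_z(0.5, needed_n), 3))
```

With an effect size of 0 the function returns alpha, which matches the idea that the type I error rate is the rejection rate when H0 is true.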
T-test assumptions: normality (sample size>30 & normally distributed dependent variable), check
with sktest (kurtosis (peakiness)>0 leptokurtic, kurtosis<0 platykurtic, skewness>0 longer right
tail, skewness<0 longer left tail) & swilk (W>0.96); homogeneity of variances (H0=population
variance equal for both groups, violation acceptable when sample sizes are equal, reject H0 when
W0 is significant), check with robvar. Use the Welch t-test if violated. Last, independence of
observations (cannot be tested, determined by the design of the study, random sampling needed)
Outliers t-test: boxplot check; extreme outlier = more than 4*IQR above the 75th percentile, or
standardized z-scores above 3 or below -3 when the population is normally distributed (otherwise
above 4 or below -4)
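Both outlier rules can be sketched as follows. The data are made up, the function names are illustrative, and the quartiles use the default method of Python’s statistics.quantiles, which may differ slightly from the quartiles Stata reports.

```python
from statistics import mean, stdev, quantiles

def z_outliers(values, cutoff=3.0):
    """Values whose standardized z-score is above cutoff or below -cutoff."""
    m, s = mean(values), stdev(values)
    return [v for v in values if abs((v - m) / s) > cutoff]

def extreme_high_outliers(values, k=4.0):
    """Values more than k*IQR above the 75th percentile."""
    q1, _, q3 = quantiles(values, n=4)   # quartiles (default 'exclusive' method)
    return [v for v in values if v > q3 + k * (q3 - q1)]

# Hypothetical scores with one suspicious value (60)
scores = [10, 11, 12, 9, 10, 11, 10, 12, 9, 10,
          11, 10, 9, 12, 11, 10, 10, 11, 9, 60]

print(z_outliers(scores), extreme_high_outliers(scores))
```

Use cutoff=4.0 in z_outliers when the population is not normally distributed.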
Week 2 Repeated and mixed ANOVA (analysis of variance)
ANOVA: interval dependent variable and categorical predictors with more than 2 categories
One-way between-subjects ANOVA: group membership is defined by a single factor (e.g. relation
between seating location and exam scores)
- You could run pairwise t-tests, but the chance of making a type I error increases (capitalizing
on chance)
- You can do an omnibus hypothesis test (H0=all group means are equal), but this doesn’t
show which groups are different, it only answers the question “are there differences between
groups?”
- Grand mean is the overall average (average of all group means), everything is referenced to
the grand mean. The group effect is group mean-grand mean
- ANOVA linear model: educational performance (x)=grand mean (μ) + effect of group (φ) +
error term (not explained by group membership; individual differences)
- SStotal is the total variation, SSgroup is the variation explained by group membership, SSerror
is the variation not explained by group membership but due to individual differences (variance
within groups)
- H0=there is no effect of group membership, Ha=there is some effect of group membership
- The ratio between variance due to group membership and variance not due to membership is
important: this is the F-statistic=MSgroup(between)/MSerror(within). MS=SS/df. F(3,8)=17 means
dfgroup=3, dferror=8, MSgroup/MSerror=17. When F is bigger than the critical value of the F
distribution (shaped by df(3,8)) we can reject H0 that there is no difference between the
groups. The p-value can be found in Stata, as well as SStotal, SSgroup, SSerror and F
- R squared is an effect size for the variance explained by all factors in the model. If the group
variable is the only factor and this is a high value there is a big effect. It answers: how much
of the individual differences is explained by group membership?
- Instead of R2, look at the effect size eta-squared, η2=SSgroup/SStotal. Eta squared is equal
to R2 and to the partial eta squared if there is only 1 factor in the model. When more factors
are included Stata gives the partial eta squared (but it is labelled as the eta-squared), the
partial eta squared is bigger (as the error is smaller) than the real eta-squared so this can be