The most important parts of the Statistiek IV course summarized, plus some key formulas that are not in the formula sheet. Perfect to review just before your exam, so that all the essentials are fresh again!
Summary of Statistiek IV
Chapter 2: The good old one-way ANOVA
ANOVA = analysis of variance
Used to make inferences about means
Analyzing data always starts with an exploratory analysis
IOT test = interocular trauma test (the pattern in the data is so obvious that no further statistical analysis is needed)
Notation and interpretation:
- Person i in condition j
o i = 1 … mj (mj persons in condition j)
o j = 1 … a (a conditions = levels of a factor)
o Balanced (number of persons across conditions is equal) or unbalanced
Statistical inferences:
1. Models and hypotheses
Full model (systematic part, i.e. population mean muj, plus a random deviation, i.e. noise) = the means can differ across conditions
Reduced model (the condition means are all equal to each other; nested in the full model)
a. Parameter estimation (population means = unknown)
Least squares estimation (= minimizes the sum of squared differences between what is observed and what the model says it should be)
Fitted value = best guess for an observation based on the model
Difference between yij and muj = difference between an observation and what the model tells us = residual = eij (the bigger the residual, the worse the model)
Reduced model: yij = mu + eij
Full model: yij = muj + eij
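To make the two models concrete, here is a minimal numerical sketch in Python; the data, group sizes, and variable names are made up for illustration and are not from the course.

import numpy as np

# Illustrative data: a = 3 conditions with (possibly unequal) numbers of persons
groups = [np.array([4.0, 5.0, 6.0]),       # condition j = 1
          np.array([7.0, 6.0, 8.0, 7.0]),  # condition j = 2
          np.array([5.0, 4.0, 6.0])]       # condition j = 3
y = np.concatenate(groups)

# Reduced model: y_ij = mu + e_ij -> one fitted value for everyone, the grand average
fitted_reduced = np.full_like(y, y.mean())

# Full model: y_ij = mu_j + e_ij -> the fitted value is the condition average
fitted_full = np.concatenate([np.full_like(g, g.mean()) for g in groups])

# Residuals = observation minus fitted value (the larger, the worse the model fits)
resid_reduced = y - fitted_reduced
resid_full = y - fitted_full
print(round(resid_full.sum(), 10))   # raw residuals sum to zero (without squaring)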
b. Sum of squares
Single number needed that expresses how large the residuals are =
minimized sum of squares = error sum of squares = residual sum of squares =
SSEred/SSEfull
SSEfull = how much variability is left unexplained under full model, variability
within conditions/groups because considering the differences between
conditions does not imply that all data within a condition are exactly the
same
Total sum of squares = SStot = measures the total variation present in the data (deviation of the observations from the grand sample average; an index of the total variability in the sample) = SSEred = the variability to be explained
One-way ANOVA: SStot = SSEreduced
SSEred is greater than or equal to SSEfull
SSEff = SSEred – SSEfull = expresses how much we can decrease the error by considering the different groups (or conditions) (between-groups variance, i.e. variance between the conditions) (difference between the variability to be explained and the unexplained variability) (a measure of explained variability) (what is the effect of the full model?)
Problems:
Problem of scaling (squaring) = a sum of squares can only be interpreted relative to another one
The error sum of squares of the reduced model is always larger than or equal to that of the full model (full model is more complex -> more flexible -> smaller residuals). If H0 is true the difference will be small, but what is small/large? -> degrees of freedom
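A small Python sketch of the sum-of-squares bookkeeping, using the same made-up data as in the sketch above:

import numpy as np

groups = [np.array([4.0, 5.0, 6.0]),
          np.array([7.0, 6.0, 8.0, 7.0]),
          np.array([5.0, 4.0, 6.0])]
y = np.concatenate(groups)

# SSEred = SStot: squared deviations of every observation from the grand average
sse_red = ((y - y.mean()) ** 2).sum()

# SSEfull: squared deviations from the condition averages (within-group variability)
sse_full = sum(((g - g.mean()) ** 2).sum() for g in groups)

# SSEff: how much the error drops when the condition means are allowed to differ
sse_eff = sse_red - sse_full

print(round(sse_red, 2), round(sse_full, 2), round(sse_eff, 2))  # 15.6 6.0 9.6
assert sse_red >= sse_full                      # the reduced model never fits better
assert np.isclose(sse_red, sse_full + sse_eff)  # SStot = SSEff + SSEfull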
c. Degrees of freedom = complexity of the models
Raw residuals sum to zero (without squaring)
Dfred: n-1
Dffull: n-a
General: number of observations – number of freely estimated parameters
(more parameters = smaller df)
Dfred > dffull
d. Mean squares
Sum of squares / degrees of freedom
Df SSEff = a-1 (= difference between df red and full)
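A short continuation of the sketch, plugging in the SS values computed above, to show the degrees of freedom and mean squares:

# Degrees of freedom = number of observations - number of freely estimated parameters
a, n = 3, 10                # conditions and total sample size of the sketch above
df_red = n - 1              # reduced model estimates only mu
df_full = n - a             # full model estimates mu_1 ... mu_a
df_eff = df_red - df_full   # = a - 1

# Mean squares = sum of squares / degrees of freedom (SS values from the sketch above)
sse_eff, sse_full = 9.6, 6.0
ms_between = sse_eff / df_eff    # "between groups" mean square
ms_within = sse_full / df_full   # "within groups" mean square
print(df_red, df_full, df_eff, ms_between, ms_within)   # 9 7 2 4.8 0.857...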
e. Alternative model parameterization with effect parameters
Effect parameters have to sum to zero
Alpha = estimated effect parameter = muj – mu
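A tiny illustration of the effect parameterization, assuming the made-up group means from the sketch above and the unweighted sum-to-zero side constraint:

import numpy as np

group_means = np.array([5.0, 7.0, 5.0])   # estimated mu_j from the sketch above
mu = group_means.mean()                   # grand mean under the sum-to-zero constraint
alpha = group_means - mu                  # estimated effect parameters alpha_j = mu_j - mu
print(alpha, round(alpha.sum(), 10))      # the effects sum to (numerically) zero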
2. Choice of the test statistic
o Fit of the model to the data + complexity of model
o Is the decrease in error sum of squares (or fit) of full model large enough to justify its
increase in complexity? If additional number of parameters lowers the error sum of
squares sufficiently, then yes
o SSEff: not scale invariant + model complexity not taken into account
o F statistic (combines systematic and sampling (i.e. random) variability)
o Numerator: systematic differences between conditions + sampling variability
o Denominator: sampling variability
o If the systematic differences increase, the F statistic also increases
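Continuing the illustrative numbers from the sketches above, the F statistic is simply the ratio of the two mean squares:

# F = (systematic differences + sampling variability) / sampling variability
#   = MS between / MS within, with the mean squares from the sketch above
ms_between, ms_within = 4.8, 6.0 / 7.0
F = ms_between / ms_within
print(round(F, 3))   # 5.6: the drop in error per extra parameter, relative to the leftover noise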
3. The sampling distribution of F under H0 and what to conclude
a. Sampling distribution
If H0 is true: F distribution with a-1 and n-a df
P value = the probability, given H0, of finding an equally or more extreme F value
The p value is conditional: it is defined given H0
b. What to conclude?
The smaller the p value, the more evidence against H0
c. ANOVA table:
Between groups = SSEff (treatment)
Within groups = SSEfull (residuals) (error)
Total = SStot = SSEred (also: total sample variance · (n-1))
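A self-contained sketch that assembles these ANOVA table pieces and computes the p value; the cross-check with scipy.stats.f_oneway assumes SciPy is available, and the data are still the made-up example:

import numpy as np
from scipy import stats

groups = [np.array([4.0, 5.0, 6.0]),
          np.array([7.0, 6.0, 8.0, 7.0]),
          np.array([5.0, 4.0, 6.0])]
y = np.concatenate(groups)
a, n = len(groups), len(y)

ss_total = ((y - y.mean()) ** 2).sum()                        # SStot = SSEred
ss_within = sum(((g - g.mean()) ** 2).sum() for g in groups)  # SSEfull
ss_between = ss_total - ss_within                             # SSEff

F = (ss_between / (a - 1)) / (ss_within / (n - a))
p = stats.f.sf(F, a - 1, n - a)       # P(F >= observed value | H0), upper-tail probability

# Cross-check against the ready-made one-way ANOVA in SciPy: F and p should agree
F_check, p_check = stats.f_oneway(*groups)
print(round(F, 3), round(p, 4), round(F_check, 3), round(p_check, 4))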
4. Determine the size of your effect
o Reporting effect sizes is crucial! Studies with a very small effect but an enormous amount of data can still give a very small p value
o = practical significance
a. Biased estimator of the proportion of variance explained: eta^2 (anova) / R^2 (regr)
= ratio of amount of explained variability over variability to be explained
0 < SSEff < SStot -> 0 < eta^2 < 1
BUT: biased estimator of the true proportion of variance explained (its expected value is larger than 0, even when the true effect is zero)
b. Unbiased estimator of the proportion of variance explained: w^2
Smaller than eta^2
BUT: can become negative (in that case it is set to zero)
Preferred over eta^2
c. Remarks on effect sizes: unitless + between 0 and 1 -> what is large/small?
1% = small
6% = medium
14% = large
d. Why not use the F statistic or p value as a measure of effect size?
F depends on sample size + effect size
e. Uncertainty of effect sizes
Effect sizes are statistics, so depend on sample size
CI
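A small sketch of both effect-size estimators, using the illustrative SS values from above; the omega^2 formula used here is the usual one for one-way ANOVA (assumed, not quoted from the course):

# Effect sizes from the ANOVA quantities of the sketch above (illustrative numbers)
ss_between, ss_within, ss_total = 9.6, 6.0, 15.6
a, n = 3, 10
ms_within = ss_within / (n - a)

eta_sq = ss_between / ss_total                                    # biased upwards
omega_sq = (ss_between - (a - 1) * ms_within) / (ss_total + ms_within)
omega_sq = max(omega_sq, 0.0)                                     # truncate negative values to zero
print(round(eta_sq, 3), round(omega_sq, 3))                       # omega^2 is smaller than eta^2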
Chapter 3: Contrasts, be more specific!
F test: the conditions differ, but which conditions? And how much do they differ?
Contrast = a difference in which the averages of two or more conditions are involved
o Pairwise contrast = simple difference between the averages of two conditions
o Complex contrast = difference between two elements, and one or both elements are
averages of several conditions
Contrast = linear combination of sample averages, such that the coefficients sum to zero (cj)
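A minimal illustration of pairwise versus complex contrasts; the coefficients and group means are made up:

import numpy as np

group_means = np.array([5.0, 7.0, 5.0])    # sample averages of a = 3 conditions

c_pairwise = np.array([1.0, -1.0, 0.0])    # pairwise contrast: condition 1 vs condition 2
c_complex = np.array([-0.5, 1.0, -0.5])    # complex contrast: condition 2 vs the average of 1 and 3

for c in (c_pairwise, c_complex):
    assert np.isclose(c.sum(), 0.0)        # contrast coefficients must sum to zero
    print(c @ group_means)                 # g = sum_j c_j * ybar_j, the estimated contrast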
A single planned contrast
- Derivation of the sampling distribution of g
o Distribution of g: if yij is normally distributed, then the sample average ybarj is also normally distributed -> every linear combination of the sample averages ybarj (= a contrast) is also normally distributed
E(g) = gamma -> g is an unbiased estimator of gamma
Var(g) = variance of a sum = sum of the variances, because the terms are independent; the variance of a sample average equals the variance of a single observation divided by the number of observations in that sample average
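A sketch of the estimate g and its standard error, assuming the pooled within-group variance (MS within) is used to estimate sigma^2; data and contrast are the made-up example:

import numpy as np

groups = [np.array([4.0, 5.0, 6.0]),
          np.array([7.0, 6.0, 8.0, 7.0]),
          np.array([5.0, 4.0, 6.0])]
c = np.array([-0.5, 1.0, -0.5])                    # complex contrast, coefficients sum to zero

means = np.array([g.mean() for g in groups])
m = np.array([len(g) for g in groups])             # persons per condition
n, a = m.sum(), len(groups)

g_hat = c @ means                                  # estimate of gamma; E(g) = gamma

# Var(g) = sum_j c_j^2 * sigma^2 / m_j, with sigma^2 estimated by MS within
ms_within = sum(((grp - grp.mean()) ** 2).sum() for grp in groups) / (n - a)
se_g = np.sqrt(ms_within * (c ** 2 / m).sum())
print(round(float(g_hat), 3), round(float(se_g), 3))   # roughly 2.0 and 0.6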
- Statistical inference for gamma
o Confidence interval for gamma
o Hypothesis test for gamma (H0: gamma = C) (if H0 is true -> the t statistic follows a t distribution with dffull degrees of freedom)
o Effect size: Cohen's d (= the difference between two means divided by the estimate of the corresponding within-group standard deviation)
Around .2 small
.5 medium
.8 large
o Street-fighting statistics: if the sample size is large enough (dffull > 30), the t distribution ≈ the normal distribution
CI: roughly g ± 2·SE(g)
Rough hypothesis test: compare the absolute value of the t statistic with 2 to evaluate significance (alpha = .05)
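A final sketch pulling the contrast inference together (t test, exact and rough CI, Cohen's d); the numbers are the illustrative values from the previous sketch:

import numpy as np
from scipy import stats

# Illustrative values from the contrast sketch above
g_hat, se_g, ms_within = 2.0, 0.598, 6.0 / 7.0
df_full = 7                                        # n - a = 10 - 3

t_stat = (g_hat - 0.0) / se_g                      # H0: gamma = C with C = 0
p = 2 * stats.t.sf(abs(t_stat), df_full)           # two-sided p value

t_crit = stats.t.ppf(0.975, df_full)
ci = (g_hat - t_crit * se_g, g_hat + t_crit * se_g)     # exact 95% CI
rough_ci = (g_hat - 2 * se_g, g_hat + 2 * se_g)         # street-fighting version: +/- 2 SE

d = g_hat / np.sqrt(ms_within)                     # Cohen's d: difference / within-group SD
print(round(t_stat, 2), round(p, 4), ci, rough_ci, round(float(d), 2), abs(t_stat) > 2)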