W1.1
CHAPTER 12.1-12.7 (FIELD) – COMPARING SEVERAL INDEPENDENT MEANS
Analysis of variance/ANOVA: The inferential method for comparing means of several groups
- Allows for a comparison of more than two groups at the same time to determine
whether a relationship exists between them
- F statistic/F-ratio: The result of the ANOVA formula – Allows for the analysis of
multiple groups of data to determine the variability between samples and within
samples
- Factors: Categorical explanatory variables in multiple regression and in ANOVA
- If groups truly differ, the between-group variability must be larger than the within-
group variability!
Why we cannot use multiple t-tests for more than two groups → Multiple comparisons
problem: conducting multiple t-tests without adjusting for multiple comparisons increases
the chance of a Type I error (i.e. a false positive)
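This inflation can be made concrete: for k independent tests each run at α = .05, the probability of at least one false positive is 1 − (1 − α)^k. With three groups there are already three pairwise tests, so the familywise error rate rises to roughly 14%. A minimal sketch:

```python
# Familywise error rate for k independent tests, each run at alpha = .05:
# P(at least one false positive) = 1 - (1 - alpha)**k
alpha = 0.05
for k in (1, 3, 6, 10):
    fwer = 1 - (1 - alpha) ** k
    print(f"{k} tests: familywise error rate = {fwer:.3f}")
```

With 10 pairwise tests the chance of at least one false positive is already about 40%, which is why an omnibus test such as ANOVA is used instead.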
ANOVA → A categorical predictor variable with two or more categories and one continuous
outcome variable
- Variance: the between-groups variance is compared with the within-groups variance
- If the means differ → the F-statistic will be > 1 → the P-value will likely be small
- If the means do not differ → the F-statistic will be ≈ 1 or smaller → the P-value will likely be large
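As a sketch of this logic, SciPy's `f_oneway` runs a one-way ANOVA directly. The data below are hypothetical, with one group mean clearly higher than the other two, so the F-statistic comes out well above 1 and the P-value small:

```python
from scipy import stats

# Three hypothetical groups; group c has a clearly higher mean
a = [4, 5, 6, 5, 4]
b = [5, 6, 5, 4, 6]
c = [8, 9, 8, 9, 10]

f_stat, p_value = stats.f_oneway(a, b, c)
print(f"F = {f_stat:.2f}, p = {p_value:.4f}")  # F well above 1, small p
```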
Total sum of squares: One model for all data points – Represents the total variation in the
dependent variable from its mean (= SSM + SSE), regardless of the group from which the
scores come
Model sum of squares (SSM): How much of the variation can be explained by the model –
Difference between the grand mean and the group means – How much improvement there is
when looking at the group means instead of the grand mean = Between-groups variability
- Mean square model (MSM): MSM = n[(x̄₁ − x̄)² + (x̄₂ − x̄)² + ⋯ + (x̄_g − x̄)²] / (g − 1),
where x̄ is the grand mean, x̄₁ … x̄_g are the group means and n is the (equal) group size
Error sum of squares (SSE)/residual sum of squares (RSS): How much of the variation cannot
be explained by the model = Within-groups variability
- Mean square residual (MSR): MSR = Σ (x_ig − x̄_g)² / (N − g), summing the squared
deviations of every score x_ig from its own group mean x̄_g
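The sums of squares above can be computed by hand. The sketch below uses three hypothetical equal-sized groups and verifies that the total sum of squares equals SSM + SSE:

```python
import numpy as np

# Three hypothetical equal-sized groups
groups = [np.array([4, 5, 6, 5, 4]),
          np.array([5, 6, 5, 4, 6]),
          np.array([8, 9, 8, 9, 10])]

all_scores = np.concatenate(groups)
grand_mean = all_scores.mean()
g, N = len(groups), all_scores.size

# SSM: squared distances of the group means from the grand mean, weighted by group size
ssm = sum(x.size * (x.mean() - grand_mean) ** 2 for x in groups)
# SSE: squared distances of each score from its own group mean
sse = sum(((x - x.mean()) ** 2).sum() for x in groups)
# SST: squared distances of each score from the grand mean
sst = ((all_scores - grand_mean) ** 2).sum()

msm = ssm / (g - 1)   # mean square model (between-groups)
msr = sse / (N - g)   # mean square residual (within-groups)

print(f"SST = {sst:.2f} = SSM + SSE = {ssm:.2f} + {sse:.2f}")
```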
Omnibus test: ANOVA can indicate whether there are significant differences among groups,
but it does not tell you which specific groups differ → If the ANOVA result indicates
significant differences between groups, one can perform post-hoc tests to determine which
specific groups differ from each other
ANOVA significance test:
1. Assumptions:
1. Applicable in cases of a categorical explanatory variable and a quantitative
response variable – The explanatory variable should have at least 3 groups
2. The population distributions of the response variable for the g groups are
approximately normal → Shapiro-Wilk test
3. Homogeneity of variances → Levene’s test
↓
Variations of the traditional F-statistic designed to address situations where the
assumption of homogeneity of variance is violated:
- Brown-Forsythe F
- Welch’s F
4. Independent random samples
2. Hypotheses:
- H₀: μ₁ = μ₂ = … = μ_g
- H₁: at least two of the population means are different
3. Test statistic:
↓
F = MSM / MSE = (SSM / df₁) / (SSE / df₂) = signal / noise
→ df₁ = number of groups − 1; df₂ = total sample size − number of groups
↓
MSM = n[(x̄₁ − x̄)² + (x̄₂ − x̄)² + ⋯ + (x̄_g − x̄)²] / (g − 1)
MSE = Σ (x_ig − x̄_g)² / (N − g)
4. P-value: 1 − F.DIST(F-score; df₁; df₂; TRUE)
↓
Degrees of freedom:
- df₁ = g − 1
- df₂ = N − g → N = total number of subjects
5. Conclusion: The smaller the P-value, the more unusual the sample data, the stronger
the evidence against H₀, and the stronger the evidence in favour of H₁
Source   df           SS                                      MS                                 F           P
Model    df₁          MSM × df₁                               MSM (between-groups estimate)      MSM / MSE   P-value
Error    df₂          MSE × df₂                               MSE (within-groups estimate)
Total    df₁ + df₂    Between-groups SS + Within-groups SS
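The five steps can be sketched end to end with SciPy on hypothetical data: Shapiro-Wilk and Levene for the assumptions, the F-ratio from MSM and MSE, and the right tail of the F distribution for the P-value (`stats.f.sf` plays the role of 1 − F.DIST in Excel):

```python
import numpy as np
from scipy import stats

# Hypothetical data: three groups of five subjects
a = np.array([4, 5, 6, 5, 4])
b = np.array([5, 6, 5, 4, 6])
c = np.array([8, 9, 8, 9, 10])
groups = [a, b, c]

# Step 1: assumptions - normality per group (Shapiro-Wilk), equal variances (Levene)
for name, x in zip("abc", groups):
    print(name, "Shapiro-Wilk p =", round(stats.shapiro(x).pvalue, 3))
print("Levene p =", round(stats.levene(a, b, c).pvalue, 3))

# Step 3: test statistic F = MSM / MSE
N = sum(x.size for x in groups)
g = len(groups)
grand_mean = np.concatenate(groups).mean()
ssm = sum(x.size * (x.mean() - grand_mean) ** 2 for x in groups)
sse = sum(((x - x.mean()) ** 2).sum() for x in groups)
df1, df2 = g - 1, N - g
f_stat = (ssm / df1) / (sse / df2)

# Step 4: P-value = right tail of the F(df1, df2) distribution
p_value = stats.f.sf(f_stat, df1, df2)
print(f"F({df1}, {df2}) = {f_stat:.2f}, p = {p_value:.4f}")
```

The hand-computed F matches what `stats.f_oneway(a, b, c)` would report; a small p here means strong evidence against H₀.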
Dummy variable: A variable that takes a binary value (0 or 1) to indicate the absence
or presence of some categorical effect that may be expected to shift the outcome – Dummy
variables allow us to include categorical variables in analyses in which they would otherwise
be difficult to include due to their non-numeric nature
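A minimal sketch of dummy coding, with hypothetical data: with three groups, two dummy variables are needed, and a least-squares fit then recovers the reference-group mean as the intercept and each group's difference from the reference as a coefficient:

```python
import numpy as np

# Hypothetical groups A (reference), B and C, two observations each
group = np.array(["A", "A", "B", "B", "C", "C"])
y = np.array([4.0, 6.0, 9.0, 11.0, 14.0, 16.0])

# Two dummy variables: d_b = 1 for group B, d_c = 1 for group C;
# group A is the reference category (0 on both dummies)
d_b = (group == "B").astype(float)
d_c = (group == "C").astype(float)
X = np.column_stack([np.ones(len(y)), d_b, d_c])

# Least-squares fit: intercept = mean(A), coefficients = mean(B)-mean(A), mean(C)-mean(A)
coef, *_ = np.linalg.lstsq(X, y, rcond=None)
print(coef)  # approximately [5. 5. 10.]: mean(A)=5, B-A=5, C-A=10
```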
Following up a significant F-statistic by looking at model parameters, which provide
information about specific differences between means:
- Dummy coding: The simplest form of contrast coding, in which one group is
designated as the reference category (typically the first group) and the other groups
are compared to this reference (e.g. with three groups (A, B and C), one dummy
variable might be coded A = 0, B = 1, C = 0, which tests whether group B differs
from the reference group A; a second dummy with C = 1 tests group C against A)
- Issues with two dummy variables:
- Performing two t-tests inflates the familywise error rate
- The dummy variables might not make all the comparisons that we
want to make
- Planned contrasts: A specific type of contrast coding where one predefines the
comparisons one wants to test before conducting the analysis – Allows one to test a
limited number of comparisons that are meaningful for the research questions (E.g.: in
a study with three groups (A, B and C), one might be interested in comparing A to B
and A to C while ignoring the comparison between B and C)
- Three rules for contrast coding using planned contrasts:
1. If you have a control group, this is usually because you want to
compare it against any other groups.
2. Each contrast must compare only two ‘chunks’ of variation
3. Once a group has been singled out in a contrast, it can't be used in
another contrast – Once a piece of variance has been split from a larger
piece, it cannot be attached to any other pieces of variance; it can only
be subdivided into smaller pieces
- Five rules for assigning values to dummy variables to obtain contrasts:
1. Choose sensible contrasts
2. Groups coded with positive weights will be compared against groups
coded with negative weights – It does not matter which way round this
is done
3. If the weights for a given contrast are added up, the result should be
zero
4. If a group is not involved in a contrast, automatically assign it a weight
of zero, which will eliminate it from the contrast
5. For a given contrast, the weights assigned to the group(s) in one chunk
of variation should be equal to the number of groups in the opposite
chunk of variation
- Post hoc analysis: A statistical analysis specified after a study has been concluded and
the data collected
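One simple post-hoc approach is the Bonferroni correction, which controls the familywise error rate by dividing α by the number of comparisons. A sketch with hypothetical data (the same three-group setup as above):

```python
from itertools import combinations
from scipy import stats

# Hypothetical data: three groups
data = {"A": [4, 5, 6, 5, 4],
        "B": [5, 6, 5, 4, 6],
        "C": [8, 9, 8, 9, 10]}

pairs = list(combinations(data, 2))
alpha = 0.05 / len(pairs)  # Bonferroni: divide alpha by the number of comparisons
for g1, g2 in pairs:
    t, p = stats.ttest_ind(data[g1], data[g2])
    print(f"{g1} vs {g2}: p = {p:.4f}  {'significant' if p < alpha else 'n.s.'}")
```

Bonferroni is conservative; Field also discusses alternatives such as Tukey's HSD, which trade some of that conservatism for power.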