W1.1
CHAPTER 12.1-12.7 (FIELD) – COMPARING SEVERAL INDEPENDENT MEANS
Analysis of variance/ANOVA: The inferential method for comparing means of several groups
- Allows for a comparison of more than two groups at the same time to determine
whether their population means differ
- F statistic/F-ratio: The result of the ANOVA formula – Allows for the analysis of
multiple groups of data to determine the variability between samples and within
samples
- Factors: Categorical explanatory variables in multiple regression and in ANOVA
- If groups truly differ, the between-group variability must be larger than the within-
group variability!
Why we cannot use multiple T-tests for more than two groups → Multiple comparisons
problem: Conducting multiple T-tests without adjusting for multiple comparisons increases
the chance of a Type I Error (i.e. a false positive)
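The inflation of the Type I error rate can be made concrete with a small sketch. It assumes the tests are independent and each uses the same significance level, in which case the chance of at least one false positive across k tests is 1 − (1 − α)^k. The function names are illustrative only.

```python
def familywise_error_rate(alpha: float, n_tests: int) -> float:
    """Chance of at least one Type I error across n_tests independent tests."""
    return 1.0 - (1.0 - alpha) ** n_tests


def n_pairwise_comparisons(n_groups: int) -> int:
    """Number of distinct pairs of groups: g(g - 1) / 2."""
    return n_groups * (n_groups - 1) // 2


if __name__ == "__main__":
    # For each number of groups, show how many pairwise t-tests would be
    # needed and the resulting familywise error rate at alpha = .05.
    for g in (3, 4, 5):
        k = n_pairwise_comparisons(g)
        print(g, k, round(familywise_error_rate(0.05, k), 3))
```

With three groups there are three pairwise t-tests, so under these assumptions the familywise error rate is already about .14 instead of .05.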
ANOVA → One categorical predictor variable (with more than two groups) and one continuous
outcome variable
- Variance = Comparing the between-groups variance with the within-groups variance
- The means differ → F-statistic will be > 1 → P-value will likely be small
- The means do not differ → F-statistic will be close to 1 (or below) → P-value will likely be large
Total sum of squares: One model for all data points – Represents the total variation in the
dependent variable from its mean (= SSM + SSE), regardless of the group from which the
scores come
Model sum of squares (SSM): How much of the variation can be explained by the model –
Difference between the grand mean and the group means – How much improvement there is
when looking at the group means instead of the grand mean = Between-groups variability
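- Mean square model (MSM): $MSM = \frac{n\left[(\bar{x}_1-\bar{x})^2 + (\bar{x}_2-\bar{x})^2 + \cdots + (\bar{x}_g-\bar{x})^2\right]}{g-1}$, where n is the number of scores per group, $\bar{x}_1, \dots, \bar{x}_g$ are the group means, $\bar{x}$ is the grand mean and g is the number of groups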
Error sum of squares (SSE)/residual sum of squares (RSS): How much of the variation cannot
be explained by the model = Within-groups variability
- Mean square residual (MSR): $MSR = \frac{\sum (x_{ig}-\bar{x}_g)^2}{N-g}$
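The three sums of squares above can be checked numerically. The sketch below uses made-up scores for three groups of five (not taken from the source) and a hypothetical helper `sums_of_squares`. It weights each group by its own size, which reduces to the factor n in the MSM formula when all groups are equally large.

```python
from statistics import mean


def sums_of_squares(groups):
    """Return (SST, SSM, SSE) for a list of groups of scores."""
    all_scores = [x for g in groups for x in g]
    grand_mean = mean(all_scores)
    # Total: every score against the grand mean
    sst = sum((x - grand_mean) ** 2 for x in all_scores)
    # Model (between groups): each group mean against the grand mean
    ssm = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Error (within groups): every score against its own group mean
    sse = sum((x - mean(g)) ** 2 for g in groups for x in g)
    return sst, ssm, sse


# Illustrative scores only (assumption, not data from the source)
groups = [[3, 2, 1, 1, 4], [5, 2, 4, 2, 3], [7, 4, 5, 3, 6]]
sst, ssm, sse = sums_of_squares(groups)
print(round(sst, 2), round(ssm, 2), round(sse, 2))
```

For any data set the first value should equal the sum of the other two, which is the SST = SSM + SSE decomposition.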
Omnibus test: ANOVA can indicate whether there are significant differences among groups,
but it does not explicitly tell you which specific groups differ → If the ANOVA result
indicates that there are significant differences between groups, one can perform post-hoc tests
to determine which specific groups differ from each other
ANOVA significance test:
1. Assumptions:
1. Applicable in cases of a categorical explanatory variable and a quantitative
response variable – The explanatory variable should have at least 3 groups
2. The population distributions of the response variable for the g groups are
approximately normal → Shapiro-Wilk test
3. Homogeneity of variances → Levene’s test
↓
Variations of the traditional F-statistic designed to address situations where the
assumption of homogeneity of variance is violated:
- Brown-Forsythe F
- Welch’s F
4. Independent random samples
2. Hypotheses:
- $H_0: \mu_1 = \mu_2 = \dots = \mu_g$
- $H_1$: at least two of the population means are different
3. Test statistic:
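↓
$F = \frac{MSM}{MSE} = \frac{SSM / df_1}{SSE / df_2} = \frac{\text{signal}}{\text{noise}}$, with $df_1$ = number of groups − 1 and $df_2$ = total sample size − number of groups
↓
$MSM = \frac{n\left[(\bar{x}_1-\bar{x})^2 + (\bar{x}_2-\bar{x})^2 + \cdots + (\bar{x}_g-\bar{x})^2\right]}{g-1}$
$MSE = \frac{\sum (x_{ig}-\bar{x}_g)^2}{N-g}$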
4. P-value: 1 − F.DIST(F-score; df1; df2; TRUE)
↓
Degrees of freedom:
- df1 = g − 1
- df2 = N − g → N = total number of subjects
5. Conclusion: The smaller the P-value, the more unusual the sample data would be if $H_0$ were
true, the stronger the evidence against $H_0$, and the stronger the evidence in favour of $H_1$
Source  df         SS                                    MS                                  F                    P
Model   df1        MS model × df1 (between-groups SS)    MS model (between-groups estimate)  MS model / MS error  P-value
Error   df2        MS error × df2 (within-groups SS)     MS error (within-groups estimate)
Total   df1 + df2  Between-groups SS + within-groups SS
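As a rough sketch of how the rows of the ANOVA table follow from the two sums of squares, the snippet below fills in df, MS and F. The function name `anova_table` and the input values are hypothetical and chosen only to keep the arithmetic easy to follow.

```python
def anova_table(ssm: float, sse: float, n_groups: int, n_total: int) -> dict:
    """Build the model, error and total rows of a one-way ANOVA table."""
    df1 = n_groups - 1          # model degrees of freedom: g - 1
    df2 = n_total - n_groups    # error degrees of freedom: N - g
    msm = ssm / df1             # between-groups estimate
    mse = sse / df2             # within-groups estimate
    return {
        "model": {"df": df1, "SS": ssm, "MS": msm, "F": msm / mse},
        "error": {"df": df2, "SS": sse, "MS": mse},
        "total": {"df": df1 + df2, "SS": ssm + sse},
    }


# Hypothetical inputs: SSM = 20, SSE = 24, 3 groups, 15 subjects in total
table = anova_table(ssm=20.0, sse=24.0, n_groups=3, n_total=15)
for source, row in table.items():
    print(source, row)
```

The P-value step is left out to keep the sketch free of dependencies. If SciPy is installed, `scipy.stats.f.sf(F, df1, df2)` should return the same upper-tail probability as the 1 − F.DIST(...) expression in step 4.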
Dummy variable: A variable that takes a binary value (0 or 1) to indicate the absence
or presence of some categorical effect that may be expected to shift the outcome – Allows us to
include categorical variables in analyses, which would otherwise be difficult to include due
to their non-numeric nature
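A minimal sketch of how group labels could be turned into 0/1 dummy variables, assuming one group is picked as the reference category and gets no column of its own. The helper `dummy_code` and the labels are illustrative, not part of the source.

```python
def dummy_code(labels, reference):
    """Return one 0/1 column per non-reference category.

    The reference category gets no column: it is the group that scores 0
    on every dummy variable.
    """
    categories = sorted(set(labels) - {reference})
    return {c: [1 if lab == c else 0 for lab in labels] for c in categories}


# Six hypothetical observations from groups A, B and C, with A as reference
labels = ["A", "A", "B", "B", "C", "C"]
dummies = dummy_code(labels, reference="A")
for name, column in dummies.items():
    print(name, column)
```

Three groups yield two dummy variables: one that flags membership of B and one that flags membership of C.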
Following up a significant F-statistic by looking at model parameters, which provide
information about specific differences between means:
- Dummy coding: The simplest form of contrast coding in which one group is
designated as the reference category (typically mentioned first), and other groups are
compared to this reference (E.g.: if you have three groups (A, B and C) with A as the
reference, one dummy variable might code them as A = 0, B = 1 and C = 0, which allows one to
test whether group B differs from the reference group (i.e. group A); a second dummy
variable coded A = 0, B = 0 and C = 1 does the same for group C)
- Issues with two dummy variables:
- Performing two t-tests inflates the familywise error rate
- The dummy variables might not make all the comparisons that we
want to make
- Planned contrasts: A specific type of contrast coding where one predefines the
comparisons one wants to test before conducting the analysis – Allows one to test a
limited number of comparisons that are meaningful for the research questions (E.g.: in
a study with three groups (A, B and C), one might be interested in comparing A to B
and A to C while ignoring the comparison between B and C)
- Three rules for contrast coding using planned contrasts:
1. If you have a control group, this is usually because you want to
compare it against any other groups.
2. Each contrast must compare only two ‘chunks’ of variation
3. Once a group has been singled out in a contrast it can’t be used in
another contrast – Once a piece of variance has been split from a larger
piece, it cannot be attached to any other pieces of variance, it can only
be subdivided into smaller pieces
- Five rules for assigning values to dummy variables to obtain contrasts:
1. Choose sensible contrasts
2. Groups coded with positive weights will be compared against groups
coded with negative weights – It does not matter which way round this
is done
3. If the weights for a given contrast are added up, the result should be
zero
4. If a group is not involved in a contrast, automatically assign it a weight
of zero, which will eliminate it from the contrast
5. For a given contrast, the weights assigned to the group(s) in one chunk
of variation should be equal to the number of groups in the opposite
chunk of variation
- Post hoc analysis: A statistical analysis specified after a study has been concluded and
the data collected
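The rules for assigning weights in planned contrasts can be applied mechanically. The sketch below assumes three hypothetical groups (control, low, high) and an illustrative helper `contrast_weights`. Each group in a chunk receives the number of groups in the opposite chunk, one chunk is made positive and the other negative, and groups outside the contrast receive zero.

```python
def contrast_weights(chunk_a, chunk_b, all_groups):
    """Assign contrast weights to every group following the weighting rules."""
    weights = {}
    for group in all_groups:
        if group in chunk_a:
            weights[group] = len(chunk_b)    # size of the opposite chunk
        elif group in chunk_b:
            weights[group] = -len(chunk_a)   # opposite sign for the other chunk
        else:
            weights[group] = 0               # not involved in this contrast
    return weights


def sums_to_zero(weights):
    """Check that the weights of one contrast add up to zero."""
    return sum(weights.values()) == 0


all_groups = ["control", "low", "high"]
# Contrast 1: the control group against both other groups
c1 = contrast_weights(["control"], ["low", "high"], all_groups)
# Contrast 2: the remaining chunk is subdivided; control is no longer involved
c2 = contrast_weights(["low"], ["high"], all_groups)
print(c1, sums_to_zero(c1))
print(c2, sums_to_zero(c2))
```

Swapping which chunk is positive would not change the comparison, in line with the rule that the direction of the signs does not matter.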