Summary of the book, The Analysis of Biological Data
Chapters 15-17
Authors: Whitlock – Schluter
Second edition
Chapter 15, Comparing means of more than two groups
The best way to compare the means of more than two groups is the analysis of variance, or
ANOVA, which compares the means of multiple groups simultaneously in a single analysis.
Under the null hypothesis, the sample means Ȳi differ from each other solely because of
random sampling error. In the book's example of three light treatments, the alternative
hypothesis is that the mean phase shift is not the same in all three light-treatment
populations.
H0: μ1 = μ2 = μ3.
HA: At least one μi is different from the others.
Under the null hypothesis that the true means of the groups do not differ, individuals
belonging to different groups will on average be no more different from one another than
individuals belonging to the same group. The group mean square and the error mean square
should then be equal (except by chance). But if the null hypothesis is false, we expect the
group mean square to exceed the error mean square. In this case the variation among
individuals belonging to different groups is expected to be greater than the variation among
individuals belonging to the same group.
The comparison of mean squares is done with an F-ratio.
$F = \frac{\text{group mean square}}{\text{error mean square}} = \frac{MS_{\text{groups}}}{MS_{\text{error}}}$.
The degrees of freedom for groups is one less than the number of groups: dfgroups = k – 1,
where k is the number of groups.
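The F-ratio described above can be sketched by hand in a few lines of Python. This is only
an illustration: the function name anova_f and the measurements are made up (they are not
data from the book), and the error degrees of freedom are taken as N – k (total sample size
minus number of groups), the standard formula, which this summary does not state.

```python
from statistics import mean


def anova_f(groups):
    """Return (F, df_groups, df_error) for a one-way ANOVA on a list of samples."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    group_means = [mean(g) for g in groups]

    # Group sum of squares: spread of the group means around the grand mean,
    # weighted by the size of each group.
    ss_groups = sum(len(g) * (m - grand_mean) ** 2
                    for g, m in zip(groups, group_means))
    # Error sum of squares: spread of observations around their own group mean.
    ss_error = sum((y - m) ** 2
                   for g, m in zip(groups, group_means) for y in g)

    df_groups = k - 1        # one less than the number of groups
    df_error = n_total - k   # assumption: standard formula N - k
    ms_groups = ss_groups / df_groups
    ms_error = ss_error / df_error
    return ms_groups / ms_error, df_groups, df_error


# Made-up measurements for three groups (hypothetical, for illustration only).
samples = [[1, 2, 3], [2, 3, 4], [5, 6, 7]]
f_ratio, df_groups, df_error = anova_f(samples)
print(f_ratio, df_groups, df_error)
```

For these made-up numbers the group mean square is 13 and the error mean square is 1, so F
is about 13 with 2 and 6 degrees of freedom.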
The group mean square of ANOVA represents variation among the sampled individuals
belonging to different groups. It will on average be similar to the error mean square if
population means are equal.
The error mean square of ANOVA is the pooled sample variance, a measure of the variation
among individuals within the same groups.
This F-ratio is the test statistic in analysis of variance. Under the null hypothesis, F will on
average lie close to one, differing from it only because of sampling variation in the
numerator and denominator.
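A small simulation can make the claim about F under the null hypothesis concrete. The
sketch below is an illustration under stated assumptions, not something taken from the
book: it draws three groups of eight values from the same normal population (so the null
hypothesis is true by construction), repeats this 2000 times, and averages the resulting
F-ratios. The group size, number of repeats, seed, and the helper f_ratio are all arbitrary
choices.

```python
import random
from statistics import mean


def f_ratio(groups):
    """One-way ANOVA F-ratio: group mean square divided by error mean square."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    group_means = [mean(g) for g in groups]
    ss_groups = sum(len(g) * (m - grand_mean) ** 2
                    for g, m in zip(groups, group_means))
    ss_error = sum((y - m) ** 2
                   for g, m in zip(groups, group_means) for y in g)
    return (ss_groups / (k - 1)) / (ss_error / (n_total - k))


random.seed(1)  # fixed seed so the run is repeatable
k, n, n_sim = 3, 8, 2000

f_values = []
for _ in range(n_sim):
    # All groups come from the same population, so the true means are equal.
    groups = [[random.gauss(0.0, 1.0) for _ in range(n)] for _ in range(k)]
    f_values.append(f_ratio(groups))

mean_f = mean(f_values)
print(round(mean_f, 2))
```

The average should come out close to one. It tends to sit slightly above one because the
long-run mean of an F-distribution is df_error / (df_error – 2) rather than exactly one.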
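Under the null hypothesis, the sampling distribution of the F-ratio is the F-distribution,
which is described by a pair of degrees of freedom: one for the numerator (groups) and one
for the denominator (error).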
To find the critical value of the F-distribution having 2 and 19 degrees of freedom, locate the
cell in the table corresponding to 2 df in the numerator and 19 df in the denominator. This
value, 3.52, is highlighted in Table 15.1-3. We write it as F0.05(1),2,19 = 3.52.
In this notation, “(1)” indicates that we are looking only at the right tail of the F-distribution.
In other words, the area under the curve in Figure 15.1-3 to the right of 3.52 is 0.05.
Because our observed value of F (i.e., 7.29) is larger than 3.52, it lies farther out in the right
tail of the F-distribution, so P must be less than 0.05. Therefore, we reject the null
hypothesis.
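The table lookup above can be checked without a statistical table in this particular case.
When the numerator has exactly 2 degrees of freedom, the right-tail area of the
F-distribution has a simple closed form, P(F > x) = (1 + 2x / df_error)^(–df_error / 2).
The sketch below uses that special case only; the function names are made up for
illustration, and for any other numerator df a statistics library (for example
scipy.stats.f) would be needed instead.

```python
def f_right_tail_num_df2(f_value, df_error):
    """Right-tail area P(F > f_value) for an F-distribution with 2 numerator df.

    Valid only when the numerator degrees of freedom equal 2.
    """
    return (1 + 2 * f_value / df_error) ** (-df_error / 2)


def f_critical_num_df2(alpha, df_error):
    """Value whose right-tail area is alpha, found by inverting the formula above."""
    return (df_error / 2) * (alpha ** (-2 / df_error) - 1)


# Numbers from the text: 2 and 19 degrees of freedom, observed F = 7.29.
critical = f_critical_num_df2(0.05, 19)
p_value = f_right_tail_num_df2(7.29, 19)

print(round(critical, 2))   # critical value for alpha = 0.05
print(p_value < 0.05)       # True means the null hypothesis is rejected
```

The computed critical value rounds to 3.52, matching the table, and the right-tail area
beyond 7.29 is well below 0.05, consistent with rejecting the null hypothesis.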
The R² value (“R-squared”) is used in ANOVA to summarize the contribution of group
differences to the total variation in the data. The quantity is based on the fact that the total
sum of squares can be split into its two parts, the error sum of squares and the group sum
of squares: SStotal = SSerror + SSgroups.
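The split of the total sum of squares can be verified numerically. The sketch below uses
made-up measurements (not data from the book) and computes R² as SSgroups / SStotal, the
usual definition of the fraction of total variation explained by group differences; that
formula is an assumption here, since the summary has not yet stated it at this point.

```python
from statistics import mean

# Made-up measurements for three groups (hypothetical, for illustration only).
samples = [[1, 2, 3], [2, 3, 4], [5, 6, 7]]

all_values = [y for g in samples for y in g]
grand_mean = mean(all_values)

# Total: every observation against the grand mean.
ss_total = sum((y - grand_mean) ** 2 for y in all_values)
# Groups: each group mean against the grand mean, weighted by group size.
ss_groups = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in samples)
# Error: each observation against its own group mean.
ss_error = sum((y - mean(g)) ** 2 for g in samples for y in g)

# Assumed definition: share of the total variation due to group differences.
r_squared = ss_groups / ss_total

print(ss_total, ss_groups + ss_error)
print(r_squared)
```

For these numbers the group and error sums of squares (26 and 6) add up to the total (32),
so about 81% of the variation is attributable to differences among groups.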