Problem 1: one-way between-subjects ANOVA
A one-way design means we have only 1 independent variable. We use it when we want to compare 2 or more independent samples. In a one-way between-subjects ANOVA the independent variable is categorical and the dependent variable is quantitative.
The model looks as follows:
$Y_{ij} = \mu + \alpha_i + \varepsilon_{ij}$
$Y_{ij}$ is the individual score of person j in group i. This score is made up of 3 things:
$\mu$ is the effect of the constant factors (population mean)
$\alpha_i$ is the effect of the research factor (group/condition/treatment)
$\varepsilon_{ij}$ is the effect of the remaining factors (chance or individual differences): error
To obtain:
$\alpha_i = \mu_i - \mu$, which is estimated by $\bar{Y}_i - \bar{Y}$
$\varepsilon_{ij} = Y_{ij} - \mu_i$, which is estimated by $Y_{ij} - \bar{Y}_i$
$\mu$, which is estimated by $\bar{Y}$
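The estimates above can be sketched in a few lines of Python. This is only an illustration: the three groups and their scores are made-up data, and the names `groups`, `effects` and `residuals` are hypothetical, not part of the notes.

```python
# Hypothetical scores for three groups (illustrative data, not from the notes).
groups = {
    "A": [4, 5, 6, 5, 5],
    "B": [6, 7, 8, 7, 7],
    "C": [9, 8, 10, 9, 9],
}

def mean(values):
    """Arithmetic mean of a list of numbers."""
    return sum(values) / len(values)

# Estimate of mu: the grand mean over all scores.
all_scores = [y for scores in groups.values() for y in scores]
grand_mean = mean(all_scores)

# Estimate of mu_i: the mean of each group.
group_means = {name: mean(scores) for name, scores in groups.items()}

# Estimate of alpha_i: group mean minus grand mean.
effects = {name: m - grand_mean for name, m in group_means.items()}

# Estimate of epsilon_ij: each score minus its own group mean.
residuals = {
    name: [y - group_means[name] for y in scores]
    for name, scores in groups.items()
}

print("grand mean:", grand_mean)
print("group effects:", effects)
print("residuals:", residuals)
```

With these made-up scores every observation can be rebuilt as grand mean + group effect + residual, which is exactly what the model states.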
We can also rewrite the model as: $Y_{ij} - \mu = \alpha_i + \varepsilon_{ij}$
$Y_{ij} - \mu$ is a measure of dispersion: the deviation of a single person from the population mean. To summarise the dispersion in the whole population, we square this deviation and sum the squares over every person in the population: sum of squares = variation.
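However, in ANalysis Of VAriance we seek to explain the variation:
$\sum_{ij} (Y_{ij} - \mu)^2 = \sum_{ij} (\mu_i - \mu)^2 + \sum_{ij} (Y_{ij} - \mu_i)^2$
$\sum_{ij} (Y_{ij} - \mu)^2$ is the total variation: SS Total
$\sum_{ij} (\mu_i - \mu)^2$ is the explained variation: SS Between Groups (the effect of $\alpha_i$)
$\sum_{ij} (Y_{ij} - \mu_i)^2$ is the unexplained variation: SS Within Groups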
Preparing for a one-way ANOVA
Hypotheses
The null hypothesis for a one-way ANOVA is: H0: $\mu_1 = \mu_2 = \dots = \mu_I$, or equivalently $\alpha_1 = \alpha_2 = \dots = \alpha_I = 0$: there is no group effect, all means are equal. The alternative hypothesis HA is then: not all $\mu$ are equal. If the null hypothesis is rejected, the ANOVA does not tell us directly which means differ.
Sample results
We could look at a sample and see a trend, but we need to determine statistically whether the sample means differ because of chance (error) or because of an effect. When describing the sample results, the proportion of explained variation $\eta^2$ (eta squared) is a fitting measure (discussed further later on).
Assumptions
The dependent variable Y is quantitative
o If violated: use a different test
All samples to be compared are independent of one another
o This makes it a BETWEEN SUBJECTS ANOVA
o If it is violated and the samples are dependent: use a WITHIN SUBJECTS ANOVA
All samples have been drawn from normally distributed populations
o This ensures a normal sampling distribution of the sample means
o We can check this by:
    Looking at the histograms of the samples
    Looking at skewness and kurtosis (a perfectly normal distribution has 0 for both, but values between -1 and +1 are acceptable)
    Looking at normality tests: 2 tests assess the null hypothesis that the population is normally distributed, Shapiro-Wilk (our preferred) and Kolmogorov-Smirnov. We do not want this null hypothesis to be rejected
o If this assumption is violated: the test can still be performed if each problematic sample is large enough (central limit theorem)
All samples have been drawn from populations with the same dispersion; equal population
variances
o The variance $\sigma^2$ is the square of the standard deviation $\sigma$
o This assumption is made because ANOVA pools all the group variances into a single pooled variance = error variance or variance within groups
o Rule of thumb: largest S / smallest S < 2
o Levene's test can also tell us whether the assumption is met. It tests the null hypothesis $\sigma_1^2 = \sigma_2^2 = \sigma_3^2$ (for 3 groups), and we do not want this to be rejected
o If violated: the test can still be done if all samples are approximately equally large
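Two of the informal checks above (skewness and kurtosis between -1 and +1, and largest S / smallest S < 2) can be sketched as follows. The data are made up, and the simple moment-based formulas for skewness and kurtosis are an assumption: statistical packages often report bias-corrected versions that give slightly different values. The formal tests (Shapiro-Wilk, Kolmogorov-Smirnov, Levene) are not implemented here.

```python
import math

# Hypothetical scores for three groups (illustrative data, not from the notes).
groups = {
    "A": [4, 5, 6, 5, 5],
    "B": [6, 7, 8, 7, 7],
    "C": [9, 8, 10, 9, 9],
}

def central_moment(values, k):
    """k-th central moment: the mean of (y - mean)^k."""
    m = sum(values) / len(values)
    return sum((y - m) ** k for y in values) / len(values)

def skewness(values):
    """Moment-based skewness (0 for a symmetric distribution)."""
    return central_moment(values, 3) / central_moment(values, 2) ** 1.5

def excess_kurtosis(values):
    """Moment-based kurtosis minus 3 (0 for a normal distribution)."""
    return central_moment(values, 4) / central_moment(values, 2) ** 2 - 3

def sample_sd(values):
    """Sample standard deviation S, with n - 1 in the denominator."""
    m = sum(values) / len(values)
    return math.sqrt(sum((y - m) ** 2 for y in values) / (len(values) - 1))

# Normality check per group: both statistics should lie between -1 and +1.
for name, scores in groups.items():
    sk, ku = skewness(scores), excess_kurtosis(scores)
    ok = -1 <= sk <= 1 and -1 <= ku <= 1
    print(f"{name}: skewness={sk:.2f}, kurtosis={ku:.2f}, within [-1, +1]: {ok}")

# Equal-variance rule of thumb: largest S divided by smallest S below 2.
sds = [sample_sd(scores) for scores in groups.values()]
sd_ratio = max(sds) / min(sds)
print(f"largest S / smallest S = {sd_ratio:.2f}, below 2: {sd_ratio < 2}")
```

With samples this small the checks are only indicative; they are meant to show the arithmetic, not to replace the histograms and formal tests.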
One-way ANOVA in action
To recap: the Sum of Squares is the variation in the sample. The population means ($\mu$) are estimated using the sample means ($\bar{Y}$). Because of this, every $\mu$ in the SS calculations for the population is changed into its sample equivalent $\bar{Y}$:
$\sum_{ij} (Y_{ij} - \bar{Y})^2$ = SS Total
$\sum_{ij} (\bar{Y}_i - \bar{Y})^2$ = SS Between
$\sum_{ij} (Y_{ij} - \bar{Y}_i)^2$ = SS Within
SS total = SS between + SS within
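A minimal sketch of this decomposition, using made-up scores for three groups (the data and the function name `sums_of_squares` are hypothetical):

```python
# Hypothetical scores for three groups (illustrative data, not from the notes).
groups = {
    "A": [4, 5, 6, 5, 5],
    "B": [6, 7, 8, 7, 7],
    "C": [9, 8, 10, 9, 9],
}

def sums_of_squares(groups):
    """Return (SS Total, SS Between, SS Within) for a dict of score lists."""
    all_scores = [y for scores in groups.values() for y in scores]
    grand_mean = sum(all_scores) / len(all_scores)
    # SS Total: squared deviation of every score from the grand mean.
    ss_total = sum((y - grand_mean) ** 2 for y in all_scores)
    ss_between = 0.0
    ss_within = 0.0
    for scores in groups.values():
        group_mean = sum(scores) / len(scores)
        # Every person in the group contributes (group mean - grand mean)^2.
        ss_between += len(scores) * (group_mean - grand_mean) ** 2
        # Squared deviation of every score from its own group mean.
        ss_within += sum((y - group_mean) ** 2 for y in scores)
    return ss_total, ss_between, ss_within

ss_total, ss_between, ss_within = sums_of_squares(groups)
print(ss_total, ss_between, ss_within)  # 46.0 40.0 6.0
```

For these scores SS Between (40) and SS Within (6) add up to SS Total (46), as the identity requires.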
If we divide SSB by SST we get eta squared $\eta^2$: the proportion of explained variation, where:
0.01 is a small effect
0.06 is a medium effect
0.14 is a large effect
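As a small illustration, eta squared and its label can be computed like this. The SS values are hypothetical, and treating each benchmark as a lower bound (with "negligible" below 0.01) is an assumption made for this sketch, not a rule stated in the notes.

```python
def eta_squared(ss_between, ss_total):
    """Proportion of explained variation: SS Between divided by SS Total."""
    return ss_between / ss_total

def effect_label(eta2):
    """Label an eta squared value, using the benchmarks as lower bounds."""
    if eta2 >= 0.14:
        return "large"
    if eta2 >= 0.06:
        return "medium"
    if eta2 >= 0.01:
        return "small"
    return "negligible"  # label chosen for this sketch only

# Hypothetical sums of squares (illustrative values, not from the notes).
eta2 = eta_squared(ss_between=40.0, ss_total=46.0)
print(f"eta squared = {eta2:.2f} ({effect_label(eta2)})")
```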
Even if eta squared is 0 in the population (no effect), the sample eta squared will usually be above 0 by chance, so the SS do not say much by themselves. To be able to use the SS to our advantage we need quantities we can compare: we divide each SS by its corresponding degrees of freedom.
There are I groups; I - 1 = the degrees of freedom between groups
The sample size is N, divided across I different groups; N - I = the degrees of freedom within groups
The degrees of freedom for the entire sample is familiar to us: N - 1
MS Between: $\frac{\sum_{ij} (\bar{Y}_i - \bar{Y})^2}{I - 1}$
MS Within: $\frac{\sum_{ij} (Y_{ij} - \bar{Y}_i)^2}{N - I}$
If we divide each SS by its degrees of freedom we obtain the average square, better known as the mean square (MS): this is the average variation, or variance. Here we also have 3 different mean squares: MS Between groups, MS Within groups and MS Total.
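The degrees of freedom and the three mean squares can be sketched as below, assuming made-up scores for I = 3 groups and N = 15 people (data and function name are hypothetical):

```python
# Hypothetical scores for three groups (illustrative data, not from the notes).
groups = {
    "A": [4, 5, 6, 5, 5],
    "B": [6, 7, 8, 7, 7],
    "C": [9, 8, 10, 9, 9],
}

def mean_squares(groups):
    """Return the degrees of freedom and mean squares of a one-way design."""
    all_scores = [y for scores in groups.values() for y in scores]
    n_total = len(all_scores)   # N
    n_groups = len(groups)      # I
    grand_mean = sum(all_scores) / n_total

    ss_between = 0.0
    ss_within = 0.0
    for scores in groups.values():
        group_mean = sum(scores) / len(scores)
        ss_between += len(scores) * (group_mean - grand_mean) ** 2
        ss_within += sum((y - group_mean) ** 2 for y in scores)

    df_between = n_groups - 1
    df_within = n_total - n_groups
    df_total = n_total - 1
    return {
        "df_between": df_between,
        "df_within": df_within,
        "df_total": df_total,
        "ms_between": ss_between / df_between,
        "ms_within": ss_within / df_within,
        "ms_total": (ss_between + ss_within) / df_total,
    }

result = mean_squares(groups)
for key, value in result.items():
    print(key, value)
```

Note that the degrees of freedom add up (2 + 12 = 14), just like the sums of squares, but the mean squares themselves do not.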
Differences within groups cannot be attributed to the treatment (everyone in a group received the same treatment), so these differences must be caused by something else: error. The variance within groups is thus an unbiased estimator of the error variance $\sigma_\varepsilon^2$.
If we have 3 different groups we are likely to obtain 3 different sample variances, but these are all estimators of the same error variance in the population. Why? The ANOVA model assumes equal variances across populations. This is why it is best to take the average of these 3 estimates (weighted by their degrees of freedom) to obtain the best possible estimate of the error variance in the population: $S_p^2$ = the pooled variance.
The pooled variance is the same as MS Within (MSW): MSW is the most accurate estimator of the error variance. It is important to know the expected value of MSW, $E(MS_W) = \sigma_\varepsilon^2$: this is the mean value of MSW if we were to repeat the experiment an infinite number of times.
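That the pooled variance equals MSW can be checked numerically. The sketch below uses made-up groups of unequal size on purpose, to show that the average has to be weighted by each group's degrees of freedom (n_i - 1); a plain average of the three variances gives a different number. All names and data are hypothetical.

```python
from statistics import variance  # sample variance, n - 1 in the denominator

# Hypothetical scores, deliberately with unequal group sizes.
groups = {
    "A": [4, 5, 6, 5, 5],
    "B": [6, 8, 7],
    "C": [9, 8, 10, 9],
}

n_total = sum(len(scores) for scores in groups.values())  # N
n_groups = len(groups)                                    # I

# Pooled variance: group variances weighted by their degrees of freedom.
pooled_variance = sum(
    (len(scores) - 1) * variance(scores) for scores in groups.values()
) / (n_total - n_groups)

# MS Within: SS Within divided by N - I.
ss_within = 0.0
for scores in groups.values():
    group_mean = sum(scores) / len(scores)
    ss_within += sum((y - group_mean) ** 2 for y in scores)
ms_within = ss_within / (n_total - n_groups)

# Unweighted average of the group variances, for comparison.
simple_average = sum(variance(scores) for scores in groups.values()) / n_groups

print(f"pooled variance = {pooled_variance:.4f}")
print(f"MS Within       = {ms_within:.4f}")
print(f"simple average  = {simple_average:.4f}")
```

When all groups are equally large the weights are identical and the plain average gives the same result as the pooled variance.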
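The expected value of MSB is not 0 when the null hypothesis is true: the sample means still differ from each other by chance, so under H0 the expected value of MSB equals $\sigma_\varepsilon^2$ as well. If there is a group effect, the expected value of MSB is larger than $\sigma_\varepsilon^2$.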
To be able to compare MSB and MSW, we divide one by the other. This way we obtain the F-ratio:
$F = \frac{MS_{Between}}{MS_{Within}}$
F is the ratio between 2 variances (the variance between the groups, which we hope is due to an effect of group, and the variance within groups, which is due to error). If F = 1, MSB and MSW are equal (no effect). We can define a critical area in the F sampling distribution, bounded by the critical value Fc. It is often set at 0.05, meaning the 5% most extreme values of F if H0 is true. Any F above Fc has a probability of less than 5% of occurring if H0 is true, so H0 is then rejected.
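Putting the pieces together, the F-ratio and the decision can be sketched as follows. The scores are made up, and the critical value 3.89 is hard-coded from a standard F table for alpha = 0.05 with 2 and 12 degrees of freedom; with other group counts or sample sizes a different table value is needed.

```python
# Hypothetical scores for three groups (illustrative data, not from the notes).
groups = {
    "A": [4, 5, 6, 5, 5],
    "B": [6, 7, 8, 7, 7],
    "C": [9, 8, 10, 9, 9],
}

def f_ratio(groups):
    """Return (F, df between, df within) for a dict of score lists."""
    all_scores = [y for scores in groups.values() for y in scores]
    n_total = len(all_scores)
    n_groups = len(groups)
    grand_mean = sum(all_scores) / n_total

    ss_between = 0.0
    ss_within = 0.0
    for scores in groups.values():
        group_mean = sum(scores) / len(scores)
        ss_between += len(scores) * (group_mean - grand_mean) ** 2
        ss_within += sum((y - group_mean) ** 2 for y in scores)

    df_between = n_groups - 1
    df_within = n_total - n_groups
    ms_between = ss_between / df_between
    ms_within = ss_within / df_within
    return ms_between / ms_within, df_between, df_within

# Critical value from an F table: alpha = 0.05, df = (2, 12). Table lookup,
# valid only for these degrees of freedom.
F_CRITICAL_2_12 = 3.89

f_value, df_between, df_within = f_ratio(groups)
reject_h0 = f_value > F_CRITICAL_2_12
print(f"F({df_between}, {df_within}) = {f_value:.2f}, reject H0: {reject_h0}")
```

Here MSB is 20 and MSW is 0.5, so F = 40, far above the critical value: for these made-up scores H0 would be rejected.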
Stuvia is a marketplace, so you are not buying this document from us, but from seller veracreemers. Stuvia facilitates payment to the seller.