LITERATUUR WEEK 3
FIELD
CHAPTER 2 THE SPINE OF STATISTICS
2.9.6 TYPE I AND TYPE II ERRORS
When we use test statistics to tell us about the true state of the world, we’re trying to see whether there is an
effect in our population. There are two possibilities: there is, in reality, an effect in the population, or there is,
in reality, no effect in the population. We have no way of knowing which of these possibilities is true. However,
we can look at test statistics and their associated probabilities to help us decide which of the two is more
likely. There are two mistakes we can make: a type I error and a type II error.
A type I error occurs when we believe that there is a genuine effect in our population, when in fact there isn’t.
The opposite is a type II error, which occurs when we believe that there is no effect in the population when, in
reality, there is.
There is a trade-off between these two errors: if we lower the probability of accepting an effect as genuine
(making alpha smaller), then we increase the probability that we’ll reject an effect that does genuinely exist
(making a type II error more likely).
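As a sketch of what alpha = 0.05 means in the long run (an illustration, not from the book), the snippet below simulates many studies in which the null hypothesis is true and counts how often a two-tailed z-test wrongly declares an effect, i.e. the type I error rate:

```python
import numpy as np

rng = np.random.default_rng(42)

n_experiments = 10_000  # simulated studies in which the null is TRUE
n = 20                  # sample size per study

false_positives = 0
for _ in range(n_experiments):
    sample = rng.normal(loc=0.0, scale=1.0, size=n)  # no real effect
    z = sample.mean() / (1.0 / np.sqrt(n))           # z-test with known sigma = 1
    if abs(z) > 1.96:                                # two-tailed test at alpha = 0.05
        false_positives += 1

type_i_rate = false_positives / n_experiments
print(f"Observed type I error rate: {type_i_rate:.3f}")  # close to 0.05
```

Across many simulated null experiments the rejection rate settles near 5%, which is exactly what "a 0.05 level of significance" promises.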
2.9.7 INFLATED ERROR RATES
As we have seen, if a test uses a 0.05 level of significance, then the chance of making a type I error is only
5%. Logically, then, the probability of no type I error is 0.95 for each test. If you do three tests and we assume
that each test is independent, then the overall probability of no type I errors will be 0.95³ = 0.95 × 0.95 × 0.95 =
0.857. Given that the probability of no type I errors is 0.857, the probability of making at least one type I
error is this number subtracted from 1: 1 − 0.857 = 0.143, or 14.3%. Therefore, across this group of tests the
probability of making a type I error has increased from 5% to 14.3%, a value greater than the criterion that is
typically used. This is called the familywise error rate, which can be calculated with the equation:
familywise error = 1 − (0.95)^n
Here n is the number of tests carried out on the data. To combat this build-up of errors, we can adjust the level of
significance for individual tests so that the overall type I error rate (alpha) across all comparisons remains
0.05. The most popular way is the Bonferroni correction: divide alpha by the number of comparisons, k, and use alpha/k as the criterion for each individual test.
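The arithmetic above can be checked directly; a minimal sketch in plain Python (not from the book):

```python
def familywise_error(n_tests, alpha=0.05):
    """Probability of at least one type I error across n independent tests."""
    return 1 - (1 - alpha) ** n_tests

# Three independent tests at alpha = 0.05
print(familywise_error(3))  # 1 - 0.95**3 = 0.142625, i.e. about 14.3%

# Bonferroni correction: run each of the k = 3 comparisons at alpha / k
k = 3
bonferroni_alpha = 0.05 / k
print(bonferroni_alpha)  # roughly 0.0167 per test

# The familywise rate is then back below the 0.05 criterion
print(familywise_error(3, alpha=bonferroni_alpha))
```

The last line shows why dividing alpha by k works: the corrected per-test criterion keeps the overall chance of at least one type I error at (slightly under) 0.05.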
CHAPTER 12 GLM 1: COMPARING SEVERAL INDEPENDENT MEANS
12.2 USING A LINEAR MODEL TO COMPARE SEVERAL MEANS
We saw in chapter 10 that if we include a predictor variable containing two categories in the linear model,
then the resulting b for that predictor represents the difference between the mean scores of the two categories.
We also saw in chapter 11 that a categorical predictor containing more than two categories can be included
by recoding that variable into several categorical predictors, each of which has only two categories
(dummy coding). When we do, the bs for those predictors represent differences between means.
Therefore, if we’re interested in comparing more than two means, we can use the linear model to do so.
We test the overall fit of a linear model with an F-statistic, and we can do the same here: we first use an
F-statistic to test whether we can significantly predict the outcome variable by using group means, and then
use the specific model parameters (the bs) to tell us which means differ from which. This chapter develops
what we discovered in chapters 10 and 11 about using dummy variables in the linear model to compare means.
Let’s start with an example. Puppy therapy rooms have been set up to de-stress students and staff at the
University of Sussex, along with universities in Bristol, Nottingham, Aberdeen and Lancaster. Despite the
increase in puppies on campuses to reduce stress, the evidence base is pretty mixed. Imagine we want to
contribute to this literature by running a study in which we randomized people into three groups:
1. Control group (treatment as usual: no treatment or placebo)
2. 15 minutes of puppy therapy (low-dose group)
3. 30 minutes of puppy therapy (high-dose group)
The dependent variable was a measure of happiness ranging from 0 to 10. We’d predict that any form of
puppy therapy should be better than the control (higher happiness scores), but also formulate a dose-response
hypothesis: as exposure time increases, happiness will increase too. If we want to predict happiness from
group membership we can use the general equation:
Outcome_i = (model) + error_i
We’ve seen that with two groups we can code them with a dummy variable (1 and 0), and the associated
b-value represents the difference between the group means. We have three groups, but we’ve also seen that
this situation is easily incorporated into the linear model by including two dummy variables (each assigned
a b-value), and that any number of groups can be included by extending the number of dummy variables to
one less than the number of groups. We’ve also learnt that when we use dummy variables we assign one
group as the baseline (coded 0 on all dummy variables). The baseline category should be the condition
against which you intend to compare the other groups. In the puppy therapy example, we take the control
group (who receive no puppy therapy) as the baseline category because we want to compare the 15- and
30-minute groups to it. Let’s call the 30-minute dummy variable long and the 15-minute one short. Entering
them into the model as predictors gives:
Happiness_i = b0 + b1·long_i + b2·short_i + ε_i
The baseline category is coded 0 on both dummy variables. If a participant received 30 minutes of puppy
therapy they are coded 1 on the long dummy variable and 0 on short; if a participant received 15 minutes of
puppy therapy they are coded 1 on short and 0 on long.
Let’s first examine the model for the control group, for whom both the long and short dummy variables are
coded 0. Ignoring the error term, the model becomes:
Happiness_i = b0 + (b1 × 0) + (b2 × 0) = b0
The 15- and 30-minute groups have dropped out of the model (coded 0) and we’re left with b0. The predicted
value of happiness will be the mean of the control group, so we can replace happiness with this value:
X̄_control = b0
This shows that b0 in the model is always the mean of the baseline category. For the 30-minute group, the
value of the long dummy variable is 1 and the value of short is 0, so the model becomes:
Happiness_i = b0 + (b1 × 1) + (b2 × 0) = b0 + b1
which tells us that the predicted happiness for someone in the 30-minute group is the sum of b0 and the b for
the long dummy variable (b1). We know that b0 is the mean of the control group, and the predicted value of
happiness for someone in the 30-minute group is the mean of that group, so we can replace b0 with the mean
of the control group and happiness with the mean of the 30-minute group:
X̄_30min = X̄_control + b1
b1 = X̄_30min − X̄_control
This shows that the b-value for the dummy variable representing the 30-minute group is the difference
between the mean of that group and the mean of the control. Finally, we do the same for the 15-minute
group, for whom short is coded 1 and long is coded 0:
Happiness_i = b0 + (b1 × 0) + (b2 × 1) = b0 + b2
Again, we replace b0 with the mean of the control group. The predicted value of happiness for someone in
the 15-minute group is the mean of that group, so:
X̄_15min = X̄_control + b2
b2 = X̄_15min − X̄_control
which shows that the b-value for the dummy variable representing the 15-minute group is the difference
between the means of the 15-minute group and the control.
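To make this concrete, the sketch below fits the dummy-coded model by least squares. The raw scores are hypothetical values (not from the book) chosen so that the group means match the ones reported below: control = 2.2, 15-minute = 3.2, 30-minute = 5.0.

```python
import numpy as np

# Hypothetical happiness scores (5 per group) with means 2.2, 3.2 and 5.0
control  = [3, 2, 1, 1, 4]   # mean 2.2 (baseline)
short_15 = [5, 2, 4, 2, 3]   # mean 3.2
long_30  = [7, 4, 5, 3, 6]   # mean 5.0

happiness = np.array(control + short_15 + long_30, dtype=float)

# Dummy coding: control = (0, 0), 15-minute = (0, 1), 30-minute = (1, 0)
long_dummy  = np.array([0] * 5 + [0] * 5 + [1] * 5, dtype=float)
short_dummy = np.array([0] * 5 + [1] * 5 + [0] * 5, dtype=float)

# Design matrix with an intercept column: Happiness = b0 + b1*long + b2*short
X = np.column_stack([np.ones(15), long_dummy, short_dummy])
b, *_ = np.linalg.lstsq(X, happiness, rcond=None)

print(b)  # approximately [2.2, 2.8, 1.0]
```

The estimates recover exactly the quantities derived above: b0 is the control mean (2.2), b1 is the 30-minute mean minus the control mean (5.0 − 2.2 = 2.8), and b2 is the 15-minute mean minus the control mean (3.2 − 2.2 = 1.0).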
The output from the ANOVA:
F(2, 12) = 5.12, p = 0.025, tells us that our model (the group means) is a significant fit. This F tells us that
using group means to predict happiness scores is significantly better than using the mean of all scores. In
other words: the group means are significantly different.
The constant (b0) is equal to the mean of the baseline category (control group) = 2.2. The b-value of the first
dummy variable (b1) is equal to the difference between the means of the 30-minute group and the control
group (5.0 − 2.2 = 2.8). Finally, the b-value of the second dummy variable (b2) is equal to the difference
between the means of the 15-minute group and the control group (3.2 − 2.2 = 1.0). The difference between
the 30-minute group and the control group is significant (p = 0.008); the difference between the 15-minute
group and the control is not (p = 0.282).
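The F-ratio itself can be computed by hand as the model sum of squares over the residual sum of squares, each divided by its degrees of freedom. Using the same hypothetical scores as before (illustrative values whose group means match those reported above), the arithmetic reproduces F(2, 12) ≈ 5.12:

```python
import numpy as np

# Hypothetical scores per group (means 2.2, 3.2 and 5.0)
groups = {
    "control": np.array([3, 2, 1, 1, 4], dtype=float),
    "short":   np.array([5, 2, 4, 2, 3], dtype=float),
    "long":    np.array([7, 4, 5, 3, 6], dtype=float),
}

all_scores = np.concatenate(list(groups.values()))
grand_mean = all_scores.mean()

# Model sum of squares: variation of the group means around the grand mean
ss_model = sum(len(g) * (g.mean() - grand_mean) ** 2 for g in groups.values())
# Residual sum of squares: variation of scores around their own group mean
ss_resid = sum(((g - g.mean()) ** 2).sum() for g in groups.values())

df_model = len(groups) - 1                # k - 1 = 2
df_resid = len(all_scores) - len(groups)  # N - k = 12

f_ratio = (ss_model / df_model) / (ss_resid / df_resid)
print(f"F({df_model}, {df_resid}) = {f_ratio:.2f}")  # F(2, 12) = 5.12
```

This is the ratio the next section develops: systematic variation explained by the group means against unsystematic variation within groups.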
12.1.1 LOGIC OF THE F-STATISTIC
The F-statistic tests the overall fit of a linear
model to a set of observed data. F is the ratio of