Advanced Statistics MAT20306
Symbols:
µ Population mean
y Response
H0 Null hypothesis
Hα Alternative hypothesis
σ Standard deviation
σ² Variance
t Test statistic
df Degrees of freedom
CI Confidence interval
E Error margin
α Significance level / probability type I error
β Probability type II error
Δ Minimum relevant difference
π Proportion / probability
Lecture 1: Confidence intervals and hypothesis testing
1.1 Two sample T-test
In the two sample T-test, one random sample is taken from each of two populations, with an interest in
the difference between the two population means.
1.1.1 Example
Researchers want to investigate the effectiveness of a new drug against tapeworms in sheep. A random
sample of 24 sheep is randomly divided into two groups. One group receives the new drug, the
other group receives no treatment. After six months, the sheep are slaughtered and the worms are
counted.
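The random division into groups can be sketched in a few lines of Python. This is only an illustration: the sheep labels 1 to 24, the function name assign_groups and the seed are assumptions, and two groups of 12 are assumed.

```python
import random

def assign_groups(unit_ids, seed=None):
    """Randomly split the experimental units into two equally sized groups."""
    rng = random.Random(seed)      # seeded generator, so the split is reproducible
    shuffled = list(unit_ids)
    rng.shuffle(shuffled)          # random order of the units
    half = len(shuffled) // 2
    return sorted(shuffled[:half]), sorted(shuffled[half:])

# Hypothetical labels 1..24 for the sheep (not taken from the study).
sheep_ids = range(1, 25)
drug_group, control_group = assign_groups(sheep_ids, seed=1)
print("Drug:", drug_group)
print("No treatment:", control_group)
```

Every sheep ends up in exactly one group, and chance alone decides which one.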
1.1.2 Set up
Two populations are compared to each other: sheep that receive the new drug and sheep that receive
no treatment. The sheep are the experimental units. The response is the number of tapeworms. It is not
possible to look at all sheep in the two populations, so the difference between the populations is
estimated using one random sample from each population.
The population means are compared to each other: µ1 for the new drug, µ2 for no treatment. The
research hypothesis is that the drug lowers the mean number of tapeworms. This is the alternative
hypothesis, Hα: µ1 - µ2 < 0. The null hypothesis is H0: µ1 - µ2 = 0. When H0 is rejected, the data
provide evidence (at significance level α) for the research hypothesis, i.e. that the mean number of
tapeworms in the treated population is lower than the mean number of tapeworms in the untreated
population.
1.1.3 Statistical model and assumptions
In the statistical model, a number of assumptions need to be made:
1. Normality
First off, it is assumed that the data come from two normal distributions, one for each group. A
normal distribution has a symmetrical, bell-shaped curve with a single peak at the mean.
2. Equal variance
It is assumed that both populations have the same variance. The variance (σ²) measures how far the
data are spread around the mean. In the figure below, it can be assumed that the first and second
sample have unequal variance, and the second and third have an equal variance.
The variance is the standard deviation (σ) squared. The standard deviation measures the amount of
variability, or dispersion, of the individual data values around the mean.
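As a small illustration of the relation between variance and standard deviation, the sketch below computes both for a sample. The worm counts are made up for this example and are not data from the study. The sample variance (dividing by n - 1) is used here as the usual estimate of σ².

```python
import math
import statistics

# Hypothetical worm counts, for illustration only (not data from the study).
counts = [18, 43, 28, 50, 16, 32]

mean = statistics.mean(counts)

# Sample variance: sum of squared deviations from the mean, divided by n - 1.
variance = sum((y - mean) ** 2 for y in counts) / (len(counts) - 1)

# The standard deviation is the square root of the variance.
std_dev = math.sqrt(variance)

print("mean:", mean)
print("variance:", variance)
print("standard deviation:", std_dev)
```

The standard deviation is in the same units as the response (number of worms), while the variance is in squared units.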
3. Independence
It is assumed that the responses of the 24 sheep are independent, because the sheep form a random
sample, are randomly assigned to the groups, and each sheep contributes only one observation.
Because of these assumptions, we get the following statistical models:
y1, …, y12 ~ N(µ1, σ²) for sheep receiving the drug
y13, …, y24 ~ N(µ2, σ²) for sheep receiving no treatment
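To get a feeling for what these models say, data can be simulated from them. The sketch below is an illustration under assumptions: the values chosen for µ1, µ2 and σ, the seed and the function name simulate_groups are all hypothetical and are not estimates from the study.

```python
import random
import statistics

def simulate_groups(mu1, mu2, sigma, n_per_group=12, seed=0):
    """Draw independent responses from N(mu1, sigma^2) and N(mu2, sigma^2)."""
    rng = random.Random(seed)
    # Both groups share the same sigma: the equal variance assumption.
    treated = [rng.gauss(mu1, sigma) for _ in range(n_per_group)]
    untreated = [rng.gauss(mu2, sigma) for _ in range(n_per_group)]
    return treated, untreated

# Hypothetical parameter values, chosen only for illustration.
treated, untreated = simulate_groups(mu1=25.0, mu2=40.0, sigma=10.0)

# The difference in sample means estimates µ1 - µ2.
diff = statistics.mean(treated) - statistics.mean(untreated)
print("estimated difference in means:", diff)
```

Note that the normal model produces continuous values, whereas real worm counts are whole numbers. The model treats the counts as approximately normal.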
1.1.4 The test statistic