CH2 The SPINE of statistics
2.2 What is the SPINE of statistics?
• Standard error
• Parameters
• Interval estimates (confidence intervals)
• Null hypothesis significance testing
• Estimation
2.3 Statistical models
The degree to which a statistical model represents the data collected is known as the fit of the
model. There are three levels of fit: good fit, moderate fit and poor fit.
Everything in statistics boils down to one equation: outcome_i = (model) + error_i
ANOVA = analysis of variance
2.5 P is for parameters
Statistical models are made up of variables and parameters. Parameters are not measured and are
(usually) constants believed to represent some fundamental truth about the relations between
variables in the model. Some examples of parameters are the mean, the median, and correlation and
regression coefficients.
We can use the sample data to estimate what the population parameter values are likely to be.
2.5.1 The mean as a statistical model
The mean is a statistical model because it's a hypothetical value and not necessarily one that is
observed in the data.
A 'hat' (^) over a symbol makes explicit that the value is an estimate; for example, b̂ denotes an
estimate of the parameter b.
2.5.2 Assessing the fit of a model: sums of squares and variance revisited
With most statistical models we can determine whether the model represents the data well by
looking at how different the scores we observed in the data are from the values that the model
predicts. Deviance is another word for error.
To estimate the mean error in the population we need to divide by the degrees of freedom (df),
which is the number of scores used to compute the total adjusted for the fact that we're trying to
estimate the population value.
In statistical terms, the degrees of freedom relate to the number of observations that are free to
vary.
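A minimal Python sketch of these calculations (the scores are made-up values for illustration):

```python
# Assess the fit of the mean as a model: deviance, sum of squares, variance.
scores = [1, 3, 4, 3, 2]                 # made-up data

n = len(scores)
mean = sum(scores) / n                   # the model's prediction for every score

deviances = [x - mean for x in scores]   # deviance (error) of each observed score
ss = sum(d ** 2 for d in deviances)      # total error: the sum of squared errors
variance = ss / (n - 1)                  # average error, dividing by df = n - 1

print(mean, ss, variance)                # 2.6, 5.2, 1.3
```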
2.6 E is for estimating parameters
The principle of minimizing the sum of squared errors is known as the method of least squares or
ordinary least squares (OLS).
2.7 S is for standard error
The population mean is μ.
The differences between samples illustrate sampling variation.
A sampling distribution is the frequency distribution of sample means computed from repeated
samples of the same population.
If we take the average of all sample means we would get the value of the population mean. We can
use the sampling distribution to tell us how representative a sample is of the population.
The standard deviation of sample means is known as the standard error of the mean (SE) or standard
error. The central limit theorem tells us that as samples get large (usually defined as greater than 30),
the sampling distribution is normal, with a mean equal to the population mean and a standard
deviation of SE = s/√N.
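A simulation sketch of this result in Python (the population values and sample size are arbitrary
choices for illustration):

```python
# Simulate a sampling distribution and compare the standard deviation of the
# sample means (the standard error) with s / sqrt(N).
import random
import statistics

random.seed(1)
population = [random.gauss(100, 15) for _ in range(100_000)]  # hypothetical population

n = 50
means = [statistics.mean(random.sample(population, n)) for _ in range(5_000)]

print(statistics.stdev(means))                   # empirical standard error
print(statistics.stdev(population) / n ** 0.5)   # theoretical SE = s / sqrt(N)
```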
When the sample is relatively small the sampling distribution is not normal: it has a different shape,
known as a t-distribution.
2.8 I is for (confidence) interval
We can use the standard error to calculate boundaries within which we believe the population value
will fall. Such boundaries are called confidence intervals.
2.8.1 Calculating confidence intervals
Rather than fixating on a single value from the sample (the point estimate), we could use an interval
estimate instead: we use our sample value as the midpoint, but set a lower and upper limit as well.
Typically, we look at 95% confidence intervals, and sometimes 99% confidence intervals, but they all
have a similar interpretation: they are limits constructed such that, for a certain percentage of
samples, the true value of the population parameter falls within the limits.
You can't make a probability statement about any specific confidence interval: the 95% reflects a
long-run probability across repeated samples. For a particular interval, the probability that it
contains the population value is either 0 or 1 (it either does or it doesn't).
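A small simulation makes this long-run interpretation concrete (all values here are arbitrary choices
for illustration):

```python
# Build many 95% confidence intervals and count how often they contain the
# population mean: the proportion should be close to 0.95.
import random
import statistics

random.seed(1)
mu, sigma, n, trials = 100, 15, 50, 10_000
hits = 0
for _ in range(trials):
    sample = [random.gauss(mu, sigma) for _ in range(n)]
    mean = statistics.mean(sample)
    se = statistics.stdev(sample) / n ** 0.5
    if mean - 1.96 * se <= mu <= mean + 1.96 * se:
        hits += 1

print(hits / trials)   # close to 0.95
```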
To find these limits we convert scores to z-scores, which have a mean of 0 and a standard deviation
of 1: z = (X − X̄)/s. Because 95% of z-scores fall between −1.96 and 1.96, the limits of a 95%
confidence interval for the mean are:
lower boundary = X̄ − (1.96 × SE)
upper boundary = X̄ + (1.96 × SE)
We use the standard error and not the standard deviation because we're interested in the variability
of sample means, not the variability in observations within the sample.
If the interval is small, the sample mean must be very close to the true mean. Conversely, if the
confidence interval is very wide then the sample mean could be very different from the true mean,
indicating that it is a bad representation of the population.
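Putting the pieces together, a minimal sketch of the 95% interval calculation (the scores are
invented):

```python
# 95% confidence interval for a mean: mean ± 1.96 × SE.
import statistics

scores = [22, 25, 19, 30, 27, 24, 23, 26, 28, 21]   # made-up data
n = len(scores)
mean = statistics.mean(scores)
se = statistics.stdev(scores) / n ** 0.5            # standard error of the mean

print(mean - 1.96 * se, mean + 1.96 * se)           # lower and upper boundaries
```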
2.8.2 Calculating other confidence intervals
In general, we could say that confidence intervals are calculated as:
lower boundary = estimate − (z × SE)
upper boundary = estimate + (z × SE)
where p is the probability value chosen for the confidence interval and z is the z-score
corresponding to that probability (1.96 for a 95% interval, 2.58 for a 99% interval).
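Assuming scipy is available, the z-value for any probability level can be looked up rather than
memorized:

```python
# Critical z-values for two-tailed confidence intervals.
from scipy.stats import norm

for p in (0.90, 0.95, 0.99):
    z = norm.ppf(1 - (1 - p) / 2)
    print(p, round(z, 2))   # 0.90 -> 1.64, 0.95 -> 1.96, 0.99 -> 2.58
```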
2.8.3 Calculating confidence intervals in small samples
For small samples the sampling distribution is not normal - it has a t-distribution. The t-distribution is
a family of probability distributions that change shape as the sample size gets bigger. To construct a
confidence interval in a small sample we use the same principle as before, but instead of using the
value for z we use the value for t.
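A sketch of the same calculation with t instead of z, again assuming scipy and using invented data:

```python
# Confidence interval in a small sample: use t on n - 1 degrees of freedom.
import statistics
from scipy.stats import t

scores = [4.1, 5.3, 3.8, 4.6, 5.0, 4.4]   # made-up small sample
n = len(scores)
mean = statistics.mean(scores)
se = statistics.stdev(scores) / n ** 0.5

t_crit = t.ppf(0.975, df=n - 1)           # ≈ 2.57 for df = 5 (z would be 1.96)
print(mean - t_crit * se, mean + t_crit * se)
```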
2.8.4 Showing confidence intervals visually
Confidence intervals provide us with information about a parameter, and, therefore, you often see
them displayed on graphs. The confidence interval is usually displayed using something called an
error bar, which looks like the letter I. An error bar can represent the standard deviation, or the
standard error, but more often than not it shows the 95% confidence interval of the mean.
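A minimal matplotlib sketch of such a plot (the means and interval widths are made up):

```python
# Plot two group means with error bars showing their 95% confidence intervals.
import matplotlib.pyplot as plt

groups = ["Group 1", "Group 2"]
means = [24.5, 29.1]                 # made-up group means
ci_half_widths = [2.1, 2.4]          # made-up 1.96 × SE for each group

plt.errorbar(groups, means, yerr=ci_half_widths, fmt="o", capsize=5)
plt.ylabel("Mean score (95% CI)")
plt.show()
```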
By comparing the confidence intervals of different means (or other parameters) we can get some
idea about whether the means came from the same or different populations.
When the confidence intervals do not overlap, it suggests two possibilities:
• Our confidence intervals both contain the population mean, but they come from different
populations
• Both samples come from the same population but one (or both) of the confidence intervals
doesn't contain the population mean (because in 5% of the cases they don't).
2.9 N is for null hypothesis significance testing
Null hypothesis significance testing (NHST) is the most commonly taught approach to testing
research questions with statistical models. It arose out of two different approaches to the problem of
how to use data to test theories: Ronald Fisher's idea of computing probabilities to evaluate
evidence, and Jerzy Neyman and Egon Pearson's idea of competing hypotheses.
2.9.1 Fisher's p-value
The point of Fisher's study (the famous 'lady tasting tea' experiment) is that only when there was a
very small probability that the woman could complete her task by guessing alone would we conclude
that she had a genuine skill.
Scientists tend to use 5% as a threshold for confidence: only when there is a 5% chance or less (a
probability of at most 0.05) of getting the result we have (or one more extreme) if no effect exists
are we confident enough to accept that the effect is genuine.
Fisher's basic point was that you should calculate the probability of an event and evaluate this
probability within the research context.
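In the classic version of Fisher's experiment the woman tasted eight cups, four of each kind, and
had to identify the four milk-first cups; the probability of doing so by guessing alone is easy to
compute:

```python
# Probability of picking all four milk-first cups out of eight by pure guessing.
from math import comb

ways = comb(8, 4)    # 70 equally likely ways to choose 4 cups from 8
p = 1 / ways         # only one of those choices is completely correct
print(p)             # ≈ 0.014, well below the 0.05 threshold
```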
2.9.2 Types of hypothesis
In contrast to Fisher, Neyman and Pearson believed that scientific statements should be split into
testable hypotheses. The hypothesis or prediction from your theory would normally be that an
effect will be present. This hypothesis is called the alternative hypothesis and is denoted by H1 (it's
sometimes called the experimental hypothesis). There is another type of hypothesis called the null
hypothesis, which is denoted by H0. This hypothesis states that an effect is absent.
The null hypothesis is useful because it gives a baseline against which to evaluate how plausible our
alternative hypothesis is.
Rather than talking about accepting or rejecting a hypothesis, we should talk about 'the chances of
obtaining the result we have (or one more extreme), assuming that the null hypothesis is true'.
Hypotheses can be directional or non-directional. A directional hypothesis states that an effect will
occur, but it also states the direction of the effect. A non-directional hypothesis states that an effect
will occur, but it doesn't state the direction of the effect.
2.9.3 The process of NHST
NHST is a blend of Fisher's idea of using the probability value p as an index of the weight of evidence
against a null hypothesis, and Jerzy Neyman and Egon Pearson's idea of testing a null hypothesis
against an alternative hypothesis.