Lecture 1
Field—Chapter 2
What do we do with statistics?
What do we use it for?
- Doing statistics is like painting a landscape with mathematical symbols: we try to describe what we think reality looks like
- Linear models and the normal distribution are the ones used most often. There are other models, but they are not discussed in this course
Description written down as a statistical model.
Statistical models
- In statistics we fit models to our data: we use a statistical model to represent what is happening in the real world
- Models consist of parameters and variables
o Variables→ measured constructs (fatigue) and vary across people in the sample
o Parameters→ estimated from the data and represent constant relations between variables in the model
- We compute the model parameters in the sample to estimate the value in the population
- In linear regression the parameters would be the slope (inclination) and the intercept (height), and the variable would be the X variable
- If I asked you to guess the height of someone in a certain classroom, and gave you absolutely no further information… what number would you give me?
o Predict the mean height (about 1.70m)
o Given something more, such as gender, I might predict a slight difference (men taller)
o The mean is quite a good model to predict things
- Example
o Effect of CBT in severely fatigued disease-free cancer patients
o RCT: intervention versus waiting list control, pre and post measurement of primary outcome measure: fatigue severity (CIS)
o Results: patients in intervention condition reported a significantly greater decrease in fatigue severity than patients in the waiting
list
o The mean as a simple model
▪ Suppose we are interested in summarizing the therapy effect in the treatment group
▪ Compute “improvement fatigue” scores
▪ Mean improvement—model for the true effect of CBT in the treatment group
▪ We could write our model equation as: outcomeᵢ = b + errorᵢ
There is some kind of error because people are different; the model has one parameter, which is the mean
▪ Where parameter b is estimated by: b̂ = X̄ = (ΣXᵢ)/N
Take all the different scores, add them up, and divide by their number (see the sketch below)
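A minimal Python sketch of the mean as a one-parameter model, using made-up improvement scores (not the lecture's actual data):

```python
# The mean as a one-parameter model: outcome_i = b + error_i.
# The scores below are made up for illustration.
improvement = [3, 5, 4, 6, 2]            # "improvement in fatigue" scores

b = sum(improvement) / len(improvement)  # parameter b = the sample mean
errors = [x - b for x in improvement]    # each person's deviation from the model

print(f"model: outcome_i = {b} + error_i")
print("errors:", errors)                 # the errors sum to zero
```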
- Model fit
o The mean is a model of what happens in the real world: the typical score
o It is not a perfect representation of the data
o How can we assess how well the mean represents reality?
o Fit: the degree to which a value summarizes all the variability
- Calculating the error
o The mean is the value from which the (squared) scores deviate least (it has the least error)
o Measures that summarize how well the mean represents the sample data:
o Sum of squared errors (if you want to describe something): take all the differences between people and the mean, square them, and sum them together
o Mean squared error/variance: take the value above and average it over the number of cases
o Standard deviation: take the square root of that
1. Sum of squared errors: you take the deviations, square each one of them, then add them up and you get 10, which is the sum
2. The mean squared error
o Total dispersion depends on sample size→ more informative to compute the average dispersion: the mean of squared errors (MSE)
o If you have 100 cases the sum of squares is much larger than if you only have 10 cases
o We average by dividing by the degrees of freedom (N − 1) because we use sample data to estimate the model fit in the population
o We lose one degree of freedom because we estimate the population mean with the sample mean
o It’s the error that you have on average when describing your data for each person (see the sketch below)
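A sketch of the three fit measures in Python, using the same made-up scores as above; they happen to yield a sum of squares of 10, like the number mentioned in the notes:

```python
# Sum of squared errors, mean squared error (variance), standard deviation.
improvement = [3, 5, 4, 6, 2]
mean = sum(improvement) / len(improvement)       # 4.0

sse = sum((x - mean) ** 2 for x in improvement)  # sum of squared errors
mse = sse / (len(improvement) - 1)               # average over df = N - 1
sd = mse ** 0.5                                  # back to original units

print(sse, mse, sd)                              # 10.0, 2.5, ~1.58
```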
- The mean as a model: variance as a simple measure of model fit
o General principle of model fit: sum (SSE) or average (MSE) the squared deviations from the model
▪ Larger values indicate a lack of fit
▪ When the model is the mean, the MSE is called the variance (s²)
o The square root of the variance is called the standard deviation (s)
o Intuitively, a more appealing interpretation: the average deviation from the mean, not in squared units
- Standard deviation and shape of a sample distribution
o Two distributions can have the same mean but large versus small deviations
o Standard deviation and shape of the standard normal distribution
o Normal distributions occur frequently in nature
o The normal distribution is not just any bell-shaped curve, but a specific type
- From sample to population
o Mean and SD (s) are obtained from a sample, but used to estimate the mean (μ) and SD (σ) of the population.
o Suppose we are interested in summarizing the therapy effect in the treatment population.
▪ Compute “improvement in fatigue”-scores
▪ Mean improvement in the sample = model true effect of CBT in the treatment population.
▪ We can derive from the sample error how representative the sample mean is of the population mean.
o The sampling distribution
▪ One sample will provide just an estimate of the true population parameter.
▪ Depending on the variability AND sample size this estimate will be more or less precise
▪ There is a population and that population has a true mean, which you do not know. But you will say something about this mean based on a single sample
▪ The average discrepancy between the means estimated from these samples is the variability of the sampling distribution
o The sampling distribution: SE
▪ When we take many samples from a population, we can make a sampling distribution
▪ How the parameter of interest differs across the repeated process of sampling
• Distribution of sample means
• Can compute the variability of the sample means
▪ The SD of the means of all possible samples of size N from the population is called the Standard Error (SE) of the mean (see the simulation sketch below)
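A simulation sketch of the sampling distribution, with made-up population values (μ = 6, σ = 4): the SD of the sample means should approximate σ/√N, the standard error:

```python
# Simulate the sampling distribution of the mean.
import random

mu, sigma, n, n_samples = 6.0, 4.0, 30, 10_000

sample_means = []
for _ in range(n_samples):
    sample = [random.gauss(mu, sigma) for _ in range(n)]
    sample_means.append(sum(sample) / n)

grand_mean = sum(sample_means) / n_samples
sd_of_means = (sum((m - grand_mean) ** 2 for m in sample_means)
               / (n_samples - 1)) ** 0.5

print(grand_mean)        # close to mu
print(sd_of_means)       # close to sigma / sqrt(n) = the standard error
print(sigma / n ** 0.5)  # theoretical SE, ~0.73
```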
- Standard error of the mean
o Central limit theorem→ for samples of at least size 30, the sampling distribution of sample means is a normal distribution with mean μ and standard deviation σ/√N
o Estimated from the sample by SE = s/√N, where s is the sample standard deviation
o The SE tells us how much our sample mean is expected to differ from the population mean
o Note: the larger N,
▪ the smaller the SE (error) →
▪ the more representative the sample mean is of the population mean (the more precise our estimate is)
o For example, the mean height of a sample of 10 people is much more variable than that of a sample of 10,000 people
o In our small sample (N = 5) example, the estimated standard error would be SE = s/√5
o We can use this SE to calculate boundaries within which we believe the population mean will lie
- Standard error and confidence intervals
o 95% CI: for 95% of all possible samples the population mean will be within its limits
o In our small sample (N = 5) example, the 95% CI is calculated by assuming the t-distribution as representative of the sampling distribution.
▪ The t-distribution looks like the standard normal distribution, but with fatter tails, depending on the df (here df = N − 1).
o Lower limit of CI = X̄ − (tₙ₋₁ × SE)
o Upper limit of CI = X̄ + (tₙ₋₁ × SE)
▪ where n − 1 are the degrees of freedom, and
▪ tₙ₋₁ × SE is called the margin of error (see the sketch below)
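A sketch of the CI calculation for an N = 5 sample, assuming scipy is available for the t critical value (for df = 4 it is about 2.78, the table value these notes mention later); the scores are made up:

```python
# 95% confidence interval for the mean, via the t-distribution.
from statistics import mean, stdev
from scipy import stats

scores = [3, 5, 4, 6, 2]
n = len(scores)
m, s = mean(scores), stdev(scores)     # 4.0 and ~1.58
se = s / n ** 0.5                      # standard error, ~0.71
t_crit = stats.t.ppf(0.975, df=n - 1)  # ~2.78 for df = 4

print(f"95% CI [{m - t_crit * se:.2f}, {m + t_crit * se:.2f}]")
```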
- T-distribution and confidence intervals
o Lower limit (LL) of CI = X̄ − (tₙ₋₁ × SE)
o Upper limit (UL) of CI = X̄ + (tₙ₋₁ × SE)
- How to report and interpret CIs
o 95% corresponds to α = .05:
▪ common in psychology,
▪ 90% and 99% CIs can also be used
▪ APA: M = 8.0; 95% CI [6.0, 10.0]
o Graphical representation: error bars, with the bars representing the “margin of error” (i.e., tₙ₋₁ × SE)
o Interpretation of the mean:
▪ “M = 8.0 represents a clinically relevant difference”
▪ Interpretation of the CI (see also Cumming & Finch, 2005):
• “Our CI is a range of plausible values for μ. Values outside the CI are relatively implausible”
• The lower limit of our confidence interval (LL = 6) implies a statistically significant improvement in fatigue, but not a clinically relevant one. The upper limit (UL = 10) implies a clinically important change.
• The margin of error is 2: we can be 95% confident that our point estimate is no more than 2 points from the true value of μ (the smaller the margin of error, the more precise our estimate is).
• NB. The upper and lower limits of the 95% CI vary from sample to sample, just as the mean does.
• In 95% of the samples μ will be captured by the CI.
• Suppose the true population effect of CBT is represented by an improvement of 6 points on fatigue severity (CIS).
• Our 95% CI [6.0, 10.0] just captured μ
Null hypothesis significance testing: NHST
- Null hypothesis, H0
o There is no effect.
o E.g., CBT for recovered cancer patients has no effect on fatigue→ the mean “improvement score” is zero
o Notation: H0: μ = 0
o You can test this in two ways: with the p-value, or by checking whether 0 falls within the CI
- The alternative hypothesis, H1
o AKA the experimental hypothesis
o E.g., CBT for recovered cancer patients has an effect on fatigue
o Notation: H1: μ ≠ 0
- Test statistic
o Investigate the likelihood of our sample result or a more extreme result (e.g. X̄ ≥ 8) given the null hypothesis (e.g. H0: μ = 0)
▪ Transform the sample result into a test statistic: a statistic for which the frequency of particular values is known (e.g. t, F, χ²).
o When we find our sample outcome is very unlikely (i.e. p < .05) given the null hypothesis → reject our null hypothesis:
▪ the test statistic is significant (the sample is unlikely given the H0).
▪ We define “unlikely” by α, which is usually 0.05.
▪ Different types of hypotheses require different test statistics:
o Hypotheses concerning one or two means are tested with the t-test; hypotheses concerning several means are tested with the F-test.
You can compute your t-value by dividing the mean by the standard error; you get a number and compare it with the critical value given in the table for the respective degrees of freedom.
Here the t is 11.24, which is much higher than the 2.78 given by the table, and so it is significant (see the sketch below).
A p-value SMALLER THAN 0.05 IS SIGNIFICANT!!
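A sketch of the corresponding one-sample t-test (the lecture's own data give t = 11.24; the made-up scores below give a different t, but the logic is identical):

```python
# One-sample t-test of H0: mu = 0 (t = mean / SE, compared against t with df = N - 1).
from scipy import stats

scores = [3, 5, 4, 6, 2]
t, p = stats.ttest_1samp(scores, popmean=0)
print(t, p)  # t ~5.66 > 2.78 and p < .05, so reject H0
```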
How to interpret NHST
- Interpretation: we reject our null hypothesis because we find our sample result unlikely when the null hypothesis would be true.
- α = .05 is common in psychology, but α = .01 is also used.
- Caution
1. A significant effect does not mean an important effect
2. Type I and type II errors
i. A non-significant effect does not mean H0 is true
ii. Simplistic all-or-nothing thinking
3. p-values can vary greatly from sample to sample
Confidence intervals and NHST
- When our null hypothesis concerns the population mean (e.g. H0: μ = 0),
- NHST = one-sample t-test, and
- the 95% CI corresponds with a two-sided t-test with α = .05:
o Any value outside the 95% CI has p < .05
- The confidence interval is the whole range; the little dot is the estimate, between the upper limit and the lower limit. 0 is nowhere near the interval → so we can reject the H0 at a 5% alpha level
- The same goes for α = .01 (with a 99% CI)→
- When our H0 concerns one population mean (e.g. H0: μ = 0), NHST = one-sample t-test.
o Any value outside the 95% CI has p < .05
- When our H0 concerns the difference between two independent population means (e.g. H0: μ1 – μ2 = 0),
o NHST = independent-samples t-test.
o The amount of overlap of the 95% CIs of the two sample means helps us infer the p-value of the independent-samples t-test (see the sketch below).
o Upper left→ if the two CIs do not overlap or just touch, the p-value will be about 0.01
o Lower left→ if the two CIs overlap by about half, the p-value will be around 0.05
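A sketch of the independent-samples t-test with two made-up groups (the group values are assumptions, purely for illustration):

```python
# Independent-samples t-test of H0: mu1 - mu2 = 0.
from scipy import stats

treatment = [3, 5, 4, 6, 2]
control = [1, 0, 2, 1, 1]
t, p = stats.ttest_ind(treatment, control)
print(t, p)  # prints the t-value and the two-sided p-value
```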
- Caution
1. A significant effect does not mean an important effect
o Significance only tells you whether you could confidently reject the null. The null just says that there is no difference, not whether this difference is interesting. For example, an improvement after CBT does not mean that it is an important improvement; for that you need effect sizes.
o Effect size (see the sketch below)
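A minimal sketch of one common effect size, Cohen's d; the helper function is hypothetical (not from the lecture) and reuses the made-up groups from the previous example:

```python
# Cohen's d: difference in means divided by the pooled standard deviation.
from statistics import mean, stdev

def cohens_d(a, b):  # hypothetical helper, for illustration only
    na, nb = len(a), len(b)
    pooled_var = ((na - 1) * stdev(a) ** 2
                  + (nb - 1) * stdev(b) ** 2) / (na + nb - 2)
    return (mean(a) - mean(b)) / pooled_var ** 0.5

print(cohens_d([3, 5, 4, 6, 2], [1, 0, 2, 1, 1]))  # ~2.45, a large effect
```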