Book notes
Chapter 1 Sampling distribution: how different could my
sample have been?
Statistical inference: estimation and null hypothesis testing
Example: We collected data from a random sample and we want to draw conclusions (make
inferences) about the population from which the sample was drawn.
● From the proportion of yellow candies in our sample bag we want to estimate a range of
values for the proportion of yellow candies in the factory’s stock (confidence interval).
● Alternatively, we want to test the null hypothesis that one fifth (⅕) of the candies in the
factory’s stock is yellow.
○ The sample is not a perfect image of the population. If we were to draw
another random sample, it would have different characteristics. For example, it
could contain more or fewer yellow candies than the sample drawn before.
● To make an informed decision on the confidence interval or null hypothesis we have to
compare the characteristics of the sample that we drew to the characteristics of the
samples that we could have drawn.
○ The characteristics of the samples that we could have drawn = sampling
distribution.
○ Sampling distribution = the central element in estimation and null hypothesis
testing.
1.1 Statistical inference: making the most of your data
Since collecting data is expensive, we like to collect as little data as possible while still being able
to draw conclusions about a much larger set (such as the general population).
Inferential statistics: Ways of making statements about a larger set of observations from data
collected for a smaller set of observations.
● Large set of observations = population
● Smaller set of observations = sample
We want to be able to generalize a statement about the sample to a statement about the
population from which the sample was drawn
1.2 A discrete random variable: how many yellow candies are in my bag?
1.2.1 Sample statistic
Sample Statistic Example: the number of yellow candies in a bag
● Sample statistic: a value describing a characteristic of the sample
Maria Antonia Stanek
○ Aka: a random variable
○ It is a variable because different samples can have different scores: the value
varies from sample to sample.
○ It is a random variable because the score depends on chance, in this case, the
chance that a particular sample is drawn.
● Sampling space: all possible outcomes for the sample statistic ‘number of yellow
candies’
○ A bag of 10 candies may contain 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, or 10 yellow candies. The
numbers 0 to 10 are the sampling space of the sample statistic ‘number of yellow
candies in a bag’.
1.2.2 Sampling distribution
Sampling distribution: The distribution of the outcome scores of very many samples
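A minimal sketch of this idea in Python, assuming a population proportion of 0.20 yellow candies and bags of 10 candies (the values used elsewhere in these notes). The function name, the number of samples and the seed are illustrative choices, not part of the book.

```python
import random
from collections import Counter

def simulate_sampling_distribution(n_samples=1000, bag_size=10, p_yellow=0.20, seed=1):
    """Draw many bags and count how often each number of yellow candies occurs."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    counts = Counter()
    for _ in range(n_samples):
        # Assumption: each candy is yellow with probability p_yellow, independently.
        n_yellow = sum(rng.random() < p_yellow for _ in range(bag_size))
        counts[n_yellow] += 1
    return counts

counts = simulate_sampling_distribution()
for outcome in range(11):
    # One row per outcome in the sampling space 0..10.
    print(outcome, counts[outcome])
```

The printed table is the simulated sampling distribution: for every outcome it shows how many of the drawn samples produced that outcome.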
1.2.3 Probability and probability distribution
The sampling distribution tells us which samples we could have drawn and how often each outcome
occurs. We can use it to get the probability of buying a bag with exactly five yellow candies:
we divide the number of samples with five yellow candies by the total number of samples we have
drawn.
● Example: if 26 out of 1000 samples have 5 yellow candies, the proportion of samples
with five yellow candies is 26 / 1000 = 0.026. Then, the probability of drawing a sample
with 5 yellow candies is 0.026.
● The probability distribution of the sample statistic: a sampling space with a probability
(between 0 and 1) for each outcome of the sample statistic. It tells us which outcomes we
can expect. Moreover, it tells us the probability that a particular outcome may occur.
● Discrete probability distribution: only a limited number of outcomes are possible
○ Example: probability distribution of number of yellow candies per bag of ten
candies
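As a rough illustration, the step from counts to probabilities can be written in a few lines of Python. Only the value 26 out of 1000 for five yellow candies comes from the example above; all other counts are made up for the sketch, and the function name is hypothetical.

```python
def probability_distribution(counts):
    """Turn outcome counts into probabilities by dividing by the total number of samples."""
    total = sum(counts.values())
    return {outcome: n / total for outcome, n in sorted(counts.items())}

# Hypothetical counts for 1000 bags of ten candies.
# Only the entry for 5 yellow candies (26) is taken from the notes.
counts = {0: 107, 1: 268, 2: 302, 3: 201, 4: 88, 5: 26,
          6: 6, 7: 2, 8: 0, 9: 0, 10: 0}

probabilities = probability_distribution(counts)
print(probabilities[5])             # proportion of samples with five yellow candies
print(sum(probabilities.values()))  # probabilities over the whole sampling space add up to 1
```

Each probability lies between 0 and 1, and together they cover the whole sampling space, which is what makes this a discrete probability distribution.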
1.2.4 Expected value or expectation
Mean of sampling distribution = expected value of the sample statistic
If the proportion of yellow candies in the population is 0.20 (20%), we expect one out of each
five candies to be yellow. In a bag with 10 candies, we would expect two candies to be yellow.
● One out of each five candies or population proportion times the total number of
candies in the sample = the expected value
○ Example: 0.20 * 10 = 2.0
○ 0.20 = proportion
○ 10 = total number of candies
○ 2.0 = expected value
Expected value: the average of the sampling distribution of a random variable
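The calculation above can be checked with a small simulation. This is only a sketch: it assumes each candy is yellow with probability 0.20 independently of the others, and the number of simulated bags and the seed are arbitrary choices.

```python
import random

population_proportion = 0.20
bag_size = 10

# Expected value: population proportion times the number of candies in the sample.
expected_value = population_proportion * bag_size

# Simulate many bags and take the mean number of yellow candies per bag.
rng = random.Random(3)
n_samples = 10_000
yellow_counts = [
    sum(rng.random() < population_proportion for _ in range(bag_size))
    for _ in range(n_samples)
]
simulated_mean = sum(yellow_counts) / n_samples

print(expected_value)   # the calculated expectation
print(simulated_mean)   # mean of the simulated sampling distribution, should be close to it
```

The more bags are simulated, the closer the mean of the simulated sampling distribution should come to the expected value.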
1.2.5 Unbiased estimator
A sample statistic is called an unbiased estimator of the population statistic if its expected
value (the mean of its sampling distribution) is equal to the population statistic.
1.2.6 Representative sample
A sample is representative of a population if variables in the sample are distributed in the same
way as in the population.
1.3 A continuous random variable: overweight and underweight
1.3.1 Continuous variable
Continuous variable: we can always think of a new value in between two values. For example:
weight.
Chapter 2 Probability models: How do I get a sampling
distribution?
2.1 The bootstrap approximation of the sampling distribution
2.1.1 Sampling with and without replacement
→ To construct a sampling distribution from bootstrap samples, the bootstrap samples must be
exactly as large as the original sample
Sampling with replacement: If we allow the same person to be chosen more than once in a
study, we sample with replacement, so the same person can occur more than once in
the sample. Bootstrap samples are sampled with replacement from the original sample, so one
bootstrap sample may differ from another.
→ Sampling with replacement lets us get different bootstrap samples from the original
sample, and still have bootstrap samples of the same size as the original sample.
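The difference can be seen in a short Python sketch. The names in the original sample are hypothetical; `random.sample` draws without replacement and `random.choices` draws with replacement.

```python
import random

rng = random.Random(42)
original_sample = ["Ann", "Bob", "Cem", "Dee", "Eli"]  # hypothetical respondents
n = len(original_sample)

# Without replacement: a sample of the same size can only be the original
# sample in a different order, so every "new" sample has the same content.
without_replacement = rng.sample(original_sample, k=n)

# With replacement: the same person may be drawn more than once, so
# bootstrap samples of size n can differ from the original and from each other.
with_replacement = rng.choices(original_sample, k=n)

print(sorted(without_replacement))
print(sorted(with_replacement))
```

This is why bootstrap samples have to be drawn with replacement: it is the only way to keep them as large as the original sample and still get variation between them.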
Bootstrapping: Collecting many samples is often not possible because of money or time
limitations, so we bootstrap: we take the one sample we have and draw many new samples from
it. Bootstrapping is mainly used when the data do not meet the assumptions of other
approaches to the sampling distribution.
Sampling distribution: the distribution of a statistic that is arrived at through repeated
sampling from a larger population. It describes the range of possible outcomes of a statistic,
such as the mean or mode of a variable, across the samples that could be drawn from the population.
Obtaining sampling distribution:
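1. Draw one sample from the population and collect data for it
2. Draw a large number of samples with replacement from the initial sample (bootstrap samples), each as large as the initial sample
3. Calculate the sample statistic for each bootstrap sample; the distribution of these values approximates the sampling distribution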
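The procedure can be sketched in Python for the candy example. The bag below is a hypothetical original sample of ten candies; the number of bootstrap samples (5000) and the seed are arbitrary choices for the sketch.

```python
import random
from collections import Counter

# The one sample we actually drew (hypothetical bag of ten candies).
original_bag = ["yellow", "red", "blue", "yellow", "green",
                "red", "orange", "blue", "red", "green"]

rng = random.Random(7)
n_bootstrap = 5000
bootstrap_counts = Counter()

for _ in range(n_bootstrap):
    # Resample with replacement; each bootstrap bag is as large as the original bag.
    bootstrap_bag = rng.choices(original_bag, k=len(original_bag))
    # Sample statistic: number of yellow candies in the bootstrap bag.
    bootstrap_counts[bootstrap_bag.count("yellow")] += 1

# Proportion of bootstrap samples per outcome: the bootstrapped sampling distribution.
for n_yellow in sorted(bootstrap_counts):
    print(n_yellow, bootstrap_counts[n_yellow] / n_bootstrap)
```

Because the original bag contains 2 yellow candies out of 10, the bootstrapped distribution is centred around 2 yellow candies per bag.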
2.1.4 Limitations to bootstrapping
→ The original sample that we draw from the population must be more or less representative of
the population: the variables of interest in the sample should be distributed more or less the
same as in the population. If they are not, the bootstrapped sampling distribution may give a
distorted or incorrect view of the true sampling distribution.
→ A sample is more likely to be representative of the population if the sample is drawn in a truly
random fashion and if the sample is large.
2.1.5 Any sample statistic can be bootstrapped
Bootstrapping is more or less the only way to get a sampling distribution for the sample
median, for example, the median weight of candies in a sample bag.
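A minimal sketch of bootstrapping an arbitrary statistic, here the median. The candy weights are hypothetical values in grams, and the helper name `bootstrap` is an illustrative choice; any function that maps a sample to a number could be passed in place of `statistics.median`.

```python
import random
import statistics

def bootstrap(sample, statistic, n_bootstrap=5000, seed=1):
    """Return the statistic computed on n_bootstrap resamples drawn with replacement."""
    rng = random.Random(seed)
    n = len(sample)
    return [statistic(rng.choices(sample, k=n)) for _ in range(n_bootstrap)]

# Hypothetical weights (grams) of the candies in one sample bag.
weights = [2.6, 2.9, 3.1, 2.8, 3.0, 2.7, 3.3, 2.9, 3.2, 2.8]

boot_medians = bootstrap(weights, statistics.median)

print(statistics.median(weights))      # median in the original sample
print(statistics.mean(boot_medians))   # centre of the bootstrapped sampling distribution
print(statistics.stdev(boot_medians))  # spread of the bootstrapped medians
```

The list `boot_medians` plays the role of the sampling distribution of the median weight.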
2.2 Bootstrapping in SPSS
2.2.1 SPSS Instructions
Independent samples t-test
1. Analyse
2. Compare means
3. Independent samples t-test
4. Compare x with y (in this case candy weight and candy colour)
5. Test variable: Weight, Grouping variable: colour
a. Define group
6. Press bootstrap
7. Click ‘perform bootstrapping’
8. Number of samples: 5000
9. Set seed
10. Confidence interval
a. Bias corrected
11. Run syntax
2.3 Exact approaches to the sampling distribution
2.3.1 Exact approaches for categorical data
Exact approach: lists and counts all possible combinations (only for discrete and categorical data).
Uses the binomial probability formula.
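For the candy example, the binomial probability formula can be worked out directly. This sketch assumes bags of 10 candies and a population proportion of 0.20 yellow candies, as in Chapter 1; the function name is an illustrative choice.

```python
from math import comb

def binomial_probability(k, n, p):
    """Probability of exactly k successes in n independent draws with success probability p."""
    # comb(n, k) counts the combinations; p**k * (1 - p)**(n - k) is the probability of each one.
    return comb(n, k) * p**k * (1 - p)**(n - k)

n_candies = 10
p_yellow = 0.20

exact_distribution = {
    k: binomial_probability(k, n_candies, p_yellow) for k in range(n_candies + 1)
}

for k, probability in exact_distribution.items():
    print(k, round(probability, 4))
```

For five yellow candies this gives a probability of about 0.0264, close to the proportion of 0.026 in the example with 1000 samples in section 1.2.3.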