Chapter 1: Sampling Distribution
Key Concepts
Population: a large data set from which observations are collected.
Random sample: a method of selecting a sample from a population whereby every possible sample
has a predetermined probability of being selected.
Sample statistic: a number describing a characteristic of a sample.
Sampling distribution: all possible sample statistic values and their probabilities or probability
densities.
Sampling space: all possible sample statistic values.
Random variable: a variable with values that depend on chance.
Expected value: the mean of a probability distribution, such as a sampling distribution.
Probability: the chance that something will happen.
Probability density: a means of getting the probability that a continuous random variable (like a
sample statistic) falls within a particular range.
p value: the probability of obtaining the observed results, or more extreme results, when the null
hypothesis is true. The null hypothesis is rejected when the p value falls below the significance level.
Unbiased estimator: a sample statistic for which the expected value equals the population value.
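Several of these concepts can be illustrated with a small simulation. The sketch below (plain Python, using a hypothetical population) builds an approximate sampling distribution of the sample mean and checks that its expected value matches the population value, i.e. that the sample mean is an unbiased estimator.

```python
import random
import statistics

random.seed(42)

# A hypothetical population: 10,000 values with a mean around 100.
population = [random.gauss(100, 15) for _ in range(10_000)]
population_mean = statistics.mean(population)

# Approximate the sampling distribution of the mean: draw many random
# samples and record each sample's statistic (the sample mean).
sample_means = [
    statistics.mean(random.sample(population, 30))  # one sample statistic
    for _ in range(5_000)
]

# Expected value: the mean of the sampling distribution.
expected_value = statistics.mean(sample_means)

# Because the sample mean is an unbiased estimator, the expected value
# should be very close to the population mean.
print(round(population_mean, 1), round(expected_value, 1))
```

Any single sample mean may be clearly above or below the population mean; it is only the mean over many samples that matches it.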
Learning Objectives Answered
Q: Why is statistical inference necessary for drawing conclusions about the population?
A: Collecting data from an entire population is generally expensive and time consuming. Inferential
statistics instead uses samples in order to make generalizable statements about the population:
conclusions based on a sample and its sampling distribution can be generalized to the wider population.
Q: Why can a sample statistic be different from the population value?
A: A sample statistic can differ from the population value because a sample is only a small extraction
from the entire population, and chance determines which cases end up in it. Only the mean of the
sample statistics over many samples (the expected value) equals the population value.
Q: What is a random sample? And, why can random samples differ from each other even
when they are drawn from the same population?
A: A random sample is a sample whose composition is determined by chance. Random samples drawn
from the same population can differ from each other because chance decides which cases are selected,
so each sample can yield a different outcome, or sample statistic. Averaging the statistics of many such
samples approximates the population parameter, such as the mean.
Q: What does a sampling distribution represent?
A: A sampling distribution represents all possible values of a sample statistic, obtained from many
different samples, together with their probabilities or probability densities. Its mean, the expected
value, equals the population value when the statistic is an unbiased estimator.
Q: Why is the mean of the sampling distribution similar to the value in the population but
(mostly) different from the value in a sample?
A: It is similar to the value in the population because the mean of the sampling distribution, which
accumulates many samples, equals the expected value, and for an unbiased estimator this matches the
population value. It is mostly different from the value in a single sample because any one sample is
subject to chance, whereas the sampling distribution averages over many samples.
Q: What is the difference between probability for a categorical variable and probability
density for a continuous variable?
A: Continuous sample statistics are difficult to depict in a sampling distribution as single values
because, as with weight, height, or an average, the values can take ever more decimal places (e.g. 2.8,
2.81, or 2.8001). The probability of any single exact value is then essentially zero, so it is easier to
consider the probability that the statistic falls within a range of values, which is given by the
probability density. Categorical variables are simply labels (which can be in the form of numbers), and
as a result each value can be displayed with a probability directly.
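This distinction can be made concrete with a small simulation (a sketch with hypothetical candy colors and weights): for a categorical variable a single value has a meaningful probability, while for a continuous variable only a range of values does.

```python
import random

random.seed(1)

# Categorical variable: a single value has a meaningful probability.
colors = ["yellow"] * 20 + ["red"] * 80
p_yellow = colors.count("yellow") / len(colors)  # P(color = yellow) = 0.2

# Continuous variable: the probability of one exact value (a weight of
# exactly 2.8 g) is essentially zero, so we ask for the probability
# that the value falls within a RANGE instead (probability density).
weights = [random.gauss(2.8, 0.1) for _ in range(100_000)]
p_exact = sum(w == 2.8 for w in weights) / len(weights)          # ~0
p_range = sum(2.7 <= w <= 2.9 for w in weights) / len(weights)   # ~0.68

print(p_yellow, p_exact, round(p_range, 2))
```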
Chapter 2: Constructing a Sampling Distribution
Key Concepts
Bootstrapping: sampling with replacement from the original sample to create a sampling
distribution.
- Bootstrap samples resemble samples drawn from the population, so together they
approximate the sampling distribution, even though they are drawn from a sample instead
of the population.
- Bootstrap samples must be exactly as large as the original sample.
- Due to the requirement for a size identical to the original sample, bootstrap samples
cannot have different sizes.
- Bootstrap samples are sampled with replacement from the original sample (so this is
why they may differ from another).
- Some cases in the original sample may not be sampled for a bootstrap sample,
while other cases are sampled several times.
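The bullets above can be sketched in a few lines of Python (with a hypothetical original sample): each bootstrap sample is drawn with replacement and is exactly as large as the original sample, and the resulting statistics approximate the sampling distribution.

```python
import random
import statistics

random.seed(7)

# Original sample (we pretend the population itself is unavailable).
original = [2.7, 2.8, 2.9, 3.1, 2.6, 3.0, 2.8, 2.9, 3.2, 2.5]
n = len(original)

# Each bootstrap sample is drawn WITH replacement from the original
# sample and is exactly as large as the original sample.
boot_means = []
for _ in range(2_000):
    boot = random.choices(original, k=n)  # sampling with replacement
    boot_means.append(statistics.mean(boot))

# boot_means approximates the sampling distribution of the sample mean.
print(round(statistics.mean(boot_means), 2), round(statistics.mean(original), 2))
```

Because sampling is done with replacement, some cases from the original sample may appear several times in one bootstrap sample while others do not appear at all, which is why bootstrap samples differ from one another.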
Sampling with replacement: when a person is randomly chosen for a sample, they are put back into
the population, so the same person can be chosen more than once for the sample.
Sampling without replacement: when a person is randomly chosen for a sample, but is not put back
into the population to be chosen for the sample again.
- In practice, sampling without replacement is the preferred technique, as we usually do
not want a respondent to participate twice; a repeated respondent yields no new
information.
- There is a slight difference in calculating probabilities, as taking one person out of
the population decreases its size with each draw. In large populations, however, this
difference is negligible.
- Ignoring this difference is only justified if the population size is much larger than
the sample size.
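The two techniques map directly onto Python's standard library (a sketch with a hypothetical population of 100 people): `random.sample` draws without replacement, `random.choices` with replacement.

```python
import random

random.seed(3)

population = list(range(100))  # hypothetical population of 100 people

# Without replacement: each person can appear at most once.
without = random.sample(population, 10)
assert len(set(without)) == 10  # duplicates are impossible

# With replacement: the same person can be drawn more than once.
with_repl = random.choices(population, k=10)

# The probability difference between the two is small when the
# population is much larger than the sample: here the second draw
# without replacement shifts each remaining person's probability
# from 1/100 to 1/99, and the shift shrinks as the population grows.
print(sorted(without))
```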
Exact tests/approach: calculating the true sampling distribution as the probabilities of combinations
of values on categorical variables.
- Fisher’s exact test: automatically available in SPSS if the contingency table has two rows and
two columns.
- Applied automatically if the conditions for a theoretical approximation are not met.
- Exact approach: a probability approach that lists and counts all possible
combinations, only with discrete or categorical variables.
- A proportion is based on frequencies (discrete variables, e.g. integers), so an
exact approach is used to create a sampling distribution for one proportion.
- Fisher’s exact test is an example of an exact approach to the sampling
distribution of the association between two categorical variables.
- The exact approach is best for variables that have a limited number of values,
like discrete variables.
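For a single proportion, the exact approach can be written out directly: list every possible count of "successes" in the sample and compute its binomial probability. A minimal sketch (hypothetical example: samples of 10 candies from a population with 20% yellow candies):

```python
from math import comb

# Exact sampling distribution of a sample proportion: enumerate every
# possible count of yellow candies and its binomial probability.
n, p = 10, 0.2  # sample size, population proportion of yellow candies

exact = {
    k / n: comb(n, k) * p**k * (1 - p) ** (n - k)  # P(proportion = k/n)
    for k in range(n + 1)
}

# The probabilities over all possible sample statistic values sum to 1,
# because the enumeration is exhaustive.
print(round(sum(exact.values()), 10))
```

This is feasible precisely because a proportion is based on frequencies: there are only n + 1 possible values to list, which is why the exact approach suits discrete or categorical variables.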
Theoretical probability distributions/approximation (normal or z, t, F chi-squared): using
theoretical probability distribution as an approximation of the sampling distribution.
- Known or likely to have the same shape as the true sampling distributions under particular
circumstances or conditions.
- Sample size is important: the larger, the better.
- When conditions for the theoretical approach are not met, bootstrapping or an exact approach
must be selected.
- Normal distribution or z: a symmetrical, reasonable model for the probability
distribution of sample means.
- If scores in the sample are clearly normally distributed, it is safe to assume that
scores in the population are also normally distributed.
- (Student) t: used for one or two sample means, regression coefficients and correlation
coefficients
- F: used for comparison of variances and comparing means for three or more groups
(ANOVA)
- Chi-squared: used for analysis of categorical variables, frequency and contingency
tables.
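The idea of a theoretical approximation can be checked by simulation. A sketch (hypothetical population with mean 50 and standard deviation 10): the normal distribution with mean mu and standard error sigma/sqrt(n) should give nearly the same probabilities as a simulated sampling distribution of the mean.

```python
import random
import statistics
from statistics import NormalDist

random.seed(5)

mu, sigma, n = 50, 10, 40  # population mean, population sd, sample size

# Simulated sampling distribution of the sample mean.
population = [random.gauss(mu, sigma) for _ in range(20_000)]
means = [statistics.mean(random.sample(population, n)) for _ in range(3_000)]

# Theoretical approximation: normal with mean mu and SE sigma / sqrt(n).
approx = NormalDist(mu, sigma / n**0.5)

# Compare P(sample mean < 48) under simulation vs. the z approximation.
simulated = sum(m < 48 for m in means) / len(means)
theoretical = approx.cdf(48)
print(round(simulated, 2), round(theoretical, 2))
```

The two probabilities should be close here; with a much smaller sample size or a strongly skewed population, the normal approximation would be less trustworthy, which is why sample size matters.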
Independent versus dependent/paired samples:
- Independent samples: when comparing two samples that are statistically independent
(meaning that the samples could have been drawn separately)
- The samples themselves are not affected by each other, and are therefore independent
of each other.
- For example, comparing red candies to yellow candies.
- Dependent samples: the composition of a sample depends partly or entirely on the
composition of another sample.
- Drawing a sample for a second measurement cannot happen independently of the first
measurement, if they wish to be compared.
- For example, comparing the color intensity of yellow candies (normal vs. faded).
Learning Objectives Answered
Q: What are the advantages and disadvantages of the three methods for constructing a
sampling distribution: bootstrapping, exact approach, and theoretical approximation?
- Bootstrapping:
Advantages:
- A sampling distribution can be constructed for any sample statistic.
Limitations:
- It is only correct if the original sample is more or less representative of the
population.
- e.g. If the population is 20% yellow candies, the original sample must contain
roughly 20% yellow candies for the bootstrapped sampling distribution to be accurate.