Statistical Modelling for Communication Research
Chapter 1 Sampling distribution: How different could my sample
have been?
Main concepts
Sample statistic: A number describing a characteristic of a sample
Sampling space: All possible sample statistic values
Sampling distribution: All possible sample statistic values and their
probabilities or probability densities
Probability density: A means of getting the probability that a
continuous random variable (like a sample statistic) falls within a
particular range
Random variable: A variable with values that depend on chance
Expected value/expectation: The mean of a probability distribution,
such as a sampling distribution
Unbiased estimator: A sample statistic for which the expected value
equals the population value
1.1 Statistical inference: Making the most of your data
Checking theoretical statements requires data on all situations the theory addresses. We would like to collect as little data as possible and still draw sound conclusions. With inferential statistics we want to generalize a statement about a random sample to a statement about the population.
1.2 A discrete random variable: How many yellow candies in my bag?
Different random samples from one population do not have to be identical, although they can be. A sample statistic is a value describing a characteristic of the sample, for example the number of yellow candies in a bag. All possible outcome scores together constitute the sampling space. The sample statistic is a random variable: its score depends on the chance that a particular sample is drawn.
We call the distribution of the outcome scores of very many samples
a sampling distribution.
The probability of an outcome is the proportion of all possible samples we could have drawn that yield this outcome; this is what the sampling distribution displays. To get the probability, we divide the number of samples with a given outcome by the total number of samples we have drawn. If we change frequencies into proportions, we obtain the probability distribution of the sample statistic.
If only a limited number of outcomes are possible, it is called a discrete
probability distribution. A probability is a proportion, a number between 0
and 1, or a percentage, between 0% and 100%.
The proportion times the total number of candies in the sample is the expected value. The expected value equals the mean of the sampling distribution, that is, of the probability distribution.
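As a small sketch in Python, with purely illustrative numbers (bags of ten candies, a population in which 20% are yellow), the sampling distribution of the number of yellow candies and its expected value can be built exactly as described above, by drawing very many samples and turning frequencies into proportions:

# Build a sampling distribution by drawing very many sample bags of 10 candies
# from a population in which (illustratively) 20% of the candies are yellow.
import numpy as np

rng = np.random.default_rng(0)
n, p = 10, 0.2                                    # assumed bag size and proportion

counts = rng.binomial(n, p, size=100_000)         # yellow candies in each sample bag
values, freqs = np.unique(counts, return_counts=True)

# Changing frequencies into proportions gives the discrete probability distribution.
for value, freq in zip(values, freqs):
    print(value, freq / counts.size)

# Expected value = proportion * sample size = mean of the sampling distribution.
print("expected value:", n * p, "observed mean:", counts.mean())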
The sample proportion is an unbiased estimator of the proportion in the population. A sample statistic is called an unbiased estimator of the population statistic if its expected value equals the population statistic. The population statistic is called a parameter.
If we were to estimate the number in the population from the number in the sample, we would vastly underestimate it; the estimate is downward biased, too low. We therefore do not use the number of yellow candies to generalize from our sample to our population; we use proportions.
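A short worked example makes the point; the numbers (2 yellow candies in a bag of 10, a population of 10,000 candies) are made up for illustration:

# Why we use the proportion, not the count, to generalize (made-up numbers).
sample_size, yellow_in_sample = 10, 2
population_size = 10_000

sample_proportion = yellow_in_sample / sample_size                     # 0.2
estimated_yellow_in_population = sample_proportion * population_size   # 2000

# Using the raw sample count (2) as an estimate of the population count
# (about 2000) would be a vast, downward-biased underestimate.
print(sample_proportion, estimated_yellow_in_population)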
A sample is representative of a population if variables in the sample are distributed in the same way as in the population. A random sample is likely to differ somewhat from the population, but we have no reason to expect it to differ systematically: it is representative in principle.
1.3 A continuous random variable: Overweight and underweight
If we can always think of a new value in between any two values, the variable is continuous, for example weight. If we are interested in the average weight of all candies in our sample bag, that average is our key sample statistic.
With a continuous sample statistic we have an infinite number of possible outcomes. The probability of any single exact value of a continuous sample statistic is for all practical purposes zero, and therefore negligible.
We must look at a range of values instead of a single value. You
choose a threshold. The probability is an area between the horizontal axis
and a curve. This curve is called a probability density function.
A probability density function can give the probability of values between two thresholds; this can be a left-hand probability, a right-hand probability, or neither (an interval between two thresholds). The probabilities of all possible outcomes always add up to one: the total area under the curve is one.
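As a sketch, assuming (hypothetically) that candy weight is normally distributed with mean 2.8 grams and standard deviation 0.3 grams, these areas under the density curve can be obtained as follows:

# Probabilities for a continuous variable are areas under the density curve.
from scipy.stats import norm

weight = norm(loc=2.8, scale=0.3)            # assumed weight distribution (grams)

print(weight.cdf(2.5))                       # left-hand probability: P(weight < 2.5)
print(1 - weight.cdf(3.2))                   # right-hand probability: P(weight > 3.2)
print(weight.cdf(3.2) - weight.cdf(2.5))     # probability between two thresholds
print(weight.cdf(float("inf")))              # total area under the curve: 1.0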
1.4 Concluding remarks
In a sampling distribution, the samples are the cases we observe and the sample statistic is the (random) variable we measure.
The sampling distribution collects many sample proportions. The
mean (expected value) equals the proportion in the population, because a
sample proportion is an unbiased estimator of the population proportion.
We have means at three levels: the population, the sampling distribution,
and the sample.
The mean of the sampling distribution is the average of the average
weight of candies in every possible sample bag. This is the same mean as
the average weight of the candies in the population. The sampling
distribution is connected to the population because the parameter is equal
to the mean of the sampling distribution. The sampling distribution is
linked to the sample because it tells us which sample means we will find
with what probabilities.
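A simple simulation can make the three levels concrete; the candy weights below are assumed values, not data from the text:

# Means at three levels: the population, the sampling distribution, one sample.
import numpy as np

rng = np.random.default_rng(2)
population = rng.normal(2.8, 0.3, size=100_000)            # assumed candy weights

# Approximate sampling distribution: the mean weight in many sample bags of 10.
sample_means = [rng.choice(population, size=10, replace=False).mean()
                for _ in range(10_000)]

print("population mean:              ", population.mean())
print("mean of sampling distribution:", np.mean(sample_means))   # about the same
print("one sample mean:              ",
      rng.choice(population, size=10, replace=False).mean())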
Chapter 2 Probability models: How do I get a sampling
distribution?
Main concepts
Bootstrapping: sampling with replacement from the original sample
to create a sampling distribution
Exact approach: calculating the true sampling distribution as the
probabilities of combinations of values on categorical variables
Theoretical approximation: using a theoretical probability
distribution as an approximation of the sampling distribution
Independent samples: samples that can in principle be drawn
separately
Dependent/paired samples: the composition of a sample depends
partly or entirely on the composition of another sample
2.1 The bootstrap approximation of the sampling distribution
With bootstrapping we draw only one sample from the population. We then draw many samples from this initial sample, which are called bootstrap
samples. We want about 5000 bootstrap samples for our sampling
distribution.
To construct a sampling distribution from bootstrap samples, the
bootstrap samples must be exactly as large as the original sample.
Bootstrap samples are sampled with replacement from the original
sample, so they can differ.
In practice, we always sample without replacement, but our statistical software calculates probabilities as if we sampled with replacement. This makes hardly any difference as long as the population is much larger than the sample.
At larger sample sizes, the bootstrapped sampling distribution is
more like the true sampling distribution. The original sample that was drawn from the population must be representative of the population; otherwise, the bootstrapped sampling distribution may give a distorted view of the true sampling distribution.
Bootstrapping can be used to obtain a sampling distribution for any sample statistic.
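A minimal bootstrap sketch in Python, using made-up candy weights as the original sample:

# Bootstrap: resample with replacement from the original sample, with each
# bootstrap sample exactly as large as the original, about 5000 times.
import numpy as np

rng = np.random.default_rng(3)
sample = np.array([2.5, 2.9, 3.1, 2.7, 2.8, 3.0, 2.6, 2.9, 3.2, 2.8])  # made-up data

boot_means = np.array([
    rng.choice(sample, size=sample.size, replace=True).mean()
    for _ in range(5000)
])

# boot_means approximates the sampling distribution of the sample mean.
print(boot_means.mean(), boot_means.std())
print(np.percentile(boot_means, [2.5, 97.5]))   # simple 95% percentile interval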
2.2 Bootstrapping in SPSS
Analyze > Compare means > Independent samples t test > Select
variable as test variable > Select a grouping variable > Define groups >
Bootstrap > Check perform bootstrapping > Number of samples (5000) >
Set confidence level > Check bias corrected accelerated > Paste > Run.
Measures of central tendency can be obtained: Analyze > Descriptive
statistics > Frequencies > Statistics.
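For comparison, a rough Python parallel of the procedure above: bootstrapping the difference between two independent group means. It uses made-up data and a simple percentile interval, not the bias-corrected and accelerated interval SPSS offers:

# Bootstrap the mean difference between two independent groups (made-up data).
import numpy as np

rng = np.random.default_rng(4)
group1 = np.array([2.6, 2.9, 3.1, 2.8, 3.0, 2.7])
group2 = np.array([2.4, 2.5, 2.8, 2.6, 2.7, 2.5])

diffs = np.array([
    rng.choice(group1, size=group1.size, replace=True).mean()
    - rng.choice(group2, size=group2.size, replace=True).mean()
    for _ in range(5000)
])

print("observed difference:", group1.mean() - group2.mean())
print("95% bootstrap interval:", np.percentile(diffs, [2.5, 97.5]))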
2.3 Exact approaches to the sampling distribution
If we know or think we know the proportion of yellow candies in the population, we can calculate exactly the probability that a sample of ten candies includes one, two, three, or any number up to ten yellow candies. Calculating the probabilities of all possible sample statistic outcomes gives us an exact approach to the sampling distribution.
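A sketch of such an exact calculation, assuming (for illustration) that 20% of the candies in the population are yellow and that a bag holds ten candies:

# Exact sampling distribution from the binomial formula (illustrative numbers).
from math import comb

n, p = 10, 0.2                                   # assumed bag size and proportion
exact = {k: comb(n, k) * p**k * (1 - p)**(n - k) for k in range(n + 1)}

for k, prob in exact.items():
    print(k, round(prob, 4))                     # exact probability of k yellow candies

print(sum(exact.values()))                       # the probabilities sum to 1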
Such a list of exact probabilities can only be made if we work with discrete or categorical variables. Exact approaches are also available for the
association between two categorical variables in a contingency table.
Exact approaches are computationally intensive. It is wise to set a limit on the time you allow your computer to work on an exact sampling distribution.
2.4 Exact approaches in SPSS