PROBABILITY AND SAMPLING
The Logic of Sampling
Researchers are usually interested in making general statements about something. For
example, we might be interested in investigating the effects of remote teaching on student
achievement. Ideally, we would like to draw conclusions about all students who fall within
clearly specified parameters (or distinguishing characteristics). These parameters are what
define our population of interest.
In social science research, it is logistically impractical for a researcher to approach all the
individuals/units in a population for data collection. Instead, we select a representative
group of individuals/units from our population of interest, collect the data we need
and then generalise the findings derived from our sample to the entire population.
A population is the entire collection of units about which we would like to draw conclusions.
A sample is the subset of population units that are actually measured.
Each individual or case that constitutes a sample is called a sampling unit (or
element).
Although any subset of a given population can be considered a sample, we need to ensure
that it is representative of the population. In other words, our sample should be a small-
scale replica of the population for us to draw valid conclusions about the population.
The process of making conclusions about a population based on findings derived from a
sample is called statistical inference.
To understand how statistical inference works, you need to know about the sampling
distribution of the mean and the central limit theorem.
The Sampling Distribution of the Mean
Typically, we conduct a study only once and collect data from a single sample. Now, imagine
replicating our study many times and drawing multiple samples (of the same size) from our
population of interest. We could calculate the mean for each of these samples and plot
them on a histogram. The histogram would display what statisticians refer to as the
sampling distribution of the mean.
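The thought experiment above can be simulated directly. The sketch below is illustrative only: it assumes a hypothetical population of 10,000 test scores, draws 2,000 simple random samples of size 50, and prints a crude text histogram of the sample means in place of a plotted one. The function names are made up for this example.

```python
import random
import statistics

def sampling_distribution_of_mean(population, sample_size, n_samples, seed=0):
    """Draw n_samples simple random samples (without replacement) of equal
    size from the population and return the list of their means."""
    rng = random.Random(seed)
    return [statistics.fmean(rng.sample(population, sample_size))
            for _ in range(n_samples)]

def text_histogram(values, bins=10, width=40):
    """Print a crude text histogram: one row of '#' characters per bin."""
    lo, hi = min(values), max(values)
    step = (hi - lo) / bins or 1.0  # avoid a zero-width bin if all values are equal
    counts = [0] * bins
    for v in values:
        idx = min(int((v - lo) / step), bins - 1)  # the maximum value goes in the last bin
        counts[idx] += 1
    peak = max(counts)
    for i, c in enumerate(counts):
        print(f"{lo + i * step:8.2f} | {'#' * round(width * c / peak)}")

if __name__ == "__main__":
    # Hypothetical population: 10,000 scores spread evenly between 0 and 100.
    pop_rng = random.Random(42)
    population = [pop_rng.uniform(0, 100) for _ in range(10_000)]
    means = sampling_distribution_of_mean(population, sample_size=50, n_samples=2_000)
    print("population mean:     ", round(statistics.fmean(population), 2))
    print("mean of sample means:", round(statistics.fmean(means), 2))
    text_histogram(means)
```

Even though the hypothetical population is flat (uniform), the histogram of sample means comes out mound-shaped and centred near the population mean.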
Fortunately, we do not have to replicate studies to estimate the sampling distribution of
the mean. Statistical procedures can estimate this distribution from a single random sample.
Keep in mind that the shape of the sampling distribution depends on the sample size. This
brings us to the Central Limit Theorem.
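One widely used procedure for estimating the sampling distribution from a single random sample is the nonparametric bootstrap. It is named here as an illustration and is not necessarily the procedure this course has in mind. The idea is to resample the one sample *with replacement* many times and record each resample's mean. In the sketch below, the 12 scores are hypothetical, and the bootstrap standard error is compared with the usual s/√n formula.

```python
import random
import statistics

def bootstrap_means(sample, n_resamples=5_000, seed=0):
    """Approximate the sampling distribution of the mean from ONE sample
    by resampling it with replacement (nonparametric bootstrap)."""
    rng = random.Random(seed)
    n = len(sample)
    return [statistics.fmean(rng.choices(sample, k=n)) for _ in range(n_resamples)]

if __name__ == "__main__":
    # Hypothetical single sample of 12 achievement scores.
    sample = [62, 71, 55, 80, 68, 74, 59, 66, 90, 48, 77, 70]
    boot = bootstrap_means(sample)
    print("sample mean:", round(statistics.fmean(sample), 2))
    # The spread of the bootstrap means estimates the standard error of the mean.
    print("bootstrap SE:", round(statistics.stdev(boot), 2))
    print("formula SE (s / sqrt(n)):",
          round(statistics.stdev(sample) / len(sample) ** 0.5, 2))
```

The two standard-error estimates should be similar. The bootstrap version runs slightly smaller because resampling uses an n (rather than n - 1) denominator implicitly.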
Central Limit Theorem
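The Central Limit Theorem states that the sampling distribution of the sample
means approaches a normal distribution as the sample size gets larger, no matter what
the shape of the population distribution is (provided the population has a finite variance).
What does this mean for a researcher? For a sufficiently large sample, we can treat the sample
mean as one draw from an approximately normal distribution centred on the population mean,
whose spread (the standard error, σ/√n, where σ is the population standard deviation and n the
sample size) shrinks as n grows. In addition, larger samples tend to give a sample mean and
standard deviation closer in value to the population's mean and standard deviation. Together,
these properties are what allow us to draw inferences about the population from a single sample.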
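To see the theorem at work, the sketch below uses a hypothetical, strongly right-skewed population (exponential with mean 10, skewness about 2). It measures how skewed the distribution of sample means is for a few sample sizes. A skewness near 0 indicates a symmetric, more normal-looking shape. The sample sizes and repetition count are arbitrary choices for illustration.

```python
import random
import statistics

def skewness(values):
    """Third standardised moment: about 0 for a symmetric distribution."""
    m = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return statistics.fmean([((v - m) / sd) ** 3 for v in values])

def sample_means(population, n, reps, rng):
    """Means of `reps` simple random samples of size n."""
    return [statistics.fmean(rng.sample(population, n)) for _ in range(reps)]

if __name__ == "__main__":
    rng = random.Random(7)
    # Hypothetical right-skewed population (exponential, mean 10).
    population = [rng.expovariate(1 / 10) for _ in range(20_000)]
    print("population skewness:", round(skewness(population), 2))
    for n in (2, 10, 50):
        means = sample_means(population, n, reps=3_000, rng=rng)
        # Skewness should shrink toward 0 and the SD should shrink roughly like 10 / sqrt(n).
        print(f"n={n:>3}: skewness of sample means = {skewness(means):.2f}, "
              f"SD of sample means = {statistics.pstdev(means):.2f}")
```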
Sampling error
Sampling error is the difference between a population parameter and a sample statistic
used to estimate it. We aim to minimise sampling error.
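As a worked example of the definition, the sketch below builds a hypothetical population of 500 exam scores, so the population mean (the parameter) is known. It then draws one random sample of 40 and computes its mean (the statistic). The sign convention used here (statistic minus parameter) is a choice. The magnitude is what matters.

```python
import random
import statistics

# Hypothetical population of 500 exam scores (in practice we never see all of them).
rng = random.Random(1)
population = [rng.gauss(65, 12) for _ in range(500)]
mu = statistics.fmean(population)        # population parameter

sample = rng.sample(population, 40)      # one simple random sample of 40 units
x_bar = statistics.fmean(sample)         # sample statistic

sampling_error = x_bar - mu              # convention used here: statistic minus parameter
print(f"population mean = {mu:.2f}")
print(f"sample mean     = {x_bar:.2f}")
print(f"sampling error  = {sampling_error:.2f}")
```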
Sample Size and Sampling Error
Suppose two studies differ only in sample size (here, we assume that the population of
interest and the sampling method are the same). In that case, the study with the larger sample
size will, on average, have less sampling error than the one with the smaller sample size. Keep in
mind that as the sample size increases, the sample approaches the size of the entire population
and increasingly reflects its characteristics, thus decreasing sampling error.
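The comparison above can be checked by simulation. The sketch below holds the (hypothetical) population and the sampling method fixed and varies only the sample size. For each size, it averages the absolute sampling error of the mean over many random samples. The final case draws the whole population, where the error drops to zero apart from floating-point rounding.

```python
import random
import statistics

def mean_abs_sampling_error(population, n, reps, rng):
    """Average |sample mean - population mean| over `reps` random samples of size n."""
    mu = statistics.fmean(population)
    return statistics.fmean(
        [abs(statistics.fmean(rng.sample(population, n)) - mu) for _ in range(reps)]
    )

if __name__ == "__main__":
    rng = random.Random(3)
    # Hypothetical population of 5,000 scores; same population and method for every n.
    population = [rng.gauss(65, 12) for _ in range(5_000)]
    for n in (10, 100, 1_000, 5_000):
        err = mean_abs_sampling_error(population, n, reps=300, rng=rng)
        print(f"n={n:>5}: average |sampling error| = {err:.3f}")
```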
How big should our sample be, you might ask?
Unfortunately, this straightforward question does not come with a straightforward answer.
A full treatise of this matter is beyond the scope of this course. But here are a few things to
consider.
When choosing a sample, we are concerned with two things:
1. Will the sample be representative of the population?
2. Will the sample be precise enough?
An unrepresentative sample will result in biased conclusions. Your sampling method is the
first thing to get right. The second issue is precision. The larger the sample, the smaller the
margin of uncertainty (confidence interval) around the results. Another factor that affects
precision is the variability of the thing being measured. The more something varies from
person to person, the bigger the sample you need to achieve the same degree of certainty
about your results. In other words, the more heterogeneous (dissimilar) the population, the
more sampling units are needed.
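Both effects, precision and variability, appear in one common textbook formula for estimating a mean: n = (z·σ / E)², where E is the desired margin of error and σ is the population standard deviation. This formula is not necessarily the one this course uses. It assumes simple random sampling, a large population, and a known or guessed σ. The sketch uses z = 1.96 (about 95% confidence), and the σ and E values are hypothetical.

```python
import math

def sample_size_for_mean(sigma, margin, z=1.96):
    """Smallest n so that a ~95% confidence interval for a mean has
    half-width <= margin, using n = (z * sigma / margin) ** 2.
    Assumes simple random sampling from a large population and a
    known (or guessed) population standard deviation sigma."""
    return math.ceil((z * sigma / margin) ** 2)

if __name__ == "__main__":
    # Same desired precision (+/- 2 points), different population variability.
    print(sample_size_for_mean(sigma=5, margin=2))    # → 25  (fairly homogeneous)
    print(sample_size_for_mean(sigma=15, margin=2))   # → 217 (more heterogeneous)
    # Halving the margin of error roughly quadruples the required sample size.
    print(sample_size_for_mean(sigma=15, margin=1))   # → 865
```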
Note: Sampling error can include both systematic sampling error (sampling bias) and
random sampling error. Random sampling error arises simply because a sample is an
imperfect representation of the population of interest. No matter how carefully you select
your sample or how big your sample size is, there will always be some degree of error. This
is unavoidable as long as you do not collect data from every member of the population.
The numerical index of random sampling error for the sample mean is called the standard error
of the mean; it does not capture sampling bias.
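From a single sample, the standard error of the mean is usually estimated as s/√n, where s is the sample standard deviation and n is the sample size. The sketch below uses a hypothetical sample of 8 scores. It also shows that, for a fixed spread, quadrupling the sample size halves the standard error. Keep in mind that this index reflects random sampling error only, not bias.

```python
import math
import statistics

def standard_error_of_mean(sample):
    """Estimated standard error of the mean, s / sqrt(n), where s is the
    sample standard deviation (computed with an n - 1 denominator)."""
    return statistics.stdev(sample) / math.sqrt(len(sample))

if __name__ == "__main__":
    # Hypothetical sample of 8 scores (mean 12, s = 2).
    sample = [12, 15, 9, 14, 10, 13, 11, 12]
    print(f"mean = {statistics.fmean(sample):.2f}")
    print(f"SE   = {standard_error_of_mean(sample):.3f}")
    # With the same spread s, quadrupling n halves the standard error.
    s = 10.0
    for n in (25, 100, 400):
        print(n, s / math.sqrt(n))   # 2.0, 1.0, 0.5
```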
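Hint: A random sample does not always imply a representative sample. By chance alone, a random
sample (especially a small one) can differ noticeably from the population.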
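A quick simulation makes the hint concrete. The sketch below assumes a hypothetical population of 1,000 people split evenly into two groups, "A" and "B". It estimates how often a simple random sample ends up at least 80% one group, which is a badly unrepresentative mix. The group labels, the 80% threshold, and the sizes are illustrative choices.

```python
import random

def share_unrepresentative(pop_size, n, reps, threshold=0.8, seed=0):
    """Fraction of simple random samples in which either of two equally
    sized groups makes up at least `threshold` of the sample."""
    rng = random.Random(seed)
    # Hypothetical population: half group "A", half group "B".
    population = ["A"] * (pop_size // 2) + ["B"] * (pop_size - pop_size // 2)
    bad = 0
    for _ in range(reps):
        count_a = rng.sample(population, n).count("A")
        # Compare counts (not shares) to avoid floating-point edge cases.
        if max(count_a, n - count_a) >= threshold * n:
            bad += 1
    return bad / reps

if __name__ == "__main__":
    for n in (10, 100):
        p = share_unrepresentative(pop_size=1_000, n=n, reps=5_000)
        print(f"n={n:>3}: {p:.1%} of random samples are at least 80% one group")
```

With samples of 10, roughly one random sample in ten is this lopsided. With samples of 100, such samples essentially never occur.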
The Sampling Process
The figure below illustrates the different steps involved in the sampling process.