Chapter 7
7.1 Samples, Populations, and the Distribution of Sample Means
Whenever a score is selected from a population, you should be able to compute a z-score that
describes exactly where the score is located in the distribution. If the population is normal, you
also should be able to determine the probability value for obtaining any individual score.
In a normal distribution, for example, any score located in the tail of the distribution beyond
z=+2.00 is an extreme value, and a score this large has a probability of only p=0.0228.
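This tail probability is easy to verify with standard statistical software. The sketch below is a minimal check in Python, assuming SciPy is available.

```python
# Tail probability for a score beyond z = +2.00 in a normal distribution.
# A minimal sketch, assuming SciPy is installed.
from scipy.stats import norm

p_tail = norm.sf(2.00)   # survival function: P(z > 2.00)
print(round(p_tail, 4))  # 0.0228
```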
The z-scores and probabilities that we have considered so far are limited to situations in which the
sample consists of a single score. Most research studies involve much larger samples such as
n=22 Amazing Race contestants or n=100 preschool children. In these situations, the sample
mean, rather than a single score, is used to answer questions about the population.
To do so, we transform the sample mean into a z-score.
As always, a z-score value near zero indicates a central, representative sample; a z-value beyond
+2.00 or –2.00 indicates an extreme sample. Thus, it is possible to describe how any specific
sample is related to all the other possible samples. In most situations, we also can use the z-score
value to find the probability of obtaining a specific sample, no matter how many scores the
sample contains.
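The transformation has the same structure as the z-score for a single score, except that the distance between M and μ is measured in units of the standard error of M (σ_M = σ/√n, defined formally in Section 7.2 below). A minimal sketch, with sample values that are purely hypothetical:

```python
# z-score for a sample mean: z = (M - mu) / sigma_M, where sigma_M = sigma / sqrt(n).
# The population parameters and sample values below are hypothetical, chosen only
# so that the result lands exactly on the z = 2.00 boundary discussed above.
import math

mu, sigma = 80, 12        # assumed population mean and standard deviation
M, n = 84, 36             # assumed sample mean and sample size

sigma_M = sigma / math.sqrt(n)   # standard error of M: 12 / 6 = 2
z = (M - mu) / sigma_M           # (84 - 80) / 2 = 2.00, an extreme sample
print(z)                         # 2.0
```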
In general, the difficulty of working with samples is that a sample provides an incomplete picture
of the population; a statistic computed from a sample is usually not identical to the corresponding
population parameter. This difference, or error, between sample statistics and the corresponding
population parameters is called sampling error.
Definition: Sampling error is the natural discrepancy, or amount of error, between a sample
statistic and its corresponding population parameter.
If you take two separate samples from the same population, the samples will be different. They
will contain different individuals, they will have different scores, and they will have different
sample means.
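A short simulation makes sampling error concrete: two random samples drawn from the same population almost never share the same mean, and neither mean matches μ exactly. The population values below are assumptions chosen only for illustration.

```python
# Sampling error: two samples from the same population produce different means.
# A minimal sketch; the population parameters are arbitrary assumptions.
import numpy as np

rng = np.random.default_rng(0)
mu, sigma = 100, 15                       # assumed population mean and SD

sample_1 = rng.normal(mu, sigma, size=25)
sample_2 = rng.normal(mu, sigma, size=25)

print(sample_1.mean(), sample_2.mean())   # two different values, both near 100
```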
The Distribution of Sample Means
Although any two samples will be different, the huge set of all possible samples forms a relatively
simple and orderly pattern that makes it possible to predict the characteristics of a sample with
some accuracy. This ability to predict sample characteristics is based on the distribution of sample means.
Definition: The distribution of sample means is the collection of sample means for all the
possible random samples of a particular size (n) that can be obtained from a population.
Notice that the distribution of sample means contains all the possible samples. It is necessary to
have all the possible values to compute probabilities.
Also, you should notice that the distribution of sample means is different from distributions
we have considered before. Until now we always have discussed distributions of scores; now the
values in the distribution are not scores, but statistics (sample means). Because statistics are
obtained from samples, a distribution of statistics is often referred to as a sampling distribution.
Definition: A sampling distribution is a distribution of statistics obtained by selecting all the
possible samples of a specific size from a population.
Thus, the distribution of sample means is an example of a sampling distribution. In fact, it often
is called the sampling distribution of M.
If you actually wanted to construct the distribution of sample means, you would first select a
random sample of a specific size (n) from a population, calculate the sample mean, and place the
sample mean in a frequency distribution. Then you select another random sample with the same
number of scores. Again, you calculate the sample mean and add it to your distribution.
You continue selecting samples and calculating means, over and over, until you have the
complete set of all the possible random samples. At this point, your frequency distribution will
show the distribution of sample means.
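Although the complete set of samples is rarely constructed in practice, the procedure is easy to approximate by simulation: draw many random samples of the same size, record each mean, and examine the resulting pile of means. The sketch below follows that recipe with assumed population values.

```python
# Approximating the distribution of sample means by repeated random sampling.
# A minimal sketch; the population parameters and sample size are assumptions.
import numpy as np

rng = np.random.default_rng(0)
mu, sigma, n = 100, 15, 25

means = np.array([rng.normal(mu, sigma, size=n).mean()
                  for _ in range(10_000)])

print(means.mean())   # piles up around mu (about 100)
print(means.std())    # spread is close to sigma / sqrt(n) = 3
```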
Characteristics of the Distribution of Sample Means
1. The sample means should pile up around the population mean. Samples are not expected
to be perfect, but they should be representative of the population. As a result, most of the sample
means should be relatively close to the population mean.
2. The pile of sample means should tend to form a normal-shaped distribution. Logically,
most of the samples should have means close to 𝜇, and it should be relatively rare to find
sample means that are substantially different from 𝜇. As a result, the sample means
should pile up in the center of the distribution (around 𝜇) and the frequencies should taper
off as the distance between M and 𝜇 increases. This describes a normal-shaped
distribution.
3. In general, the larger the sample size, the closer the sample means should be to the
population mean, 𝜇. Logically, a large sample should be a better representative than a
small sample. Thus, the sample means obtained with a large sample size should cluster
relatively close to the population mean; the means obtained from small samples should be
more widely scattered.
As you will see, each of these three commonsense characteristics is an accurate description of the
distribution of sample means.
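The third characteristic, in particular, can be checked directly: sample means based on a larger n cluster more tightly around μ. A brief sketch, with all numeric values chosen only for illustration:

```python
# Larger samples produce sample means that cluster closer to the population mean.
# A minimal sketch; the population parameters and sample sizes are assumptions.
import numpy as np

rng = np.random.default_rng(0)
mu, sigma = 100, 15

for n in (4, 100):
    means = [rng.normal(mu, sigma, size=n).mean() for _ in range(5_000)]
    print(n, np.std(means))   # spread shrinks as n grows (about 7.5 vs 1.5)
```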
7.2 Shape, Central Tendency, and Variability for the Distribution of
Sample Means
In more realistic circumstances, with larger populations and larger samples, the number of
possible samples increases dramatically, and it is virtually impossible to actually obtain every
possible random sample.
Fortunately, it is possible to determine exactly what the distribution of sample means looks like
without taking hundreds or thousands of samples. Specifically, a mathematical proposition
known as the central limit theorem provides a precise description of the distribution that would
be obtained if you selected every possible sample, calculated every sample mean, and
constructed the distribution of sample means. This important and useful theorem serves as a
cornerstone for much of inferential statistics. Following is the essence of the theorem.
Definition: Central limit theorem: For any population with mean μ and standard deviation 𝜎,
the distribution of sample means for sample size n will have a mean of μ and a standard
deviation of 𝜎 ∕ √𝑛 and will approach a normal distribution as n approaches infinity.
The value of this theorem comes from two simple facts. First, it describes the distribution of
sample means for any population, no matter what its shape, mean, or standard deviation.
Second, the distribution of sample means “approaches” a normal distribution very rapidly. By the
time the sample size reaches n=30, the distribution is almost perfectly normal.
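Both facts can be illustrated with a deliberately non-normal population. In the sketch below, the scores come from a strongly skewed exponential distribution (an assumption chosen only because it is clearly not normal), yet the means of samples of n=30 already form a nearly symmetrical pile.

```python
# Central limit theorem: means of n = 30 scores drawn from a skewed population
# already form a nearly normal distribution. A minimal illustrative sketch.
import numpy as np
from scipy.stats import skew

rng = np.random.default_rng(0)

population = rng.exponential(scale=10, size=100_000)   # strongly skewed scores
means = np.array([rng.choice(population, size=30).mean()
                  for _ in range(5_000)])

print(skew(population))   # roughly 2: far from normal
print(skew(means))        # much closer to 0: nearly normal
```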
Note that the central limit theorem describes the distribution of sample means by identifying the
three basic characteristics that describe any distribution: shape, central tendency, and variability.
The Shape of the Distribution of Sample Means
The distribution of sample means tends to be a normal distribution. In fact, the distribution is
almost perfectly normal if either of the following two conditions is satisfied:
1. The population from which the samples are selected is a normal distribution.
2. The number of scores (n) in each sample is relatively large, around 30 or more.
As n gets larger, the distribution of sample means will closely approximate a normal distribution.
When n>30, the distribution is almost normal regardless of the shape of the original population.
The Mean of the Distribution of Sample Means: The Expected Value of M
The average value of all the sample means is exactly equal to the value of the population mean.
This fact should be intuitively reasonable; the sample means are expected to be close to the
population mean, and they do tend to pile up around μ.
The formal statement of this phenomenon is that the mean of the distribution of sample means
always is identical to the population mean. This mean value is called the expected value of M.
In commonsense terms, a sample mean is “expected” to be near its population mean. When all of
the possible sample means are obtained, the average value is identical to μ.
The fact that the average value of M is equal to μ was first introduced in Chapter 4 (page 106) in
the context of biased versus unbiased statistics. The sample mean is an example of an unbiased
statistic, which means that on average the sample statistic produces a value that is exactly equal
to the corresponding population parameter. In this case, the average value of all the sample
means is exactly equal to μ.
Definition: The mean of the distribution of sample means is equal to the mean of the population
of scores, μ, and is called the expected value of M.
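For a very small population, the claim can be verified exhaustively. The sketch below uses a hypothetical four-score population and lists every possible sample of n=2 (sampling with replacement); the average of all the sample means equals μ exactly.

```python
# The mean of the distribution of sample means equals the population mean.
# Exhaustive check for a tiny hypothetical population, samples of n = 2
# selected with replacement.
from itertools import product
from statistics import mean

population = [2, 4, 6, 8]                              # hypothetical scores; mu = 5
sample_means = [mean(s) for s in product(population, repeat=2)]

print(mean(population))      # 5
print(mean(sample_means))    # exactly 5 as well: the expected value of M
```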
The Standard Error of M
So far, we have considered the shape and the central tendency of the distribution of sample
means. To completely describe this distribution, we need one more characteristic: variability.
The value we will be working with is the standard deviation for the distribution of sample means,
which is identified by the symbol 𝜎𝑀 and is called the standard error of M.
Recall that a standard deviation serves two purposes:
First, the standard deviation describes the distribution by telling whether the individual scores are
clustered close together or scattered over a wide range. Second, the standard deviation measures
how well any individual score represents the population by providing a measure of how much
distance is reasonable to expect between a score and the population mean. The standard error
serves the same two purposes for the distribution of sample means.
1. The standard error describes the distribution of sample means. It provides a measure of
how much difference is expected from one sample to another. When the standard error is
small, all the sample means are close together and have similar values. If the standard
error is large, the sample means are scattered over a wide range and there are big
differences from one sample to another.
2. The standard error measures how well an individual sample mean represents the entire
distribution. Specifically, it provides a measure of how much distance is reasonable to
expect between a sample mean and the overall mean for the distribution of sample means.
However, because the overall mean is equal to μ, the standard error also provides a
measure of how much distance to expect between a sample mean (M) and the population
mean (μ).
The standard error measures exactly how much difference is expected on average between a
sample mean, M, and the population mean, μ.
Definition: The standard deviation of the distribution of sample means, 𝜎𝑀, is called the
standard error of M. The standard error provides a measure of how much distance is expected
on average between a sample mean (M) and the population mean (μ).
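As a quick numerical illustration (using an assumed population standard deviation of σ = 10), the standard error σ_M = σ/√n shrinks steadily as the sample size grows:

```python
# Standard error of M: sigma_M = sigma / sqrt(n).
# The population standard deviation below is an assumption for illustration.
import math

sigma = 10
for n in (1, 4, 25, 100):
    print(n, sigma / math.sqrt(n))   # 10.0, 5.0, 2.0, 1.0
```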