Statistics for Business Economics II
I. Ch. 9: Sampling Distributions and Confidence Intervals for Proportions
Learning Objectives:
Model the variation in statistics from sample to sample with a sampling distribution:
• The sampling distribution of the sample proportion is approximately Normal as long as
the sample size is large enough.
Understand that, usually, the mean of a sampling distribution is the value of the parameter
estimated:
• For the sampling distribution of 𝑝̂ , the mean is p.
Interpret the standard deviation of a sampling distribution:
• The standard deviation of a sampling model is the most important information about
it.
• The standard deviation of the sampling distribution of a proportion is $\sqrt{\dfrac{pq}{n}}$, where q =
1 - p.
Construct a confidence interval for a proportion, p, as the statistic, 𝑝̂ , plus and minus a
margin of error:
• The margin of error consists of a critical value based on the sampling model times a
standard error based on the sample.
• The critical value is found from the Normal model.
• The standard error of a sample proportion is calculated as $\sqrt{\dfrac{\hat{p}\hat{q}}{n}}$.
Interpret a confidence interval correctly:
• You can claim to have the specified level of confidence that the interval you have
computed actually covers the true value.
• Best way to interpret one: “We are 95% confident that between 40.4% and 43.6% of
U.S. adults thought the economy was improving.”
Understand the importance of the sample size, n, in improving both the certainty (confidence
level) and precision (margin of error):
• For the same sample size and proportion, more certainty requires less precision and
more precision requires less certainty.
Know and check the assumptions and conditions for finding and interpreting confidence
intervals:
• Independence Assumption/Randomization Condition
• 10% Condition
• Success/Failure Condition
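The first three objectives can be checked by simulation. The following is a minimal sketch, assuming an illustrative population proportion p = 0.54 and sample size n = 400; the function name `simulate_sample_proportions`, the number of samples, and the seed are hypothetical choices, not part of the course material. It draws many samples and compares the mean and standard deviation of the simulated sample proportions with p and with the square root of pq/n.

```python
import math
import random
import statistics


def simulate_sample_proportions(p, n, num_samples, seed=0):
    """Draw num_samples samples of size n; return each sample's proportion of successes."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    proportions = []
    for _ in range(num_samples):
        successes = sum(1 for _ in range(n) if rng.random() < p)
        proportions.append(successes / n)
    return proportions


p, n = 0.54, 400  # illustrative values (assumed)
p_hats = simulate_sample_proportions(p, n, num_samples=2000)

simulated_mean = statistics.mean(p_hats)
simulated_sd = statistics.stdev(p_hats)
theoretical_sd = math.sqrt(p * (1 - p) / n)

print(f"mean of p-hat: {simulated_mean:.4f} (p = {p})")
print(f"SD of p-hat:   {simulated_sd:.4f} (sqrt(pq/n) = {theoretical_sd:.4f})")
```

The simulated mean should land close to p and the simulated standard deviation close to the formula value, which is what the sampling distribution model predicts.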
Independence Assumption: The sampled values must be independent of each other.
Sample Size Assumption: The sample size, n, must be large enough.
Randomization Condition: If your data come from an experiment, subjects should have been
randomly assigned to treatments. If you have a survey, your sample should be a simple
random sample of the population. If some other sampling design was used, be sure the
sampling method was not biased and that the data are representative of the population.
10% Condition: If sampling has not been made with replacement (that is, returning each
sampled individual to the population before drawing the next individual), then the sample
size, n, should be no larger than 10% of the population. If it is, you must adjust the size of the
confidence interval with methods more advanced than those found in this book.
⇒ You always check the 10% condition from 𝑝̂.
Success/Failure Condition: The Success/Failure condition says that the sample size must be
big enough so that both the number of “successes,” np, and the number of “failures,” nq, are
expected to be at least 10.
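The two numeric conditions above can be checked mechanically. This is a minimal sketch: the helper `check_conditions` and the population size of 5000 are hypothetical, introduced only for illustration. Independence and randomization depend on how the data were collected and cannot be verified from the numbers alone.

```python
def check_conditions(n, p, population_size=None):
    """Return the expected successes/failures and whether each numeric condition holds."""
    expected_successes = n * p
    expected_failures = n * (1 - p)
    result = {
        "expected_successes": expected_successes,
        "expected_failures": expected_failures,
        # Success/Failure Condition: both expected counts at least 10
        "success_failure_ok": expected_successes >= 10 and expected_failures >= 10,
    }
    if population_size is not None:
        # 10% Condition: the sample is no larger than 10% of the population
        result["ten_percent_ok"] = n <= 0.10 * population_size
    return result


# Illustrative numbers (assumed): n = 200, p = 0.07, population of 5000
report = check_conditions(n=200, p=0.07, population_size=5000)
print(report)
```

When p is unknown, as in a confidence interval, the same function can be called with the sample proportion in place of p.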
Be able to invert the calculation of the margin of error to find the sample size required, given
a proportion, a confidence level, and a desired margin of error:
Example: We want to estimate the proportion of customers who are likely to purchase this
new service to within 3% with 95% confidence. How large a sample do we need?
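$ME = z^* \sqrt{\dfrac{\hat{p}\hat{q}}{n}}$
⇔ $0.03 = 1.96 \sqrt{\dfrac{\hat{p}\hat{q}}{n}}$
⇔ $0.03 = 1.96 \sqrt{\dfrac{(0.5)(0.5)}{n}}$ (taking $\hat{p} = \hat{q} = 0.5$, the most cautious choice when no estimate is available)
⇔ $0.03 \sqrt{n} = 1.96 \sqrt{(0.5)(0.5)}$
⇔ $\sqrt{n} = \dfrac{1.96 \sqrt{(0.5)(0.5)}}{0.03}$
⇔ $n \approx (32.67)^2 \approx 1067.1$, so a sample of at least 1068 customers is needed.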
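The inversion of the margin-of-error formula can be sketched in code. The function name `required_sample_size` and its parameters are hypothetical. The critical value comes from Python's standard-library `statistics.NormalDist` rather than from a table, and the result is rounded up because a sample size must be a whole number.

```python
import math
from statistics import NormalDist


def required_sample_size(margin_of_error, confidence=0.95, p_guess=0.5):
    """Smallest n whose margin of error is at most margin_of_error."""
    # two-sided critical value z* from the Standard Normal model
    z_star = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    n_exact = (z_star ** 2) * p_guess * (1 - p_guess) / margin_of_error ** 2
    return math.ceil(n_exact)  # round up: a fractional respondent is not possible


n_needed = required_sample_size(0.03, confidence=0.95, p_guess=0.5)
print(n_needed)
```

For the example above (ME = 0.03, 95% confidence, a guess of 0.5) this gives 1068. Using a guess of 0.5 gives the largest possible required sample, so it is the safe default when nothing is known about the proportion.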
Terms:
Confidence interval:
An interval of values usually of the form
estimate ± margin of error
found from data in such a way that a particular percentage of all random samples can be
expected to yield intervals that capture the true parameter value.
Critical value:
The number of standard errors to move away from the estimate (mean of the sampling
distribution) to correspond to the specified level of confidence. The critical value, denoted z*,
is usually found from a table or with technology.
Margin of error (ME):
In a confidence interval, the extent of the interval on either side of the estimate (the
observed statistic value). A margin of error is typically the product of a critical value from the
sampling distribution and a standard error from the data. A small margin of error
corresponds to a confidence interval that pins down the parameter precisely. A large margin
of error corresponds to a confidence interval that gives relatively little information about the
estimated parameter.
One-proportion z-interval:
A confidence interval for the true value of a proportion. The confidence interval is
$\hat{p} \pm z^* \times SE(\hat{p})$
where z* is a critical value from the Standard Normal model corresponding to the specified
confidence level and $SE(\hat{p}) = \sqrt{\dfrac{\hat{p}\hat{q}}{n}}$.
Sampling distribution:
The distribution of a statistic over many independent samples of the same size from the
same population.
Sampling distribution model for a proportion:
If the independence assumption and randomization condition are met and we expect at least
10 successes and 10 failures, then the sampling distribution of a proportion is well modelled
by a Normal model with a mean equal to the true proportion value, p, and a standard
deviation equal to $\sqrt{\dfrac{pq}{n}}$.
Sampling error/Sampling variability:
The variability we expect to see from sample to sample is often called the sampling error,
although sampling variability is a better term.
Standard error (SE):
When the standard deviation of the sampling distribution of a statistic is estimated from the
data, the resulting statistic is called a standard error (SE).
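The terms above (critical value, standard error, margin of error, one-proportion z-interval) fit together as in the following minimal sketch. The function name `one_proportion_z_interval` and the survey counts (420 "yes" answers out of 1000) are hypothetical and serve only as an illustration.

```python
import math
from statistics import NormalDist


def one_proportion_z_interval(successes, n, confidence=0.95):
    """Return (p_hat, margin_of_error, lower, upper) for a one-proportion z-interval."""
    p_hat = successes / n
    q_hat = 1 - p_hat
    standard_error = math.sqrt(p_hat * q_hat / n)  # SE estimated from the data
    z_star = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # critical value
    margin_of_error = z_star * standard_error
    return p_hat, margin_of_error, p_hat - margin_of_error, p_hat + margin_of_error


# Hypothetical survey: 420 of 1000 respondents answer "yes"
p_hat, me, lower, upper = one_proportion_z_interval(420, 1000)
print(f"{p_hat:.3f} ± {me:.3f} -> ({lower:.3f}, {upper:.3f})")
```

Calling the function again with a higher confidence level on the same data returns a larger margin of error, which is the trade-off between certainty and precision.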
Exercise Examples:
1. The proportion of adult women in Latvia is approximately 54%. A marketing survey
telephones 400 people at random.
a) What is the sampling distribution of the observed proportion that are women?
b) What is the standard deviation of that proportion?
c) Would you be surprised to find 56% women in a sample of size 400? Explain.
d) Would you be surprised to find 51% women in a sample of size 400? Explain.
e) Would you be surprised to find that there were fewer than 180 women in the sample?
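a) Approximately Normal, with mean p = 0.54 (np = 216 and nq = 184 are both at least 10).
b) $SD(\hat{p}) = \sqrt{\dfrac{pq}{n}} = \sqrt{\dfrac{(0.54)(0.46)}{400}} \approx 0.025$
c) $z = \dfrac{\hat{p} - p}{SD(\hat{p})} = \dfrac{0.56 - 0.54}{0.025} = 0.8$ → Less than 1 SD from the mean, so not surprising.
⇒ Check: $p + z \times SD(\hat{p}) = 0.54 + 0.8 \times 0.025 = 0.56$
d) $z = \dfrac{0.51 - 0.54}{0.025} = -1.2$ → Only 1.2 SDs below the mean; about 12% of samples would give a proportion this low or lower, so not very surprising.
e) $\hat{p} = 180/400 = 0.45$
$z = \dfrac{0.45 - 0.54}{0.025} = -3.6$ → Very surprising, because it is more than 3 SDs below the mean.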
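The z-scores in Exercise 1 can be reproduced with a short script. This is a sketch under the Normal model from part a); the helper `z_score` is a hypothetical name. It uses the unrounded standard deviation, so the values come out near 0.80, -1.20 and -3.61 rather than exactly matching the hand calculation with 0.025. The two-sided tail probability is added as one possible way to judge how unusual each result is.

```python
import math
from statistics import NormalDist

p, n = 0.54, 400
sd = math.sqrt(p * (1 - p) / n)  # SD of the sampling distribution of p-hat


def z_score(p_hat):
    """Number of standard deviations p_hat lies from p."""
    return (p_hat - p) / sd


standard_normal = NormalDist()
for label, p_hat in [("c", 0.56), ("d", 0.51), ("e", 180 / 400)]:
    z = z_score(p_hat)
    # probability of a sample proportion at least this far from p, in either direction
    two_sided = 2 * (1 - standard_normal.cdf(abs(z)))
    print(f"{label}) p-hat = {p_hat:.2f}, z = {z:.2f}, P(at least this far) = {two_sided:.4f}")
```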
2. Based on past experience, a bank believes that 7% of the people who receive loans will not
make payments on time. The bank has recently approved 200 loans.
a) What are the mean and standard deviation of the proportion of clients in this group who
may not make timely payments?
b) What assumptions underlie your model? Are the conditions met? Explain.
c) What’s the probability that over 10% of these clients will not make timely payments?
• n = 200
• p = 0.07, so μ (mean) = 0.07
a) $\sigma(\hat{p}) = \sqrt{\dfrac{pq}{n}} = \sqrt{\dfrac{(0.07)(0.93)}{200}} \approx 0.018$, so the standard deviation is about 1.8%.
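b) The model relies on the Independence Assumption, the 10% Condition, and the Success/Failure Condition. Success/Failure is met, since np = 14 and nq = 186 are both at least 10. The other two are taken as met, assuming the borrowers act independently of each other and 200 is less than 10% of all the bank's borrowers.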
c) $z = \dfrac{\hat{p} - \mu_{\hat{p}}}{\sqrt{\dfrac{pq}{n}}} = \dfrac{0.10 - 0.07}{\sqrt{\dfrac{(0.07)(0.93)}{200}}} \approx 1.663$
⇒ Probability: P(z > 1.663) ≈ 0.048 (via calculator)
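The calculator step in part c) can be reproduced as follows. This is a minimal sketch using the Normal sampling model from part a); the variable names are illustrative, and `statistics.NormalDist` from the Python standard library stands in for the calculator.

```python
import math
from statistics import NormalDist

n, p = 200, 0.07
sd = math.sqrt(p * (1 - p) / n)  # standard deviation of p-hat

# Normal model for the sample proportion: mean p, standard deviation sd
sampling_model = NormalDist(mu=p, sigma=sd)

z = (0.10 - p) / sd
prob_over_10_percent = 1 - sampling_model.cdf(0.10)  # upper-tail probability

print(f"SD = {sd:.4f}, z = {z:.3f}, P(p-hat > 0.10) = {prob_over_10_percent:.3f}")
```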