These notes capture the key concepts, discussions, and important information from the class sessions. They are intended to provide a comprehensive summary of the material covered, including lecture highlights, significant topics, and any additional insights provided by the instructor.
Chapter 21
Estimating
● Statistical Inference
○ Drawing conclusions about a population from sample data
● One kind of conclusion answers questions such as the following
○ “What percent of employed women have a college degree?”
○ “What is the mean survival time for patients with a given type of cancer?”
■ Can’t feasibly study all individuals in the population, so take a sample
● Parameters
○ Numbers that describe a population
○ To estimate a population parameter, use a statistic, a number calculated from the sample, as your
estimate
● Does it make sense to use this statistic to represent the population?
○ Yes - it’s our best guess
● But, still only an estimate
● If you’ve done everything right, then
○ The sample statistic ≈ the true population parameter
● Since it’s an estimate, how can we reflect the degree of our uncertainty about this?
● C% Confidence Interval
○ A range of values calculated from sample data by a process that’s designed to capture the true
population parameter in C% of all samples
■ Assuming you repeat the sampling to infinity
○ C is often 95 (as in 95% confidence interval)
Estimating with Confidence
● Goal
○ Estimate the proportion p of the individuals in a population who have some characteristic (say,
“success”)
● p̂
○ Proportion of that characteristic (“success”) in a simple random sample (SRS)
● How accurate is the statistic p̂ as an estimate of the parameter p?
● Frequentists will think to themselves
○ “What if we sampled and generated p̂ an infinite number of times?”
● Variability of p̂ from sample to sample
○ Clear pattern in the long run
○ Described by a normal curve
● Sampling distribution
○ The distribution of values taken by a statistic in
■ All possible samples
■ Of the same size (n)
■ From the same population
● p̂ (“p-hat”)
○ The sample proportion of successes
● If the sample size is large enough
○ The sampling distribution of p̂ is approximately normal
○ The mean of the sampling distribution is p
○ The standard deviation of the sampling distribution: √( p(1 − p) / n )
● If we’re sampling many, many times from the same population and samples are the same size
○ The sampling distribution of the statistic has an approximate normal shape
○ The mean of sampling distribution of the statistic equals the true population parameter
○ The standard deviation of the sampling distribution is predictable
○ The variance of the statistic follows a regular pattern
■ Conceptually, this combines the knowledge of probability and frequentism with sampling
and inference
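The long-run pattern described above can be checked with a small simulation. This is only a sketch under assumed values: the population proportion p = 0.3, the sample size n = 400, and the helper names `sample_proportion` and `simulate_p_hats` are hypothetical choices for illustration, not part of the lecture.

```python
import random
import statistics

def sample_proportion(p, n, rng):
    """Draw one sample of n independent individuals and return p-hat."""
    successes = sum(rng.random() < p for _ in range(n))
    return successes / n

def simulate_p_hats(p, n, num_samples, seed=0):
    """Repeat the sampling many times; the results approximate the sampling distribution."""
    rng = random.Random(seed)
    return [sample_proportion(p, n, rng) for _ in range(num_samples)]

# Assumed (hypothetical) population proportion and sample size.
p, n = 0.3, 400
p_hats = simulate_p_hats(p, n, num_samples=5000)

mean_p_hat = statistics.mean(p_hats)    # should land close to p
sd_p_hat = statistics.pstdev(p_hats)    # should land close to sqrt(p(1 - p)/n)
theory_sd = (p * (1 - p) / n) ** 0.5

print(f"mean of p-hat: {mean_p_hat:.4f} (p = {p})")
print(f"sd of p-hat:   {sd_p_hat:.4f} (theory = {theory_sd:.4f})")
```

With these assumed values, the simulated mean should sit near 0.3 and the simulated standard deviation near 0.023, which is what the three properties of the sampling distribution predict.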
● Standard Error
○ The technical name we give for the standard deviation of the
sampling distribution of a sample statistic
○ When the statistic is a proportion, the formula is: SE = √( p(1 − p) / n )
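As a worked illustration, the standard error of a proportion, √( p(1 − p) / n ), can be computed directly. The function name `standard_error` and the example values (p = 0.5, n = 100 and n = 400) are assumptions chosen for the sketch.

```python
import math

def standard_error(p, n):
    """Standard deviation of the sampling distribution of p-hat: sqrt(p(1 - p)/n)."""
    if not 0 <= p <= 1:
        raise ValueError("p must be between 0 and 1")
    if n <= 0:
        raise ValueError("n must be positive")
    return math.sqrt(p * (1 - p) / n)

# Hypothetical example values: quadrupling n halves the standard error.
se_small = standard_error(0.5, 100)
se_large = standard_error(0.5, 400)
print(round(se_small, 4), round(se_large, 4))
```

Because n sits under the square root, a sample four times as large is needed to cut the standard error in half.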
● Using the 68-95-99.7 rule, for any value of p
○ When the population proportion has the value p, 95% of all
samples catch p in the interval extending roughly 2 standard
errors on either side of p̂
○ Per the 68-95-99.7 rule, 95% of the distribution’s values fall
within 2 standard deviations on either side of the mean
○ The standard error is the standard deviation of the sampling distribution
■ Therefore ~95% of samples (the values in the sampling distribution) give a p̂ that lies within
2 standard errors (the standard deviation of the sampling distribution) of the parameter p
(the mean of the sampling distribution)
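The 68-95-99.7 reasoning above can be sanity-checked by counting how often a simulated p̂ lands within 2 standard errors of p. This is a rough sketch: p = 0.3, n = 400, the number of repetitions, and the function name `fraction_within_two_se` are all assumed for illustration.

```python
import math
import random

def fraction_within_two_se(p, n, num_samples, seed=1):
    """Fraction of simulated samples whose p-hat is within 2 standard errors of p."""
    rng = random.Random(seed)
    se = math.sqrt(p * (1 - p) / n)   # uses the true p, which is known only in a simulation
    hits = 0
    for _ in range(num_samples):
        p_hat = sum(rng.random() < p for _ in range(n)) / n
        if abs(p_hat - p) <= 2 * se:
            hits += 1
    return hits / num_samples

# Hypothetical values; the result should be close to 0.95.
frac = fraction_within_two_se(p=0.3, n=400, num_samples=3000)
print(f"fraction within 2 SE: {frac:.3f}")
```

The fraction will not be exactly 0.95, since the normal curve is only an approximation to the sampling distribution and "2" is a rounded multiplier.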
● What’s wrong with what we just did?
○ Most of the time, we don’t actually know what p is
○ No way to verify exactly without actually sampling everyone
○ Solution: substitute p̂ from a sample to compute interval
● The confidence interval
○ Take an SRS of size n from a large population that contains an unknown proportion p of successes
○ Call the proportion of successes in this sample p̂
● An approximate 95% confidence interval for the parameter p is p̂ ± 2 √( p̂(1 − p̂) / n )
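A minimal sketch of the interval p̂ ± 2 √( p̂(1 − p̂) / n ) follows. The function name `approx_ci_95` and the example counts (520 successes out of 1000) are hypothetical and chosen only to show the arithmetic.

```python
import math

def approx_ci_95(successes, n):
    """Approximate 95% CI for a proportion: p-hat +/- 2 * sqrt(p-hat(1 - p-hat)/n)."""
    if n <= 0 or not 0 <= successes <= n:
        raise ValueError("need n > 0 and 0 <= successes <= n")
    p_hat = successes / n
    se_hat = math.sqrt(p_hat * (1 - p_hat) / n)   # p-hat stands in for the unknown p
    margin = 2 * se_hat
    return p_hat - margin, p_hat + margin

# Hypothetical sample: 520 successes in an SRS of 1000.
low, high = approx_ci_95(520, 1000)
print(f"approximate 95% CI: ({low:.3f}, {high:.3f})")
```

For this made-up sample, p̂ = 0.52 and the margin is about 0.032, so the interval runs from roughly 0.488 to 0.552.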
Understanding Confidence Intervals
● Confidence intervals are frequentist in nature
○ “What would happen if we repeated the sampling many, many times?”
● The 95% in a 95% confidence interval is a probability
○ The probability that the method produces an interval that captures the true parameter: it does
so in 95% of repeated samples
● The 95% refers to your confidence in the interval itself, not to any point estimate inside of it
○ By this we mean, the probability that any actual number lies within a CI you’ve found is 1 or 0
(i.e., it does or it doesn’t)
■ Ex: CI [92, 95], probability that 94 falls in this CI is 1 (100%); probability that 4 falls in it is 0
● CIs aren’t exact for two reasons
○ 1) the sampling distribution of the sample proportion p̂
■ Only approximately normal
○ 2) the standard error of p̂
■ Not exact because we used p̂ in place of the unknown p
○ This means we use
■ A new estimate of the standard deviation of the sampling distribution every time we take
a new sample, even though the true standard deviation never changes
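To see what "95% of repeated samples" looks like when each sample supplies its own estimated standard error, here is a hedged simulation sketch. The true proportion p = 0.3, the sample size n = 400, the repetition count, and the name `coverage_rate` are assumptions; in practice p is unknown, which is the whole point of the interval.

```python
import math
import random

def coverage_rate(p, n, num_samples, seed=2):
    """Share of simulated samples whose interval p-hat +/- 2*SE-hat captures the true p."""
    rng = random.Random(seed)
    captured = 0
    for _ in range(num_samples):
        p_hat = sum(rng.random() < p for _ in range(n)) / n
        # A new estimate of the standard error is computed from every sample.
        se_hat = math.sqrt(p_hat * (1 - p_hat) / n)
        if p_hat - 2 * se_hat <= p <= p_hat + 2 * se_hat:
            captured += 1
    return captured / num_samples

# Hypothetical values; the rate should be near 0.95 but not exactly 0.95.
rate = coverage_rate(p=0.3, n=400, num_samples=3000)
print(f"intervals capturing p: {rate:.3f}")
```

Each individual interval either contains p or it does not; the 95% describes how often the procedure succeeds across all the repetitions.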