Summary Statistics 2
Chapters 16 – 19 (MIDTERM)
PART 4: INFERENTIAL STATISTICS
Studies in economics are often about large populations and unknown population statistics (parameters) concerning one or more
variables. Observing the whole population is not an option. Inferential statistics draws conclusions about the whole population
by studying samples drawn from that population.
Chapter 16: Confidence intervals and tests for µ and p
-The purpose of this chapter is to develop standard interval estimators and standard test procedures for the population mean µ (if
variable x is quantitative) and the population proportion p (if variable x is qualitative). They are combined because, for large sample
sizes, the statistical procedures for both parameters lean on the Central Limit Theorem, and the formats of their interval
estimators and test statistics are similar.
-When the inference is about a population mean, it is assumed that the accompanying population variance σ² is
unknown (in contrast to Chapter 15)!
-The procedures are based on the respective estimators $\bar{X}$ and $\hat{P}$.
§16.1 - Standardized sample mean and t-distribution (= µ)
-Standardized sample mean (in the case σ² is known): $Z = \dfrac{\bar{X} - \mu}{\sigma/\sqrt{n}} \sim N(0,1)$, where $\sigma/\sqrt{n}$ is the standard deviation of $\bar{X}$.
-The reasons why this standardized sample mean is so important:
1. If the sample is large, Z is approximately standard normally distributed (easy calculations)
2. If the sample is drawn from a normal distribution, Z is exactly standard normally distributed (easy calculations)
3. It is the starting point for the creation of interval estimators and test statistics for µ
-However, in this chapter we assume that σ² is unknown, which changes the standardized sample mean Z into a new random
variable, the adapted standardized version: $T = \dfrac{\bar{X} - \mu}{S/\sqrt{n}} \sim t_{n-1}$, where $S/\sqrt{n}$ is the standard error of $\bar{X}$.
Replacing σ by its estimator S adds variation, so T follows a new probability distribution, namely the t-distribution.
-The basic result for inferential statistics on µ: if the random sample is drawn from N(µ, σ²), you get Z (σ² known) or T (σ² unknown). Note that
both are derived from the normal distribution.
-Property of the family of t-distributions: $T_v \approx N(0,1)$ as v gets large, where v = n - 1 is the number of degrees of
freedom (dividing by n - 1 rather than n makes S² an unbiased estimator of σ²). Note that S² gets closer to σ² when n gets large. So for normal random
samples, the probability distributions of Z and T are approximately equal for large n.
-The graph (pdf) of the t-distribution looks like the standard normal graph (pdf): it is also symmetric around 0, but its tails are fatter and its peak is lower.
-With these graphs and pdfs, α enters the scene: the number $t_{\alpha;n-1}$ cuts off an area α in the right-hand
tail (and an area 1 - α in the left-hand tail): $P(T \ge t_{\alpha;n-1}) = \alpha$ (and, by symmetry, $P(T \le -t_{\alpha;n-1}) = \alpha$).
EXCEL
-Z is always calculated with respect to the left-hand tail:
Area is given (Δ): z0,03 ? = NORM.S.INV(0,97)
Number is given (Δ): P(Z > 2.1) ? =
1) 1 - NORM.S.VERD(2,1;1)
2) NORM.S.VERD(-2,1;1), by symmetry around 0
(NORM.S.VERD is the Dutch-locale name of NORM.S.DIST.)
-T only needs the left-hand-tail adjustment when the number is given:
Area is given: t0,02;9 ? = TINV(0,04;9) (TINV takes the two-tailed area, hence 2 × 0,02)
Number is given (Δ): P(T > 2.1) ? =
1) 1 - T.DIST(2,1;9;1)
2) T.DIST(-2,1;9;1), by symmetry around 0
-Note: do not confuse the probability statement $P(T > t_{\alpha;n-1}) = \alpha$ with the notation z0,03 or t0,02;9: the subscript is an area, while $t_{\alpha;n-1}$ itself is just a number (a quantile).
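The Excel look-ups above can also be reproduced outside Excel. The following is a minimal sketch using only the Python standard library; the helper names `t_pdf`, `t_cdf` and `t_quantile` are illustrative (not from the course material), and the t-distribution is handled by plain numerical integration (Simpson's rule) and bisection rather than by a statistics package.

```python
import math
from statistics import NormalDist

Z = NormalDist()  # standard normal distribution N(0,1)

def t_pdf(x, df):
    """Density of the t-distribution with df degrees of freedom."""
    c = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return c * (1 + x * x / df) ** (-(df + 1) / 2)

def t_cdf(x, df, steps=2000):
    """Left-hand tail P(T <= x): Simpson's rule on [0, |x|] plus symmetry around 0."""
    a = abs(x)
    h = a / steps  # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3  # P(0 <= T <= |x|)
    return 0.5 + area if x >= 0 else 0.5 - area

def t_quantile(alpha, df):
    """The number t_{alpha;df} that cuts off area alpha in the right-hand tail (alpha < 0.5)."""
    lo, hi = 0.0, 100.0
    for _ in range(100):  # bisection on the right-tail area
        mid = (lo + hi) / 2
        if 1 - t_cdf(mid, df) > alpha:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

# Area is given: z_{0.03} is the 0.97 quantile (left-hand tail), like NORM.S.INV(0,97)
z_003 = Z.inv_cdf(0.97)
# Number is given: P(Z > 2.1), computed in the two equivalent ways
p_z = 1 - Z.cdf(2.1)
p_z_sym = Z.cdf(-2.1)
# Area is given: t_{0.02;9}, the counterpart of TINV(0,04;9)
t_002_9 = t_quantile(0.02, 9)
# Number is given: P(T > 2.1) with 9 degrees of freedom
p_t = 1 - t_cdf(2.1, 9)

print(f"z_0.03 = {z_003:.4f}, P(Z > 2.1) = {p_z:.4f}")
print(f"t_0.02;9 = {t_002_9:.3f}, P(T_9 > 2.1) = {p_t:.4f}")
```

The two routes to P(Z > 2.1) agree because of the symmetry around 0, exactly as in options 1) and 2) of the Excel recipe.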
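-So $P(-t_{\alpha/2;n-1} < T < t_{\alpha/2;n-1}) = 1 - \alpha$ follows as a result, which leads to the confidence intervals of the next section: substituting the test statistic T and
rearranging gives $P\left(\bar{X} - t_{\alpha/2;n-1}\,\dfrac{S}{\sqrt{n}} < \mu < \bar{X} + t_{\alpha/2;n-1}\,\dfrac{S}{\sqrt{n}}\right) = 1 - \alpha$.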
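The rearrangement from the probability statement about T to bounds on µ can be written out step by step. This is only a sketch of the algebra, using the definition $T = (\bar{X} - \mu)/(S/\sqrt{n})$ from this section and the shorthand $t = t_{\alpha/2;n-1}$:

```latex
\begin{align*}
-t < \frac{\bar{X} - \mu}{S/\sqrt{n}} < t
  &\iff -t\,\frac{S}{\sqrt{n}} < \bar{X} - \mu < t\,\frac{S}{\sqrt{n}} \\
  &\iff -\bar{X} - t\,\frac{S}{\sqrt{n}} < -\mu < -\bar{X} + t\,\frac{S}{\sqrt{n}} \\
  &\iff \bar{X} - t\,\frac{S}{\sqrt{n}} < \mu < \bar{X} + t\,\frac{S}{\sqrt{n}}
\end{align*}
```

The last step multiplies by -1, which flips both inequalities. Since the events are equivalent, they have the same probability 1 - α.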
§16.2 Confidence intervals and tests for µ
-Doing inferential statistics of µ, at least one of the following situations is assumed to be valid:
1. The normal distribution N(µ, σ²) is a good model for the variable X
2. The sample size n is large, so that by the Central Limit Theorem the standardized sample mean is approximately standard normally distributed, N(0,1).
The resulting statistical procedures turn out to be the same in both situations!
-Note: for inferential statistics about µ you use the t-distribution since σ² is unknown.
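• 1 - α confidence intervals: the following two random bounds capture the unknown µ with probability 1 - α
Interval estimator for µ when σ² is unknown: $\bar{X} \pm t_{\alpha/2;n-1}\,\dfrac{S}{\sqrt{n}}$
$L = \bar{X} - t_{\alpha/2;n-1}\,\dfrac{S}{\sqrt{n}}$ and $U = \bar{X} + t_{\alpha/2;n-1}\,\dfrac{S}{\sqrt{n}}$
-Half-width: $h = t_{\alpha/2;n-1}\,\dfrac{S}{\sqrt{n}}$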
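As an illustration of the interval estimator, here is a minimal sketch that computes a 95% confidence interval for µ. The sample values are made up for the example, and the critical value is typed in from a t-table rather than computed.

```python
import math
from statistics import mean, stdev

# Hypothetical sample of n = 10 observations (illustrative data only)
sample = [48, 52, 50, 47, 53, 49, 51, 50, 46, 54]

n = len(sample)
x_bar = mean(sample)   # realization of the sample mean
s = stdev(sample)      # sample standard deviation (divides by n - 1)

# Assumption: 95% confidence, so alpha/2 = 0.025 and n - 1 = 9 degrees of freedom.
t_crit = 2.262         # t_{0.025;9}, read from a t-table

h = t_crit * s / math.sqrt(n)          # half-width
lower, upper = x_bar - h, x_bar + h    # realizations of L and U

print(f"x_bar = {x_bar:.2f}, s = {s:.3f}, h = {h:.3f}")
print(f"95% CI for mu: ({lower:.3f}, {upper:.3f})")
```

Note that the interval is centred on the sample mean and that its width shrinks with √n: four times as many observations roughly halve h.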
• Hypothesis tests: 5-step procedure for testing H0 against H1
(i) Testing problem, with the hinge µ0 and the significance level α (note: only H0 can contain ≤, ≥ or =)
(ii) Test statistic for the worst-case scenario (the hinge): $T = \dfrac{\bar{X} - \mu_0}{S/\sqrt{n}}$
(iii) Reject H0 when: a) t ≥ tα;n-1 = one-sided, upper-tailed
b) t ≤ -tα;n-1 = one-sided, lower-tailed
c) t ≤ -tα/2;n-1 or t ≥ tα/2;n-1 = two-sided
(note: always use ≤ and ≥)
(iv) The val (realization of T)
(v) The conclusion
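The five steps can be sketched in code as follows. The function names, the summary statistics and the hypotheses are hypothetical, chosen only to illustrate the procedure; the critical values are read from a t-table.

```python
import math

def t_statistic(x_bar, mu0, s, n):
    """Step (iv): the val, i.e. the realization of T = (X-bar - mu0) / (S / sqrt(n))."""
    return (x_bar - mu0) / (s / math.sqrt(n))

def reject_h0(val, t_crit, tail):
    """Step (iii): check whether val lies in the rejection region.

    t_crit must be t_{alpha;n-1} for a one-sided test
    and t_{alpha/2;n-1} for a two-sided test.
    """
    if tail == "upper":
        return val >= t_crit
    if tail == "lower":
        return val <= -t_crit
    if tail == "two-sided":
        return val <= -t_crit or val >= t_crit
    raise ValueError("tail must be 'upper', 'lower' or 'two-sided'")

# Step (i), hypothetical: H0: mu <= 50 against H1: mu > 50, alpha = 0.05
n, x_bar, s, mu0 = 16, 52.0, 4.0, 50.0
t_crit = 1.753  # t_{0.05;15}, read from a t-table

val = t_statistic(x_bar, mu0, s, n)
decision = reject_h0(val, t_crit, "upper")

# Step (v): the conclusion
print(f"val = {val:.3f}; reject H0: {decision}")
```

With the same val, a two-sided test at α = 0.05 would use t_{0.025;15} = 2.131 instead, and H0 would not be rejected; the choice of tail therefore matters for the conclusion.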
-Hypothesis testing can equivalently be conducted with the p-value method (the p-value is the smallest significance level at which H0 would
be rejected); see the following (steps (i) and (ii) are the same):
(iii) The val (realization of T)
(iv) The p-value: a) p-value = P(T ≥ val) = one-sided, upper-tailed
b) p-value = P(T ≤ val) = one-sided, lower-tailed
c) p-value = P(|T| ≥ |val|) = 2 × P(T ≥ |val|) = two-sided
(note: always use ≤ and ≥)
(v) The conclusion: compare with α and reject H0 when p-value ≤ α, in all circumstances (a, b and c).
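A minimal sketch of the p-value method follows. It rests on a loud assumption: n is large, so the t-distribution of T is approximated by N(0,1) (the property $T_v \approx N(0,1)$ from §16.1). For small n the exact t-distribution should be used instead. The function name and the numbers are illustrative only.

```python
import math
from statistics import NormalDist

def p_value_large_n(val, tail):
    """p-value for a realization val of T, approximating T by N(0,1) (large n only)."""
    z = NormalDist()
    if tail == "upper":
        return 1 - z.cdf(val)             # P(T >= val)
    if tail == "lower":
        return z.cdf(val)                 # P(T <= val)
    if tail == "two-sided":
        return 2 * (1 - z.cdf(abs(val)))  # 2 x P(T >= |val|)
    raise ValueError("tail must be 'upper', 'lower' or 'two-sided'")

# Hypothetical summary statistics: large sample, hinge mu0 = 50, alpha = 0.05
n, x_bar, s, mu0, alpha = 100, 52.0, 10.0, 50.0, 0.05
val = (x_bar - mu0) / (s / math.sqrt(n))  # realization of T

p_upper = p_value_large_n(val, "upper")
p_lower = p_value_large_n(val, "lower")
p_two = p_value_large_n(val, "two-sided")

for name, p in [("upper", p_upper), ("lower", p_lower), ("two-sided", p_two)]:
    print(f"{name}: p-value = {p:.4f}, reject H0: {p <= alpha}")
```

The decision rule is the same in all three cases (reject when p-value ≤ α); only the way the p-value is computed depends on the tail.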
-When nothing is said about σ² you may assume that it is unknown.
-Variables that measure income or expenses are not normally distributed themselves; you need a large n!
-You can have differences between the conclusion of a confidence interval and a hypothesis test.
-Create a (1 - α) confidence interval with half-width h: here the sample size n is unknown and needs to be calculated, while h and
the rest are known. You would need n to obtain $t_{\alpha/2;n-1}$, but it is assumed that n will be large, so that $t_{\alpha/2;n-1}$ can be replaced by $z_{\alpha/2}$, which is known
(the s of the former example can be used): $h = t_{\alpha/2;n-1}\,\dfrac{s}{\sqrt{n}} \approx z_{\alpha/2}\,\dfrac{s}{\sqrt{n}}$, so $n \approx \left(\dfrac{z_{\alpha/2}\,s}{h}\right)^2$, rounded up.
= signal phrases: with width .. / with a precision of .. / to within .. / estimation error is at most …
Note that you do not really have to do anything with the confidence interval itself; just calculate n! The exercises talk about 1 - α
confidence intervals because the half-width formula is obtained from the interval estimator, and because you then know α.
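The sample-size calculation can be sketched as below, under the large-n assumption that the t-quantile is replaced by the z-quantile. The function name and the input values are hypothetical.

```python
import math
from statistics import NormalDist

def required_sample_size(h, s, alpha):
    """Smallest n with z_{alpha/2} * s / sqrt(n) <= h (large-n approximation)."""
    z = NormalDist().inv_cdf(1 - alpha / 2)  # z_{alpha/2}
    return math.ceil((z * s / h) ** 2)       # always round up

# Hypothetical inputs: s = 15 from an earlier sample, desired half-width 2, alpha = 0.05
n_needed = required_sample_size(h=2.0, s=15.0, alpha=0.05)
print(f"required sample size: {n_needed}")
```

Rounding up rather than to the nearest integer guarantees that the half-width does not exceed h. Because n grows with 1/h², halving the desired half-width roughly quadruples the required sample size.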
§16.3 – Confidence intervals and tests for p: large sample approach
-We only consider situation 2 in the case of the proportion p: the sample size n is large, so that by the Central Limit Theorem the standardized estimator is
approximately standard normally distributed, N(0,1). This means that the sample has to meet the requirements of the (adapted)
5-rule: np ≥ 5 and n(1-p) ≥ 5, where for confidence intervals you use $\hat{p}$ in place of p and for tests you use p0.
-The parameter of interest is the unknown proportion of successes (coded 1) in the population (failures are coded 0). Let Y be the number of
successes in the sample and $\hat{P}$ the sample proportion of successes: $\hat{P} = Y/n$ (= estimator of p)
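-Note: for inferential statistics about p you use a form of the z-distribution (because you don't have any σ²):
$Z = \dfrac{\hat{P} - p}{\sqrt{p(1-p)/n}} \approx N(0,1)$ and $P\left(\hat{P} - z_{\alpha/2}\sqrt{\dfrac{\hat{P}(1-\hat{P})}{n}} < p < \hat{P} + z_{\alpha/2}\sqrt{\dfrac{\hat{P}(1-\hat{P})}{n}}\right) \approx 1 - \alpha$, which is useful for CIs.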
• 1 - α confidence intervals: the following two random bounds capture the unknown p with probability ≈ 1 - α
Interval estimator for p if the 5-rule is valid: $\hat{P} \pm z_{\alpha/2}\sqrt{\dfrac{\hat{P}(1-\hat{P})}{n}}$
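A minimal sketch of the large-sample interval for p, including the 5-rule check with the sample proportion in place of p. The function name and the counts are made up for illustration.

```python
import math
from statistics import NormalDist

def proportion_ci(y, n, alpha):
    """Large-sample (1 - alpha) confidence interval for p from y successes in n trials."""
    p_hat = y / n
    # 5-rule with p-hat in place of p, as used for confidence intervals
    if n * p_hat < 5 or n * (1 - p_hat) < 5:
        raise ValueError("5-rule violated: large-sample interval not justified")
    z = NormalDist().inv_cdf(1 - alpha / 2)         # z_{alpha/2}
    h = z * math.sqrt(p_hat * (1 - p_hat) / n)      # half-width
    return p_hat - h, p_hat + h

# Hypothetical sample: 120 successes among 400 observations, 95% confidence
lower, upper = proportion_ci(y=120, n=400, alpha=0.05)
print(f"95% CI for p: ({lower:.4f}, {upper:.4f})")
```

Raising an error when the 5-rule fails is a design choice of this sketch: it makes explicit that the normal approximation, and hence the interval, is not justified for such samples.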