Sampling distribution
Statistics The use of data in the context of uncertainty; a branch of mathematics built on probability theory.
Bernoulli trial $X \sim \mathrm{Bern}(\pi)$
A random experiment with exactly 2 outcomes (a binary variable): “success” [$P(X=1)=\pi$] and “failure” [$P(X=0)=1-\pi$].
Binomial trial $X \sim \mathrm{Bin}(n, \pi)$
$n$ repetitions of the Bernoulli trial. Probability of $k$ successes in $n$ repetitions:
$P(X = k) = \frac{n!}{k!(n-k)!}\pi^k(1-\pi)^{n-k} = \binom{n}{k}\pi^k(1-\pi)^{n-k}$
Hypergeometric trial $X \sim \mathrm{Hypergeom}(N, K, n)$
The probability of drawing $k$ items with a certain feature in a sample of size $n$ drawn without replacement from a population of $N$ items, $K$ of which have the feature:
$P(X = k) = \dfrac{\binom{K}{k}\binom{N-K}{n-k}}{\binom{N}{n}}$
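Both pmfs can be evaluated directly from these formulas; a minimal Python sketch (the example values are arbitrary):

```python
from math import comb

def binom_pmf(k: int, n: int, pi: float) -> float:
    """P(X = k) for X ~ Bin(n, pi)."""
    return comb(n, k) * pi**k * (1 - pi)**(n - k)

def hypergeom_pmf(k: int, N: int, K: int, n: int) -> float:
    """P(X = k) for X ~ Hypergeom(N, K, n), sampling without replacement."""
    return comb(K, k) * comb(N - K, n - k) / comb(N, n)

# 3 successes in 10 trials with pi = 0.3, and 3 marked items when
# drawing 10 from 50 items of which 15 are marked.
print(binom_pmf(3, 10, 0.3))         # ~0.267
print(hypergeom_pmf(3, 50, 15, 10))  # ~0.298
```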
Estimator An approximation of a population parameter that uses observed data (a statistic):
$\mu \to \hat{\mu}$ (or $\bar{x}$), and $\sigma^2 \to \hat{\sigma}^2$ (or $s^2$).
The population parameter is often denoted $\theta$; the sample estimate is denoted $\hat{\theta}$.
Normal distribution $X \sim N(\mu, \sigma^2)$
A distribution with the shape of a bell curve. Usually the model parameters need to be estimated, as the population model is unknown.
Sampling distribution
The probability distribution of a sample statistic; the statistic is the random variable in the distribution. If $X \sim N(\mu, \sigma^2)$, then $\bar{x} \sim N(\mu, \frac{\sigma^2}{n})$.
The standard deviation $\sigma$ of a sample statistic is called the standard error. It expresses the uncertainty about the statistic.
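A quick simulation check of this result; the values of $\mu$, $\sigma$, n, and the replication count S below are arbitrary illustration choices:

```python
import random
import statistics

mu, sigma, n, S = 5.0, 2.0, 25, 10_000
rng = random.Random(42)

# Draw S samples of size n and record each sample mean.
means = [statistics.fmean(rng.gauss(mu, sigma) for _ in range(n))
         for _ in range(S)]

# Standard error: close to sigma / sqrt(n) = 2 / 5 = 0.4.
print(statistics.stdev(means))
```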
Bias, Variance, MSE
A statistic is unbiased if the mean of its sampling distribution coincides with the population parameter.
A statistic has low variance if its deviation from its own mean is very small.
$\mathbf{Bias^2} = \left(\theta - E(\hat{\theta})\right)^2 \qquad \mathbf{Variance} = E\left(\left(E(\hat{\theta}) - \hat{\theta}\right)^2\right) \qquad \mathbf{Standard\ error} = \sqrt{var} = \sqrt{E\left(\left(E(\hat{\theta}) - \hat{\theta}\right)^2\right)}$
$\mathbf{Mean\ squared\ error\ of\ estimation} = \mathbf{MSE}(\hat{\theta}) = E\left((\theta - \hat{\theta})^2\right) = bias^2 + variance$
Efficient estimators have a low MSE. Minimizing it is difficult, as it leads to a bias-variance tradeoff: a lower variance is better, but ideally the bias equals 0. When comparing statistics, the MSE is used most often.
The variance measures deviation around the estimator’s own mean $E(\hat{\theta})$; the MSE measures deviation around the population parameter $\theta$.
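The decomposition $MSE = bias^2 + variance$ can be checked numerically. A sketch using a deliberately biased toy estimator, $\hat{\theta} = 0.9\,\bar{x}$ (chosen purely for illustration):

```python
import random
import statistics

theta, sigma, n, S = 10.0, 3.0, 20, 50_000
rng = random.Random(1)

# S realizations of the (biased) estimator 0.9 * sample mean.
est = [0.9 * statistics.fmean(rng.gauss(theta, sigma) for _ in range(n))
       for _ in range(S)]

bias2 = (theta - statistics.fmean(est)) ** 2
var = statistics.pvariance(est)
mse = statistics.fmean((theta - e) ** 2 for e in est)
print(bias2 + var, mse)  # two estimates of the same quantity (~1.36)
```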
Sample mean of $X \sim N(\mu, \sigma^2)$
$\theta = \mu$, $\hat{\theta} = \bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$
Sampling distribution: $\bar{x} \sim N(\mu, \frac{\sigma^2}{n})$
Bias: $E(\bar{x}) = \mu$, so bias $= 0$
Variance: $\frac{\sigma^2}{n}$
MSE: $bias^2 + var = \frac{\sigma^2}{n}$
Sample mean of $X \sim N(\mu, ?)$ (unknown variance)
$\theta = \mu$, $\hat{\theta} = \bar{x}$, with $s^2 = \frac{1}{n-1}\sum (x_i - \bar{x})^2$
Sampling distribution: $t$-distribution, $\frac{\bar{x} - \mu}{s/\sqrt{n}} \sim t_{n-1}$
Bias: $E(\bar{x}) = \mu$, so bias $= 0$
Variance: $\frac{s^2}{n}$
MSE: $bias^2 + var = \frac{s^2}{n}$
Student’s t distribution $t \sim t_{n-1}$
$t$ is the t-statistic: $t = \frac{\bar{x} - \mu}{s/\sqrt{n}}$.
This is the sampling distribution when the population distribution is normal and the variance is unknown.
$n - 1$ is the degrees of freedom; the higher the degrees of freedom, the closer the distribution gets to $N(0, 1)$.
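A simulation sketch of this t-statistic for a small sample, which makes the heavier tails visible (all settings are arbitrary illustration values):

```python
import random
import statistics

mu, sigma, n, S = 0.0, 1.0, 5, 50_000
rng = random.Random(7)

ts = []
for _ in range(S):
    x = [rng.gauss(mu, sigma) for _ in range(n)]
    xbar, s = statistics.fmean(x), statistics.stdev(x)
    ts.append((xbar - mu) / (s / n ** 0.5))

# For t_4 roughly 12% of values exceed |2|, versus ~4.6% for N(0, 1).
print(sum(abs(t) > 2 for t in ts) / S)
```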
Maximum likelihood estimator: $\hat{\sigma}^2 = \frac{1}{n}\sum (x_i - \bar{x})^2$
Unbiased estimator: $\hat{\sigma}^2 = \frac{1}{n-1}\sum (x_i - \bar{x})^2$
When using the MSE to compare the estimators, the maximum likelihood estimator is more efficient for smaller samples.
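A Monte Carlo comparison of the two estimators’ MSE, illustrating this claim (settings are arbitrary):

```python
import random
import statistics

mu, sigma, n, S = 0.0, 1.0, 5, 100_000
rng = random.Random(3)

mse_ml = mse_unb = 0.0
for _ in range(S):
    x = [rng.gauss(mu, sigma) for _ in range(n)]
    xbar = statistics.fmean(x)
    ss = sum((xi - xbar) ** 2 for xi in x)
    mse_ml += (ss / n - sigma ** 2) ** 2         # ML: divide by n
    mse_unb += (ss / (n - 1) - sigma ** 2) ** 2  # unbiased: divide by n-1

# For small n, the biased ML estimator typically has the lower MSE.
print(mse_ml / S, mse_unb / S)
```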
If you have 2 samples from normally distributed data, $X_1 \sim N(\mu_1, \sigma^2)$ and $X_2 \sim N(\mu_2, \sigma^2)$, the sampling distribution of the difference in sample means is a t distribution with $n_1 + n_2 - 2$ degrees of freedom, centered at $\mu_2 - \mu_1$: $t \sim t_{n_1+n_2-2}$.
The standard error is calculated using:
$SE(\bar{x}_2 - \bar{x}_1) = s_p \sqrt{\frac{1}{n_1} + \frac{1}{n_2}}, \qquad s_p = \sqrt{\frac{(n_1-1)s_1^2 + (n_2-1)s_2^2}{n_1+n_2-2}}$
$t = \frac{\bar{x}_2 - \bar{x}_1}{SE(\bar{x}_2 - \bar{x}_1)}$
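A minimal sketch of these formulas in Python (the function name is ours):

```python
from math import sqrt
from statistics import fmean, stdev

def pooled_two_sample_t(x1: list[float], x2: list[float]) -> tuple[float, float]:
    """Pooled two-sample t: returns (t, SE of x2_bar - x1_bar)."""
    n1, n2 = len(x1), len(x2)
    sp = sqrt(((n1 - 1) * stdev(x1) ** 2 + (n2 - 1) * stdev(x2) ** 2)
              / (n1 + n2 - 2))
    se = sp * sqrt(1 / n1 + 1 / n2)
    return (fmean(x2) - fmean(x1)) / se, se
```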
Central Limit Theorem
If $n$ is large enough, the sample mean of $X$ coming from any distribution $X \sim\ ?(\mu, \sigma^2)$ with mean $\mu$ and variance $\sigma^2$ approximately follows the normal distribution $\bar{x} \sim N(\mu, \frac{\sigma^2}{n})$.
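A CLT illustration with skewed, non-normal data: Exp(1) has $\mu = 1$ and $\sigma^2 = 1$, so $\bar{x}$ should be approximately $N(1, 1/n)$ (sizes are arbitrary):

```python
import random
import statistics

n, S = 50, 20_000
rng = random.Random(11)

# Sample means of exponential (skewed) data.
means = [statistics.fmean(rng.expovariate(1.0) for _ in range(n))
         for _ in range(S)]

# Mean ~1 and sd ~ 1 / sqrt(50) ~ 0.141, as the CLT predicts.
print(statistics.fmean(means), statistics.stdev(means))
```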
Monte Carlo Simulation
Computer simulation A numerical technique for conducting experiments on the computer; a tool to virtually investigate the behavior of a system.
Monte Carlo Simulation A computer experiment involving random sampling from probability distributions. Used for estimators and for hypothesis testing (in the absence of analytical results).
MC simulations for estimators
An estimator or test statistic has a true sampling distribution under a particular set of conditions, and we want to know this distribution. The derivation, however, is not always tractable; an MC simulation can be used to approximate the distribution.
Step 1: Create the approximate sampling distribution
Generate S independent data sets of the given sample size n under the conditions of interest.
Compute the numerical value of the estimator/test statistic $\hat{\theta}$ for each data set.
Step 2: Derive bias, var, MSE, relative efficiency
If S is large enough, the summary statistics should be a good approximation to the true sampling properties.
Relative efficiency¹: $RE = \frac{MSE(\hat{\theta}^{(1)})}{MSE(\hat{\theta}^{(2)})}$; if $RE < 1$, estimator 1 is preferred.
The sample median is most efficient for distributions with thick tails; if the distribution is more similar to a normal distribution, the mean is more useful.
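A sketch estimating this relative efficiency for a thick-tailed population; the Laplace(0, 1) distribution (generated as the difference of two Exp(1) draws) is our choice of example:

```python
import random
import statistics

n, S = 25, 20_000
rng = random.Random(5)

mse_mean = mse_median = 0.0
for _ in range(S):
    # Laplace(0, 1) sample: true center theta = 0.
    x = [rng.expovariate(1.0) - rng.expovariate(1.0) for _ in range(n)]
    mse_mean += statistics.fmean(x) ** 2      # (theta_hat - 0)^2
    mse_median += statistics.median(x) ** 2

# RE = MSE(mean) / MSE(median) > 1 here, so the median is preferred.
print(mse_mean / mse_median)
```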
MC simulations for hypothesis testing
There are two types of hypothesis testing situations:
1) Randomness ($H_0$) vs. non-randomness ($H_1$) of data
2) No effect ($H_0$) vs. effect ($H_1$)
t-statistic: $t_{obs} = \frac{\bar{x}_2 - \bar{x}_1}{SE(\bar{X}_2 - \bar{X}_1)}$
$H_0$ is rejected if the observed data/statistics are very unlikely under the assumption of randomness and no effect.
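One common Monte Carlo version of this test re-randomizes the group labels under the “no effect” $H_0$ (a permutation test); a minimal sketch, with the function name and defaults ours:

```python
import random
import statistics

def perm_pvalue(x1: list[float], x2: list[float],
                S: int = 10_000, seed: int = 0) -> float:
    """Approximate p-value for 'no effect' H0 by re-randomizing labels."""
    rng = random.Random(seed)
    obs = abs(statistics.fmean(x2) - statistics.fmean(x1))
    pooled, n1 = x1 + x2, len(x1)
    hits = 0
    for _ in range(S):
        rng.shuffle(pooled)
        diff = statistics.fmean(pooled[n1:]) - statistics.fmean(pooled[:n1])
        hits += abs(diff) >= obs  # as extreme as the observed difference
    return hits / S
```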
Confidence intervals
Confidence intervals This expresses sampling uncertainty; it is often reported instead of the point estimate. It holds the true population parameter $\theta$ with a probability of C.
Two-sided t-confidence interval:
$\left[(\bar{x}_2 - \bar{x}_1) - t_{C;\,n_1+n_2-2}\,SE(\bar{x}_2 - \bar{x}_1),\ (\bar{x}_2 - \bar{x}_1) + t_{C;\,n_1+n_2-2}\,SE(\bar{x}_2 - \bar{x}_1)\right]$
A two-sample Student’s t-test relies on some assumptions: the samples must come from a normal distribution, and the variances must be equal. If these are violated, the quality of the hypothesis test can suffer. If the variances are not equal, Welch’s test applies.
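A sketch computing this interval, assuming SciPy is available for the t quantile (the function name is ours):

```python
from math import sqrt
from statistics import fmean, stdev
from scipy.stats import t  # assumed dependency for t quantiles

def t_ci_diff_means(x1: list[float], x2: list[float], C: float = 0.95):
    """Two-sided pooled-variance t CI for mu2 - mu1 at confidence C."""
    n1, n2 = len(x1), len(x2)
    sp = sqrt(((n1 - 1) * stdev(x1) ** 2 + (n2 - 1) * stdev(x2) ** 2)
              / (n1 + n2 - 2))
    se = sp * sqrt(1 / n1 + 1 / n2)
    diff = fmean(x2) - fmean(x1)
    tc = t.ppf(1 - (1 - C) / 2, n1 + n2 - 2)  # critical value t_{C; n1+n2-2}
    return diff - tc * se, diff + tc * se
```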
Power of a test (complement of Type II error)
$P(\text{test correctly rejects } H_0 \mid H_1 \text{ true}) = 1 - \beta$
The probability of correctly rejecting $H_0$. To approximate it, generate data under $H_1$: $\mu \neq \mu_0$ and calculate the proportion of rejections.
Significance level (Type I error)
$P(\text{test rejects } H_0 \mid H_0 \text{ true}) = \alpha$
To approximate it, generate data under $H_0$: $\mu = \mu_0$ and calculate how often $H_0$ is rejected.
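A sketch approximating both quantities for a one-sample t-test, again assuming SciPy for the t quantile ($\mu_0$, $\mu_1$, $\sigma$, n are illustration values):

```python
import random
import statistics
from scipy.stats import t  # assumed dependency for the critical value

mu0, mu1, sigma, n, S, alpha = 0.0, 0.5, 1.0, 20, 10_000, 0.05
rng = random.Random(13)
tc = t.ppf(1 - alpha / 2, n - 1)

def rejects(mu: float) -> bool:
    """Run one t-test on simulated data with true mean mu."""
    x = [rng.gauss(mu, sigma) for _ in range(n)]
    tstat = (statistics.fmean(x) - mu0) / (statistics.stdev(x) / n ** 0.5)
    return abs(tstat) > tc

print(sum(rejects(mu0) for _ in range(S)) / S)  # ~0.05: estimated alpha
print(sum(rejects(mu1) for _ in range(S)) / S)  # estimated power under H1
```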
¹ Compare two estimators, e.g. $\hat{\theta}^{(1)}$ is the mean and $\hat{\theta}^{(2)}$ is the median.