Statistics
1. Introduction to statistics and sampling distributions
What is statistics? “Statistics is the science of learning from data and of measuring, controlling,
and communicating uncertainty” (American Statistical Association, ASA).
Section 1: Introductory examples to illustrate statistical inference
EXAMPLE Suppose that we would like to know the average spending yesterday by first-year
undergraduate students at UK universities. We may decide to collect the information from all
UK first-year undergraduate students. Some difficulties we may encounter are that:
• it will be extremely costly to collect this information from all such students, and
• some errors can be made by data collectors and respondents.
1.1 Estimating the population mean
One feasible strategy would be to select a small number of representative students and then compute
the average spending of yesterday. By doing so, the cost of collecting data becomes affordable
and the chance of making a mistake in recording becomes substantially lower.
We call all of the first-year undergraduate students at UK universities the population. The group of
selected representative students is called a sample.
The next important question is: which students should we select? We should select students in
such a way that they do not systematically under- or over-represent the population. For example,
selecting some or all students in a single lecture theatre can systematically misrepresent this population.
Suppose we have collected the following data: John = £15, Mark = £5, Jessica = £30, Fred = £25,
Jo = £45. The sample mean is (15 + 5 + 30 + 25 + 45) / 5 = £24.
Now suppose we have picked another sample of size 5, which gives another data set: Mary =
£10, Peter = £35, Monica = £25, Paul = £55, Vicky = £15. The sample mean is (10 + 35 + 25 + 55
+ 15) / 5 = £28. As can be seen, the sample mean varies when a sample is collected again: the
sample mean is random. Note that the population mean is a fixed number. It is called a parameter
to estimate.
1.2 Statistical inference
As the sample mean is random and we do not know the population mean, we would like to assess
how precise the sample mean (an estimator) is as an estimate of the population mean. In order to make
such an assessment, we make some assumptions about the population and obtain an
approximate distribution of the sample mean (the estimator).
One possibility is to assume that spending yesterday by a first-year UK university
student is approximately normally distributed in the population. Later, we will see that, under this assumption,
the standardised sample mean has a Student’s t distribution with n − 1 degrees of freedom, where n is
the sample size. Such a result will be used for statistical inference.
EXAMPLE Suppose you flipped a coin 100 times and obtained 25 heads and 75 tails. What
would you conclude from this result? There are three possible hypotheses:
1. The coin is actually fair and the result just happened by chance
2. The coin used is not fair
3. The coin is fair but you somehow cheated when flipping it so that it tends to land tails
Statistical inference would start by assuming that the coin is fair and the tosser does not cheat, and
then challenge this first hypothesis in view of how unlikely the observed outcome would be. In this
example, under the assumption of a fair coin without cheating, the probability of obtaining 25
heads or fewer out of 100 tosses is about 0.00000028. What would you conclude?
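The tail probability quoted above can be checked directly from the Binomial(100, 0.5) distribution. A minimal sketch in Python, using only the standard library (the figure 0.00000028 is the value this computation should reproduce, rounded):

# Probability of obtaining 25 heads or fewer in 100 tosses of a fair coin,
# i.e. the Binomial(n = 100, p = 0.5) lower tail at k = 25
import math

p = sum(math.comb(100, k) for k in range(26)) / 2.0 ** 100
print(p)  # approximately 2.8e-07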
Preliminary 1: Mathematical symbols
Mathematical discussion introduces many familiar and unfamiliar symbols in order to make
arguments concise and unambiguous.
Σᵢ₌₁ⁿ f(xi) = f(x1) + f(x2) + … + f(xn)
Πᵢ₌₁ⁿ f(xi) = f(x1) × f(x2) × … × f(xn)
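In code, the Σ and Π notations correspond directly to a running sum and a running product. A small illustration, taking f(x) = x² over x1, …, x4 = 1, 2, 3, 4 as an arbitrary example:

import math

def f(x):
    return x * x

xs = [1, 2, 3, 4]
total = sum(f(x) for x in xs)          # Σᵢ₌₁ⁿ f(xi) = 1 + 4 + 9 + 16 = 30
product = math.prod(f(x) for x in xs)  # Πᵢ₌₁ⁿ f(xi) = 1 × 4 × 9 × 16 = 576
print(total, product)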
Preliminary 2: Mathematical discussions
Usually, mathematical “results” are stated as theorems or propositions, which theorists prove
under some assumptions. Learning the proofs of key theorems is very useful for understanding
the results. Mathematical discussions begin by defining key variables and functions and stating
assumptions, and then present propositions/theorems, to be proven later.
Preliminary 3: Sampling
As discussed above, it is important that the sample is fairly representative of the population.
EXAMPLE Suppose that a marketing manager wants to assess customers’ reactions to a new
food product before putting it on the shelf in a big supermarket chain. He plans to conduct a
survey of the students in a lecture theatre.
Is there a problem with this sampling method? Consider the population of his interest. Such a sample
is very unlikely to reflect the spectrum of views of the population of interest and may well be
biased toward one end of that spectrum. To avoid problems like this, it is important that the
principle of randomness be embodied in the sample selection procedure. The easiest way to
achieve this is called simple random sampling.
Simple random sampling is a procedure in which every possible sample of n objects is equally
likely to be chosen from a population of N objects. The resulting sample is called a random
sample. There are two ways of performing simple random sampling: with replacement and without
replacement. “With replacement” allows the selector to choose the same observation more than once in a
sample, whilst “without replacement” does not. Draws from the continuous Uniform(0, 1) distribution are typically
used for simple random sampling.
Suppose N = 10 and we randomly choose n = 4 students:
• Random sampling with replacement
1. Index students from 1 to N
2. Draw N random numbers from Uniform(0, 1) (e.g. the Excel command “=rand()”)
0.369 0.962 0.459 0.407 0.611 0.046 0.012 0.311 0.119 0.623
3. Multiply by N and round up, then choose the first n indexes
4 10 5 5 7 1 1 4 2 7
(the first n = 4 indexes are 4, 10, 5 and 5; note that student 5 is selected twice, which is allowed with replacement)
• Random sampling without replacement
1. Index objects from 1 to N
2. Draw N numbers from Uniform(0, 1) and attach them to the student ids
3. Choose the n students with the n largest random numbers
Student id: 1 2 3 4 5 6 7 8 9 10
Random no.: 0.369 0.962 0.459 0.407 0.611 0.046 0.012 0.311 0.119 0.623
(the four largest numbers belong to students 2, 10, 5 and 3, so these four are selected)
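Both procedures can be sketched in a few lines of Python. This is a minimal illustration of the two recipes above, assuming the same N = 10 and n = 4; random.random() plays the role of Excel’s =rand():

import math
import random

N, n = 10, 4
u = [random.random() for _ in range(N)]  # N draws from Uniform(0, 1)

# With replacement: multiply each draw by N, round up, keep the first n indexes
# (max(1, ...) guards the measure-zero case where a draw is exactly 0.0)
indexes = [max(1, math.ceil(ui * N)) for ui in u]
with_replacement = indexes[:n]  # the same student id may appear twice

# Without replacement: attach the draws to student ids 1..N and
# choose the n students with the n largest random numbers
ids_by_draw = sorted(range(1, N + 1), key=lambda i: u[i - 1], reverse=True)
without_replacement = ids_by_draw[:n]  # n distinct student ids

print(with_replacement, without_replacement)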
Section 2: Sampling distribution of the mean, population variance known
Statistical inference can be loosely defined as the process of drawing conclusions about a population
from a sample drawn from it. The population is a set of numbers from which a
sample is drawn. The distribution of the numbers constituting a population is called the population
distribution.
If X1, X2, …, Xn are independent and identically distributed (i.i.d.) random variables, we say that
they constitute a random sample from the infinite population given by their common distribution,
where n is the sample size.
In practice, we often deal with random samples from populations that are finite (e.g. first-year
students at UK universities). Most populations are, however, large enough to be treated as if they were
infinite. Thus, most statistical theories and most of the methods we will discuss apply to samples
from infinite populations, as in the above definition. Statistical inferences are usually based on
statistics, which are functions of a random sample, such as the sample mean and the sample
variance.
If X1, X2, …, Xn constitute a random sample, then the sample mean is given by X̄ = (Σᵢ₌₁ⁿ Xi)/n,
and the sample variance is given by S² = [Σᵢ₌₁ⁿ (Xi − X̄)²]/(n − 1). It is common practice to apply the
terms “random sample”, “statistic”, “sample mean”, etc. to the values or realisations of the random
variables. We use upper-case letters to denote random variables (e.g. X̄) and lower-case letters to
denote their values or realisations (e.g. x̄).
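As a check, the two statistics can be computed directly from the first data set in Section 1. A minimal sketch using the n − 1 divisor from the definition above:

def sample_mean(x):
    return sum(x) / len(x)

def sample_variance(x):
    xbar = sample_mean(x)
    # note the n - 1 divisor from the definition of S^2
    return sum((xi - xbar) ** 2 for xi in x) / (len(x) - 1)

spending = [15, 5, 30, 25, 45]    # John, Mark, Jessica, Fred, Jo
print(sample_mean(spending))      # 24.0, as computed in Section 1
print(sample_variance(spending))  # 230.0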
As illustrated in the first example, the value of a statistic (e.g. the sample mean) can be expected to
vary from sample to sample. We call the distributions of such statistics sampling distributions.
Here, we focus on the sampling distribution of the sample mean, given the population variance σ².
If X1, X2, …, Xn constitute a random sample from an infinite population with mean μ and
variance σ², then E(X̄) = μ and var(X̄) = σ²/n, where X̄ = Σᵢ₌₁ⁿ Xi/n.
Proof of E(X̄) = μ:
E(X̄) = E[(X1 + X2 + … + Xn)/n]
= (1/n) E(X1 + X2 + … + Xn)
= (1/n) [E(X1) + E(X2) + … + E(Xn)]
= (1/n)(nμ)
= μ
Proof of var(X̄) = σ²/n:
var(X̄) = var[(X1 + X2 + … + Xn)/n]
= (1/n²) var(X1 + X2 + … + Xn)
= (1/n²) [var(X1) + var(X2) + … + var(Xn)]   (by independence)
= (1/n²)(nσ²)
= σ²/n
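These two results can be verified by simulation. A sketch assuming, purely for illustration, a normal population with μ = 24 and σ = 10 and samples of size n = 5:

import random
import statistics

mu, sigma, n, reps = 24.0, 10.0, 5, 100_000  # hypothetical population values
means = [sum(random.gauss(mu, sigma) for _ in range(n)) / n for _ in range(reps)]

print(statistics.mean(means))      # close to mu = 24
print(statistics.variance(means))  # close to sigma**2 / n = 20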
2.1 Large sample results
E(X̄) = μ and var(X̄) = σ²/n imply that when n becomes larger, we expect a higher chance of
obtaining values of x̄ closer to μ. This idea is formalised by the result called the Law of Large
Numbers (LLN). Before stating it, we note a theorem which is used to prove the LLN.
THEOREM Chebyshev’s inequality states that if Y is a random variable with finite E(Y²), then for any
positive constant c: Pr(|Y| ≥ c) ≤ E(Y²)/c².
THEOREM The Weak Law of Large Numbers states that if X1, X2, …, Xn constitute a random
sample from an infinite population with mean μ and variance σ², then, for any c > 0:
limₙ➞∞ Pr(|X̄ − μ| > c) = 0, where X̄ = Σᵢ₌₁ⁿ Xi/n.
This reads: “for any positive constant c, the probability that X̄ will take on a value between μ − c
and μ + c will approach 1 as n tends to infinity”. Note that it states that the probability of |X̄
− μ| > c tends to 0, not directly that X̄ tends to μ, as n ➞ ∞.
By Chebyshev’s inequality: Pr(|X̄ − μ| > c) ≤ σ²/(nc²); therefore, by taking the limit, limₙ➞∞ Pr(|X̄ − μ| >
c) = 0. We say “the probability limit of X̄ is μ”: plimₙ➞∞ X̄ = μ, or “X̄ converges to μ in probability as
n ➞ ∞”: X̄ ➞p μ as n ➞ ∞.
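The shrinking probability can be seen numerically. A rough simulation sketch, assuming a standard normal population (μ = 0, σ = 1) and c = 0.1; it also prints the Chebyshev bound σ²/(nc²) for comparison (for small n the bound exceeds 1 and is uninformative):

import random

mu, sigma, c, reps = 0.0, 1.0, 0.1, 2_000
for n in (10, 100, 1_000):
    exceed = sum(
        abs(sum(random.gauss(mu, sigma) for _ in range(n)) / n - mu) > c
        for _ in range(reps)
    )
    # empirical Pr(|Xbar - mu| > c) versus the Chebyshev bound
    print(n, exceed / reps, sigma ** 2 / (n * c ** 2))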
The Central Limit Theorem states that if X1, X2, …, Xn constitute a random sample from an infinite
population with mean μ and variance σ², then the limiting distribution (or asymptotic
distribution) of (X̄ − μ)/(σ/√n) as n ➞ ∞ is the standard normal distribution.
Note that it does not state that the distribution of X̄ tends to a normal distribution as n ➞ ∞.
Rather, the distribution of the standardised sample mean tends to the standard normal distribution
as n ➞ ∞. This result justifies approximating the distribution of X̄ with a normal distribution for
sufficiently large n. What n is sufficiently large? There is no clear-cut general answer. However, to
avoid ambiguity, in this lecture we treat n ≥ 30 as sufficiently large.
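The CLT can be illustrated by simulation with a clearly non-normal population. A sketch assuming an Exponential(1) population (mean μ = 1, standard deviation σ = 1) and n = 30: the proportion of standardised sample means at or below 1.96 should be close to Φ(1.96) ≈ 0.975.

import math
import random

n, reps = 30, 50_000
mu = sigma = 1.0  # Exponential(1) has mean 1 and standard deviation 1

below = 0
for _ in range(reps):
    xbar = sum(random.expovariate(1.0) for _ in range(n)) / n
    z = (xbar - mu) / (sigma / math.sqrt(n))  # standardised sample mean
    below += z <= 1.96

print(below / reps)  # close to 0.975 for sufficiently large n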
2.2 Small sample results
When n is not sufficiently large (n < 30), how can we tackle the problem of approximating the
distribution of X̄? Under the above set-up, “X1, X2, …, Xn constitute a random sample from an
infinite population with mean μ and variance σ²”, approximation by a normal distribution may
not be good in general. We will compensate for the lack of sufficient information by imposing
stronger structure on the population.
Suppose now we assume that “X1, X2, …, Xn constitute a random sample from a normal population
with mean μ and variance σ²”. As we know, a linear combination of normal random variables is itself
a normal random variable. Since X1, X2, …, Xn are independent of each other and each is
distributed N(μ, σ²), X̄, which is a linear combination of the normal random variables X1, X2, …, Xn, will
be distributed as N(μ, σ²/n).
Sampling distribution I states that if X1, X2, …, Xn constitute a random sample from a normal
population with mean μ and variance σ², then, for any n, the distribution of Z = (X̄ − μ)/(σ/√n) is
the standard normal distribution.
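This result can be put to work directly. A sketch using the spending data from Section 1, under the (hypothetical) assumptions that the population is normal with known standard deviation σ = 12 and that we want to compare X̄ against a conjectured mean μ0 = 20:

import math
from statistics import NormalDist

x = [15, 5, 30, 25, 45]  # spending data from Section 1
mu0, sigma = 20.0, 12.0  # hypothetical conjectured mean and known population sd
n = len(x)
xbar = sum(x) / n

z = (xbar - mu0) / (sigma / math.sqrt(n))  # standard normal under the conjecture
print(z)                                   # about 0.75
print(2 * (1 - NormalDist().cdf(abs(z))))  # two-sided tail probability, about 0.46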
When σ² is replaced by its sample counterpart, the sample variance S² = [Σᵢ₌₁ⁿ (Xi − X̄)²]/(n − 1), we
have the following result: the distribution of T = (X̄ − μ)/(S/√n) is a Student’s t distribution with
n − 1 degrees of freedom, as anticipated in Section 1.2.