STATISTICAL LEARNING
Contents
Chapter 1 – Inference Using the Multivariate Normal Distribution
    LECTURE 1
    LECTURE 2
    LECTURE 3
Chapter 2 – Principal Components Analysis
    LECTURE 4
    LECTURE 5
    LECTURE 6 – PRACTICAL 1
Chapter 3 – Factor Analysis
    LECTURE 7
    LECTURE 8
    LECTURE 9 – PRACTICAL 2
Chapter 4 – Canonical Correlation Analysis
    LECTURE 10
    LECTURE 11
    LECTURE 12 – PRACTICAL 3
Chapter 5 – Discriminant Analysis
    LECTURE 13
    LECTURE 14
    LECTURE 15 – PRACTICAL 4
Chapter 6 – Cluster Analysis
    LECTURE 16
    LECTURE 17
    LECTURE 18 – PRACTICAL 1
    LECTURE 19
    LECTURE 20
    LECTURE 21 – PRACTICAL 2
Chapter 7 – Graphical Models
    LECTURE 22
    LECTURE 23
    LECTURE 24 – PRACTICAL 3
    LECTURE 25
    LECTURE 26
    LECTURE 27 – PRACTICAL 4
Chapter 8 – Mixture Models
    LECTURE 28
    LECTURE 29
    LECTURE 30 – PRACTICAL 5
Chapter 1 – Inference Using the Multivariate Normal
Distribution
LECTURE 1
Multivariate Normal Distribution
- Multivariate analyses are often carried out with the aim of exploring the data, rather than of testing hypotheses.
- Nevertheless, there are cases where the complexity of the data requires us to make use of parametric distributions to proceed with our analysis.
- In these cases, the multivariate normal distribution plays an important role in multivariate statistics.
Definition
- We say that X = (X_1, …, X_p)^T follows a p-variate normal distribution with
  - mean μ ∈ ℝ^p, a vector of means (each variable of X has its own mean), and
  - covariance matrix Σ, a (p × p)-dimensional symmetric, positive definite matrix; the diagonal entries of Σ are the variances of the individual variables, and each off-diagonal entry is the covariance of a pair of variables with each other,
- if its p.d.f. f_X is such that for any x = (x_1, …, x_p)^T,

  f_X(x) = (2π)^{-p/2} |Σ|^{-1/2} exp{ -(1/2) (x - μ)^T Σ^{-1} (x - μ) }

  where |Σ| is the determinant of the variance-covariance matrix.
- We write X ~ MN(μ, Σ).
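As a quick sanity check (not from the lectures), the density formula can be evaluated directly and compared against SciPy's implementation; the function name mvn_pdf and the example values of μ, Σ and x below are arbitrary illustrative choices.

```python
import numpy as np
from scipy.stats import multivariate_normal

def mvn_pdf(x, mu, Sigma):
    """p-variate normal density evaluated directly from the formula above."""
    p = len(mu)
    diff = x - mu
    quad = diff @ np.linalg.solve(Sigma, diff)   # (x - mu)^T Sigma^{-1} (x - mu)
    return np.exp(-0.5 * quad) / ((2 * np.pi) ** (p / 2) * np.sqrt(np.linalg.det(Sigma)))

# Illustrative 3-variate example (diagonally dominant Sigma, so positive definite)
mu = np.array([0.0, 1.0, -1.0])
Sigma = np.array([[2.0, 0.5, 0.3],
                  [0.5, 1.0, 0.2],
                  [0.3, 0.2, 1.5]])
x = np.array([0.5, 0.5, 0.0])

by_hand = mvn_pdf(x, mu, Sigma)
by_scipy = multivariate_normal(mean=mu, cov=Sigma).pdf(x)
```

The two evaluations agree to floating-point precision.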
Covariance and correlation matrix
- In general, σ_ij is the (i,j)-th element of Σ, with Var(X_i) = σ_ii = σ_i^2.
- The correlation between the i-th and j-th variables is

  ρ_ij = σ_ij / (σ_i σ_j)

- which gives σ_ij = ρ_ij σ_i σ_j.
- The correlation matrix is:

  P = ( 1     ρ_12  ⋯  ρ_1p
        ρ_21  1     ⋯  ρ_2p
        ⋮     ⋮     ⋱  ⋮
        ρ_p1  ρ_p2  ⋯  1   )
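The relation σ_ij = ρ_ij σ_i σ_j means a covariance matrix can be converted to a correlation matrix by dividing each entry by the product of the corresponding standard deviations. A minimal sketch (the name cov_to_corr and the example Σ are my own illustrative choices):

```python
import numpy as np

def cov_to_corr(Sigma):
    """rho_ij = sigma_ij / (sigma_i * sigma_j), applied entrywise."""
    sd = np.sqrt(np.diag(Sigma))     # standard deviations sigma_i
    return Sigma / np.outer(sd, sd)

# Illustrative covariance matrix: standard deviations 2 and 3, covariance 1.2
Sigma = np.array([[4.0, 1.2],
                  [1.2, 9.0]])
P = cov_to_corr(Sigma)               # correlation 1.2 / (2 * 3) = 0.2
```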
Example: Bivariate Normal Random Vector (p=2)
- In this case, we have

  X = (X_1, X_2)^T,  μ = (μ_1, μ_2)^T  and  Σ = ( σ_1^2     ρσ_1σ_2
                                                  ρσ_1σ_2   σ_2^2   )

- where ρ is the correlation coefficient. Clearly,

  |Σ| = σ_1^2 σ_2^2 (1 - ρ^2) ;  Σ^{-1} = 1 / (σ_1^2 σ_2^2 (1 - ρ^2)) ( σ_2^2      -ρσ_1σ_2
                                                                        -ρσ_1σ_2   σ_1^2    )
Example: Bivariate Normal Random Vector
- The density is:

  f_X(x) = 1 / (2π σ_1 σ_2 √(1 - ρ^2)) exp{ -Q(x) / (2 (1 - ρ^2)) }

- where

  Q(x) = (x - μ)^T [1 / (σ_1^2 σ_2^2)] ( σ_2^2      -ρσ_1σ_2
                                         -ρσ_1σ_2   σ_1^2    ) (x - μ)

       = ((x_1 - μ_1)/σ_1)^2 - 2ρ ((x_1 - μ_1)/σ_1)((x_2 - μ_2)/σ_2) + ((x_2 - μ_2)/σ_2)^2
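The closed-form bivariate density should agree with the general matrix formula. A small numerical check (all parameter values below are illustrative):

```python
import numpy as np

# Illustrative bivariate normal parameters
mu1, mu2, s1, s2, rho = 1.0, -0.5, 2.0, 1.5, 0.6
mu = np.array([mu1, mu2])
Sigma = np.array([[s1**2,      rho*s1*s2],
                  [rho*s1*s2,  s2**2    ]])

def bivariate_pdf(x1, x2):
    # closed form: exp{-Q(x) / (2 (1 - rho^2))} / (2 pi s1 s2 sqrt(1 - rho^2))
    z1 = (x1 - mu1) / s1
    z2 = (x2 - mu2) / s2
    Q = z1**2 - 2*rho*z1*z2 + z2**2
    return np.exp(-Q / (2*(1 - rho**2))) / (2*np.pi*s1*s2*np.sqrt(1 - rho**2))

def general_pdf(x):
    # general formula: (2 pi)^{-p/2} |Sigma|^{-1/2} exp{-(1/2)(x-mu)^T Sigma^{-1} (x-mu)}
    diff = x - mu
    quad = diff @ np.linalg.solve(Sigma, diff)
    return np.exp(-0.5*quad) / (2*np.pi*np.sqrt(np.linalg.det(Sigma)))

x = np.array([0.3, 0.1])
a = bivariate_pdf(x[0], x[1])
b = general_pdf(x)
```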
Example: Bivariate Normal Random Vector with 𝝁𝟏 = 𝝁𝟐 and 𝝈𝟏 = 𝝈𝟐 = 𝟏
[Figure: 3D surface plots of the bivariate normal p.d.f. and their 2D contour plots, for three values of the correlation ρ. Each horizontal axis represents the value of one of the two random variables; the vertical axis is the p.d.f. evaluated at those two values. Each of the marginals is a standard normal.]

The bivariate normal is easy to plot; as soon as we go to more than two dimensions it becomes more difficult to visualise. In one dimension we plotted the bell-shaped curve, and now, in two dimensions, it looks like a bump. In all three cases the means are zero and the variances are 1, and all that changes between them is the correlation ρ, which determines how the marginals are correlated with each other.

Positive correlation means that when one variable is positive, it is likely that the second variable is also positive: the higher-density areas on the contour plot are where both variables are positive or both are negative. And vice versa for a negative correlation.
Properties of X ~ MN(μ, Σ)
1. X has the same distribution as μ + Σ^{1/2} (Y_1, …, Y_p)^T, where Y_1, …, Y_p are independent N(0,1).
2. E[X] = μ and Cov(X) = Σ.
3. Σ^{-1/2} (X - μ) ~ MN(0, I_p); each element is an independent N(0,1).
4. Sample mean: X̄ = (1/n) Σ_{i=1}^n X_i ; X̄ ~ MN(μ, (1/n) Σ).
5. X̄ is the MLE for μ, and

   S = (1/n) Σ_{i=1}^n (X_i - X̄)(X_i - X̄)^T

   is the MLE for Σ.
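Properties 1, 4 and 5 can be checked by simulation: sample X as μ + Σ^{1/2} Y with Y standard normal, then compute the sample mean and the MLE S (divisor n, not n − 1). A sketch with arbitrary illustrative values of μ and Σ:

```python
import numpy as np

# Property 1 by simulation: X has the same distribution as mu + Sigma^{1/2} Y
# with Y ~ MN(0, I_p). mu and Sigma below are arbitrary illustrative values.
rng = np.random.default_rng(0)
mu = np.array([1.0, 2.0])
Sigma = np.array([[1.0, 0.6],
                  [0.6, 2.0]])

# Symmetric square root Sigma^{1/2} from the eigendecomposition
vals, vecs = np.linalg.eigh(Sigma)
Sigma_half = vecs @ np.diag(np.sqrt(vals)) @ vecs.T

n = 200_000
Y = rng.standard_normal((n, 2))      # rows are independent N(0,1) pairs
X = mu + Y @ Sigma_half              # valid because Sigma_half is symmetric

# Properties 4-5: the sample mean, and the MLE of Sigma (divisor n, not n - 1)
X_bar = X.mean(axis=0)
S = (X - X_bar).T @ (X - X_bar) / n
```

With this many draws, X_bar and S land close to μ and Σ, as properties 4 and 5 predict.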
Importance of Multivariate Normal Distribution
- Multivariate central limit theorem: if X_1, …, X_n are i.i.d. random vectors with E[X_i] = μ and Cov(X_i) = Σ, then

  (1/n) Σ_{i=1}^n X_i  is approximately  MN(μ, (1/n) Σ)  for large n.

- It is mathematically convenient: the p.d.f. is completely determined by E[X] = μ and Cov(X) = Σ. The number of parameters is p(p + 3)/2 (the p entries of μ plus the p(p + 1)/2 distinct entries of Σ).
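The multivariate CLT can be illustrated by averaging i.i.d. non-normal random vectors and checking that the sample means scatter around μ with covariance close to (1/n)Σ. Everything below (the correlated-exponential construction, n, and the replication count) is an illustrative choice:

```python
import numpy as np

# Multivariate CLT sketch: sample means of i.i.d. NON-normal random vectors
# behave like MN(mu, (1/n) Sigma).
rng = np.random.default_rng(1)

def draw(n):
    # correlated non-normal vectors: X = (E1, E1 + E2) with E1, E2 ~ Exp(1)
    e = rng.exponential(size=(n, 2))
    return np.column_stack([e[:, 0], e[:, 0] + e[:, 1]])

mu = np.array([1.0, 2.0])                    # E[X_i]
Sigma = np.array([[1.0, 1.0],
                  [1.0, 2.0]])               # Cov(X_i)

n, reps = 500, 10_000
means = np.array([draw(n).mean(axis=0) for _ in range(reps)])

# The reps sample means should scatter around mu with covariance (1/n) Sigma
emp_mean = means.mean(axis=0)
emp_cov = np.cov(means.T)
```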
Partitioning X into sub-vectors
- Partition X into
  - x_1 of dimension k and
  - x_2 of dimension p - k,
  with corresponding blocks

  μ = (μ_1, μ_2)^T  and  Σ = ( Σ_11  Σ_12
                               Σ_21  Σ_22 )

- The first row of Σ_12 contains the covariances between the first variable of x_1 and each variable of x_2; repeating this for the k variables of x_1 gives the whole Σ_12 quadrant.
- So, if we want the sub-vector x_1 of size k, its marginal distribution is

  x_1 ~ MN(μ_1, Σ_11)

- We won't prove this expression, but we will be using it at some point in this module.
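The marginal result for a sub-vector can be checked by simulation: the first k coordinates of draws from MN(μ, Σ) should have mean μ_1 and covariance Σ_11. The μ and Σ below are illustrative:

```python
import numpy as np

# Marginal of a sub-vector by simulation: if X ~ MN(mu, Sigma) and x_1 is the
# first k coordinates, then x_1 ~ MN(mu_1, Sigma_11).
rng = np.random.default_rng(2)
mu = np.array([0.0, 1.0, -1.0, 2.0])
# Diagonally dominant, hence positive definite, illustrative covariance matrix
Sigma = np.array([[2.0, 0.8, 0.3, 0.1],
                  [0.8, 1.5, 0.4, 0.2],
                  [0.3, 0.4, 1.5, 0.5],
                  [0.1, 0.2, 0.5, 2.5]])

k = 2
X = rng.multivariate_normal(mu, Sigma, size=100_000)
X1 = X[:, :k]                 # the sub-vector x_1

mu_1 = mu[:k]                 # its mean block
Sigma_11 = Sigma[:k, :k]      # its covariance block
```

The empirical mean and covariance of X1 match the μ_1 and Σ_11 blocks, with no contribution from the other coordinates.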