Week 1
Multivariate distances
Distances between $\mathbf{x} = (x_1, x_2)$ and $\mathbf{c} = (c_1, c_2)$
- Euclidean: $d(\mathbf{x}, \mathbf{c}) = \sqrt{(x_1 - c_1)^2 + (x_2 - c_2)^2}$
- Manhattan (or $L_1$): $d(\mathbf{x}, \mathbf{c}) = |x_1 - c_1| + |x_2 - c_2|$
- Maximum: $d(\mathbf{x}, \mathbf{c}) = \max(|x_1 - c_1|, |x_2 - c_2|)$
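A minimal NumPy sketch of the three distances (the points x and c are illustrative values, not from the course):

```python
import numpy as np

x = np.array([2.0, 5.0])   # example point x = (x1, x2)
c = np.array([1.0, 3.0])   # example point c = (c1, c2)

diff = x - c
euclidean = np.sqrt(np.sum(diff**2))   # sqrt((x1-c1)^2 + (x2-c2)^2)
manhattan = np.sum(np.abs(diff))       # |x1-c1| + |x2-c2|
maximum   = np.max(np.abs(diff))       # max(|x1-c1|, |x2-c2|)
print(euclidean, manhattan, maximum)   # 2.236..., 3.0, 2.0
```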
Statistical distance in 2 dimensions
- $d(\mathbf{x}, \mathbf{0}) = \sqrt{x_1^2 + x_2^2}$
- $d(\mathbf{x}, \boldsymbol{\mu}) = \sqrt{(x_1 - \mu_1)^2 + (x_2 - \mu_2)^2}$
- $d(\mathbf{x}, \boldsymbol{\mu}) = \sqrt{\left(\frac{x_1 - \mu_1}{s_1}\right)^2 + \left(\frac{x_2 - \mu_2}{s_2}\right)^2}$ (each coordinate scaled by its standard deviation)
- $d(\mathbf{x}, \boldsymbol{\mu}) = \sqrt{\left(\frac{\tilde{x}_1}{\tilde{s}_1}\right)^2 + \left(\frac{\tilde{x}_2}{\tilde{s}_2}\right)^2}$ (for rotated axes)
- $d^2(\mathbf{x}, \boldsymbol{\mu}) = (\mathbf{x} - \boldsymbol{\mu})^\top A (\mathbf{x} - \boldsymbol{\mu})$
- $d(\mathbf{x}, \boldsymbol{\mu}) = \sqrt{(\mathbf{x} - \boldsymbol{\mu})^\top \Sigma^{-1} (\mathbf{x} - \boldsymbol{\mu})}$
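A sketch of the statistical distance in matrix form, assuming an illustrative mean vector and covariance matrix:

```python
import numpy as np

# Illustrative values; mu and Sigma would normally come from the data or the model.
x = np.array([3.0, 1.0])
mu = np.array([1.0, 0.0])
Sigma = np.array([[4.0, 1.0],
                  [1.0, 2.0]])

diff = x - mu
d2 = diff @ np.linalg.inv(Sigma) @ diff   # (x - mu)^T Sigma^{-1} (x - mu)
d = np.sqrt(d2)                           # statistical distance
print(d)
```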
Rotation matrix
To rotate the axes counter-clockwise, use the matrix $\begin{pmatrix} \cos\alpha & \sin\alpha \\ -\sin\alpha & \cos\alpha \end{pmatrix}$
That gives rotated axes $\begin{pmatrix} \tilde{x}_1 \\ \tilde{x}_2 \end{pmatrix} = \begin{pmatrix} \cos\alpha & \sin\alpha \\ -\sin\alpha & \cos\alpha \end{pmatrix} \begin{pmatrix} x_1 - \mu_1 \\ x_2 - \mu_2 \end{pmatrix}$
To rotate the axes clockwise, use the matrix $\begin{pmatrix} \cos\alpha & -\sin\alpha \\ \sin\alpha & \cos\alpha \end{pmatrix}$
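A small sketch of the axis rotation (the angle, point, and mean are illustrative):

```python
import numpy as np

alpha = np.pi / 6                      # rotation angle (illustrative)
R_ccw = np.array([[ np.cos(alpha), np.sin(alpha)],
                  [-np.sin(alpha), np.cos(alpha)]])   # counter-clockwise axis rotation

x  = np.array([2.0, 1.0])
mu = np.array([0.5, 0.5])
x_tilde = R_ccw @ (x - mu)             # coordinates in the centered, rotated axes
print(x_tilde)

R_cw = R_ccw.T                         # clockwise rotation is the transpose (= inverse)
```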
Covariance independence
If two random variables X and Y are independent, then 𝐶𝑜𝑣(𝑋, 𝑌) = 0
Expectation random variables
$E(\mathbf{x}) = \begin{pmatrix} E(x_1) \\ E(x_2) \\ \vdots \\ E(x_p) \end{pmatrix} = \begin{pmatrix} \mu_1 \\ \mu_2 \\ \vdots \\ \mu_p \end{pmatrix} = \boldsymbol{\mu}$ (the expectation of a vector is a vector)
(Co)variance random vectors
$Var(\mathbf{x}) = E\left[(\mathbf{x} - E(\mathbf{x}))(\mathbf{x} - E(\mathbf{x}))^\top\right]$ (the variance of a random vector is a matrix)
$= \begin{pmatrix} \sigma_{11} & \sigma_{12} & \cdots & \sigma_{1p} \\ \sigma_{21} & \sigma_{22} & \cdots & \sigma_{2p} \\ \vdots & \vdots & \ddots & \vdots \\ \sigma_{p1} & \sigma_{p2} & \cdots & \sigma_{pp} \end{pmatrix} = \Sigma$ (where $\sigma_{ii} = \sigma_i^2$ and $\sigma_{ij} = Cov(x_i, x_j)$)
Correlation random vectors
$\rho(x, y) = \frac{Cov(x, y)}{\sqrt{Var(x)\,Var(y)}} \iff \sigma_{12} = \rho_{12}\,\sigma_1 \sigma_2$
Correlation matrix: $R = \begin{pmatrix} 1 & \rho_{12} & \cdots & \rho_{1p} \\ \rho_{12} & 1 & \cdots & \rho_{2p} \\ \vdots & \vdots & \ddots & \vdots \\ \rho_{1p} & \rho_{2p} & \cdots & 1 \end{pmatrix}$ and denote $V^{1/2} = \begin{pmatrix} \sigma_1 & 0 & \cdots & 0 \\ 0 & \sigma_2 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & \sigma_p \end{pmatrix}$
Then $\Sigma = V^{1/2} R\, V^{1/2}$ and $R = V^{-1/2}\, \Sigma\, V^{-1/2}$
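A quick numerical check of the $\Sigma = V^{1/2} R V^{1/2}$ relation, using an illustrative 2 × 2 covariance matrix:

```python
import numpy as np

# Illustrative covariance matrix Sigma
Sigma = np.array([[4.0, 1.2],
                  [1.2, 9.0]])

V_half = np.diag(np.sqrt(np.diag(Sigma)))   # V^{1/2} = diag(sigma_1, ..., sigma_p)
V_half_inv = np.linalg.inv(V_half)

R = V_half_inv @ Sigma @ V_half_inv         # R = V^{-1/2} Sigma V^{-1/2}
print(R)                                    # ones on the diagonal, rho_12 = 1.2 / (2 * 3) = 0.2
print(V_half @ R @ V_half)                  # recovers Sigma = V^{1/2} R V^{1/2}
```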
Linear combinations
1. $E(\mathbf{a}^\top \mathbf{x}) = \mathbf{a}^\top E(\mathbf{x}) = \mathbf{a}^\top \boldsymbol{\mu}$
2. $Var(\mathbf{a}^\top \mathbf{x}) = \mathbf{a}^\top \Sigma\, \mathbf{a}$
3. $E(A^\top \mathbf{x}) = A^\top E(\mathbf{x}) = A^\top \boldsymbol{\mu}$
4. $Var(A^\top \mathbf{x}) = A^\top \Sigma\, A$
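A minimal simulation sketch (the values of $\boldsymbol{\mu}$, $\Sigma$, and $\mathbf{a}$ are illustrative) that checks rules 1 and 2 empirically:

```python
import numpy as np

rng = np.random.default_rng(0)
mu = np.array([1.0, 2.0])
Sigma = np.array([[2.0, 0.5],
                  [0.5, 1.0]])
a = np.array([3.0, -1.0])

X = rng.multivariate_normal(mu, Sigma, size=200_000)   # rows are draws of x
y = X @ a                                              # a^T x for every draw

print(y.mean(), a @ mu)                # both close to a^T mu
print(y.var(ddof=1), a @ Sigma @ a)    # both close to a^T Sigma a
```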
Sample
The sample is a matrix 𝑋 with dimensions 𝑛 × 𝑝, where 𝑛 is the number of observations and
𝑝 the number of variables.
$X = \begin{pmatrix} \mathbf{x}_1^\top \\ \mathbf{x}_2^\top \\ \vdots \\ \mathbf{x}_n^\top \end{pmatrix} = \begin{pmatrix} x_{11} & x_{12} & \cdots & x_{1p} \\ x_{21} & x_{22} & \cdots & x_{2p} \\ \vdots & \vdots & \ddots & \vdots \\ x_{n1} & x_{n2} & \cdots & x_{np} \end{pmatrix}$, where each row (e.g. $\mathbf{x}_1^\top$) is one observation of a $p$-dimensional vector
Geometric interpretation of average
The projection of $\mathbf{x}$ onto $\mathbf{y}$ is $\frac{\mathbf{x}^\top \mathbf{y}}{\mathbf{y}^\top \mathbf{y}}\,\mathbf{y} = \frac{\mathbf{x}^\top \mathbf{y}}{L_{\mathbf{y}}^2}\,\mathbf{y}$, where $L_{\mathbf{y}} = \sqrt{\mathbf{y}^\top \mathbf{y}}$ is the length of $\mathbf{y}$
The unit vector is $\mathbf{u} = (1, 1, \ldots, 1)^\top$ and has length $L_{\mathbf{u}} = \sqrt{\mathbf{u}^\top \mathbf{u}} = \sqrt{n}$
The projection of $\mathbf{x}$ onto the unit vector is $\frac{\mathbf{x}^\top \mathbf{u}}{\mathbf{u}^\top \mathbf{u}}\,\mathbf{u} = \mathbf{u}\,\frac{1}{n}\sum_{i=1}^{n} x_i = \bar{x}\,\mathbf{u}$
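A short sketch of the projection onto the unit vector (the data vector is illustrative):

```python
import numpy as np

x = np.array([2.0, 4.0, 9.0])          # illustrative observations of one variable
u = np.ones_like(x)                    # the unit vector [1, 1, ..., 1]^T

proj = (x @ u) / (u @ u) * u           # projection of x onto u
print(proj)                            # equals x.mean() * u = [5., 5., 5.]

d = x - proj                           # the deviation vector of the next section
print(d, d @ proj)                     # d is orthogonal to the projection
```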
Deviation vector
The vector that represents the difference of $\mathbf{x}$ from its projection onto the unit vector is the deviation vector: $\mathbf{d} = \mathbf{x} - \bar{x}\,\mathbf{u} = \begin{pmatrix} x_1 - \bar{x} \\ x_2 - \bar{x} \\ \vdots \\ x_n - \bar{x} \end{pmatrix}$
Squared length: $L_{\mathbf{d}}^2 = \mathbf{d}^\top \mathbf{d} = (\mathbf{x} - \bar{x}\mathbf{u})^\top(\mathbf{x} - \bar{x}\mathbf{u}) = \sum_{i=1}^{n}(x_i - \bar{x})^2 = n\,Var(x) = (n-1)s^2$ (here $Var(x)$ uses divisor $n$, while the sample variance $s^2$ uses divisor $n-1$)
Multiplying 2 deviation vectors gives $\mathbf{d}_i^\top \mathbf{d}_k = \sum_{j=1}^{n}(x_{ji} - \bar{x}_i)(x_{jk} - \bar{x}_k)$
The angle $\theta$ between 2 deviation vectors satisfies $\cos(\theta) = \frac{\mathbf{d}_i^\top \mathbf{d}_k}{L_{\mathbf{d}_i} L_{\mathbf{d}_k}} = \frac{\mathbf{d}_i^\top \mathbf{d}_k}{\sqrt{\mathbf{d}_i^\top \mathbf{d}_i}\sqrt{\mathbf{d}_k^\top \mathbf{d}_k}} = \rho_{ik}$
So, this is the correlation between $\mathbf{x}_i$ and $\mathbf{x}_k$
If 𝜃 = 0° then cos(𝜃) = 1 = 𝜌, and this is a perfect correlation
If 𝜃 = 90°, then cos(𝜃) = 0 = 𝜌, and the vectors are orthogonal
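A sketch with two illustrative variables, showing that the cosine of the angle between their deviation vectors equals the sample correlation:

```python
import numpy as np

# Two illustrative variables, n = 5 observations each
x1 = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
x2 = np.array([2.0, 1.0, 4.0, 3.0, 6.0])

d1 = x1 - x1.mean()                    # deviation vectors
d2 = x2 - x2.mean()

cos_theta = (d1 @ d2) / (np.sqrt(d1 @ d1) * np.sqrt(d2 @ d2))
print(cos_theta)
print(np.corrcoef(x1, x2)[0, 1])       # same value: the sample correlation
```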
Week 2
Estimation 𝝁 and Σ
• $\bar{\mathbf{x}} = \frac{1}{n}\sum_{i=1}^{n} \mathbf{x}_i$ is an unbiased estimator of $\boldsymbol{\mu}$, and $Var(\bar{\mathbf{x}}) = \frac{1}{n}\Sigma$
• $S = \frac{1}{n-1}\sum_{i=1}^{n}(\mathbf{x}_i - \bar{\mathbf{x}})(\mathbf{x}_i - \bar{\mathbf{x}})^\top$ is an unbiased estimator of $\Sigma$
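A minimal sketch computing $\bar{\mathbf{x}}$ and $S$ from an illustrative sample matrix:

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(100, 3))          # illustrative sample: n = 100 observations, p = 3 variables

x_bar = X.mean(axis=0)                 # sample mean vector (estimates mu)
S = np.cov(X, rowvar=False)            # sample covariance with divisor n - 1 (estimates Sigma)

# The same S written out as in the formula above
n = X.shape[0]
D = X - x_bar                          # matrix of deviations from the mean
S_manual = D.T @ D / (n - 1)
print(np.allclose(S, S_manual))        # True
```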
Generalized variance
The determinant of the (co)variance matrix is called the generalized variance. It summarizes
the (co)variance matrix in one number
Generalized variance in two dimensions
The determinant of the (co)variance matrix, $\det(S)$, corresponds to the area spanned by the deviation vectors (see the formula below)
It can be calculated by $\det(S) = s_{11}s_{22} - s_{12}^2$
With $\mathbf{d}_1^\top \mathbf{d}_1 = \sum_{i=1}^{n}(x_{i1} - \bar{x}_1)^2 = (n-1)s_{11}$, $\mathbf{d}_2^\top \mathbf{d}_2 = \sum_{i=1}^{n}(x_{i2} - \bar{x}_2)^2 = (n-1)s_{22}$,
$\mathbf{d}_1^\top \mathbf{d}_2 = \sum_{i=1}^{n}(x_{i1} - \bar{x}_1)(x_{i2} - \bar{x}_2) = (n-1)s_{12}$ and $\mathbf{d}_1^\top \mathbf{d}_2 = \cos(\alpha)\, L_{\mathbf{d}_1} L_{\mathbf{d}_2}$, the formula becomes
$\det(S) = \left(\frac{1}{n-1}\right)^2 (\text{area of the parallelogram spanned by the deviation vectors})^2$
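A quick numerical check of the parallelogram formula, reusing the illustrative two-variable data from above:

```python
import numpy as np

x1 = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
x2 = np.array([2.0, 1.0, 4.0, 3.0, 6.0])
n = len(x1)

S = np.cov(np.column_stack([x1, x2]), rowvar=False)
det_S = np.linalg.det(S)                            # generalized variance: s11*s22 - s12^2

d1, d2 = x1 - x1.mean(), x2 - x2.mean()
area_sq = (d1 @ d1) * (d2 @ d2) - (d1 @ d2) ** 2    # squared area of the parallelogram
print(det_S, area_sq / (n - 1) ** 2)                # the two numbers agree (both 3.0 here)
```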
Generalized variance in p dimensions
$\det(S) = \left(\frac{1}{n-1}\right)^p (\text{hypervolume})^2$
Theorems generalized variance
1. The generalized variance is zero ⟺ at least one of the deviation vectors lies in the span of the others, i.e., the columns of the sample matrix are linearly dependent
2. If 𝑛 ≤ 𝑝, then the generalized variance is zero
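A small sketch of theorem 2, with illustrative dimensions $n = 4 \le p = 5$:

```python
import numpy as np

rng = np.random.default_rng(2)
X = rng.normal(size=(4, 5))            # n = 4 observations, p = 5 variables, so n <= p

S = np.cov(X, rowvar=False)            # 5 x 5 sample covariance matrix of rank at most n - 1 = 3
print(np.linalg.det(S))                # 0 up to floating-point noise: the generalized variance is zero
```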