Summary for the course Data Science Methods, given in the Master's in Econometrics at Tilburg University in 2018. Summary of the book The Elements of Statistical Learning.
Week 1: Chapter 1 and Chapter 10
Supervised learning involves building a statistical model for predicting or estimating an output based on one or more inputs. It is clear which outcome you want to predict or which treatment effect you want to quantify.
In unsupervised learning there are inputs but no supervising output. Nevertheless, we can learn relationships and structures from such data. You want to get a feel for the data and reduce its dimension. There is no good way to assess the graphs, because the goal is not well-defined; if you define the goal more clearly, you can better assess the results. Unsupervised learning should be viewed as a precursory step to supervised learning.
Clustering problem: find similarities by grouping individuals according to their observed characteristics. Here we are not trying to predict an output variable. More on this in Chapter 10.
Notation
n = number of distinct data points or observations in our sample, like n = 3000 people.
p = number of variables that are available for use in making predictions (like year, wage, sex etc.).
x_ij = the jth variable of the ith observation, where i = 1, ..., n and j = 1, ..., p.
X = n x p matrix whose (i,j)th element is x_ij.
T or ' is notation for transpose.
y_i is the ith observation of the variable on which we wish to make predictions, like wage.
Note that we can only multiply two matrices A and B to get AB if the number of columns in A is equal to the number of rows in B (illustrated in the sketch below).
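As a small illustration of this notation (my own sketch, not from the book; the numbers are made up), in NumPy:

import numpy as np

# Hypothetical data matrix X with n = 3 observations and p = 2 variables
# (values are made up purely to illustrate the notation above).
X = np.array([[25, 3000.0],   # observation i = 1: e.g. age, wage
              [40, 5200.0],   # observation i = 2
              [31, 4100.0]])  # observation i = 3

n, p = X.shape               # n = 3, p = 2
x_12 = X[0, 1]               # x_ij with i = 1, j = 2 (Python indexes from 0)
Xt = X.T                     # transpose: a p x n matrix

# A @ B only works if A has as many columns as B has rows:
# X is (3 x 2) and X.T is (2 x 3), so X.T @ X is a valid (2 x 2) product.
gram = X.T @ X
print(n, p, x_12, gram.shape)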
Chapter 10: Unsupervised learning
Unsupervised learning: a set of statistical tools intended for the setting in which we have only a set of features X1, X2, ..., Xp measured on n observations. We are not interested in prediction, because we do not have an associated response variable y. Rather, the goal is to discover interesting things about the measurements X1, X2, ..., Xp. There are two specific types of unsupervised learning:
1. Principal Components Analysis: used for data visualization or data pre-processing before supervised techniques are applied.
2. Clustering: a method for discovering unknown subgroups in data.
Unsupervised learning is often performed as part of an exploratory data analysis. In this case, we cannot check our work, because we do not know the true answer.
Principal Component Analysis
When faced with a large set of correlated variables, principal components allow us to summarize this set with a smaller number of representative variables that collectively explain most of the variability of the original set. Principal Component Analysis (PCA) refers to the process by which principal components are computed, and to the subsequent use of these components in understanding the data.
When p is large, examining two-dimensional scatter plots of the data is too cumbersome, so a better method to visualize the n observations is required. PCA does this: it finds a low-dimensional representation of a data set that contains as much of the variation as possible. This is useful if we have several features and believe there are only a few underlying traits that are important for describing and analysing the data. PCA seeks a small number of dimensions that are as interesting as possible, where the concept of interesting is measured by the amount that the observations vary along each dimension. Each of the dimensions found by PCA is a linear combination of the p features.
How are principal components found? The first principal component of a set of features X1, X2, ..., Xp is the normalized linear combination of the features

Z1 = ϕ11 X1 + ϕ21 X2 + ... + ϕp1 Xp

that has the largest variance. Normalized means that the loadings satisfy ϕ11^2 + ϕ21^2 + ... + ϕp1^2 = 1.
The elements ϕ11, ..., ϕp1 are the loadings of the first principal component; together they make up the principal component loading vector ϕ1 = (ϕ11, ϕ21, ..., ϕp1)'. We constrain the loadings so that their sum of squares is equal to one, since otherwise setting these elements to be arbitrarily large in absolute value could result in an arbitrarily large variance.
Assuming each variable has been centred to have mean zero, the first principal component loading vector solves the optimization problem

maximize over ϕ11, ..., ϕp1:  (1/n) * Σ_i ( ϕ11 x_i1 + ϕ21 x_i2 + ... + ϕp1 x_ip )^2   subject to   ϕ11^2 + ... + ϕp1^2 = 1,

which, using the scores z_i1 = ϕ11 x_i1 + ϕ21 x_i2 + ... + ϕp1 x_ip, can be written as

maximize (1/n) * Σ_i z_i1^2   subject to   ϕ11^2 + ... + ϕp1^2 = 1.

Because the x_ij have mean zero, the z_i1 have mean zero as well, so the objective is simply the sample variance of the scores. We refer to z11, ..., zn1 as the scores of the first principal component. The maximization problem above can be solved via an eigen decomposition.
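As a rough sketch of how the eigen decomposition route works in practice (my own illustration with made-up data, not code from the book): for centred data, the first loading vector is the eigenvector of the sample covariance matrix with the largest eigenvalue, and the variance of the resulting scores equals that eigenvalue:

import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))           # hypothetical data: n = 100 observations, p = 3 features
X = X - X.mean(axis=0)                  # centre each variable to have mean zero

cov = (X.T @ X) / X.shape[0]            # sample covariance matrix (with 1/n, as in the objective)
eigvals, eigvecs = np.linalg.eigh(cov)  # eigen decomposition of a symmetric matrix (ascending order)

phi1 = eigvecs[:, -1]                   # loading vector = eigenvector with the largest eigenvalue
z1 = X @ phi1                           # scores of the first principal component

print(np.sum(phi1**2))                  # loadings are normalized: sum of squares equals 1
print(z1.var())                         # variance of the scores ...
print(eigvals[-1])                      # ... equals the largest eigenvalue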
Geometric interpretation of the first principal component: the loading vector ϕ1, with elements ϕ11, ϕ21, ..., ϕp1, defines a direction in feature space along which the data vary the most.
After the first principal component Z1 of the features has been determined, we can find the second principal component Z2, which is the linear combination of X1, X2, ..., Xp that has maximal variance out of all linear combinations that are uncorrelated with Z1. The scores take the form

z_i2 = ϕ12 x_i1 + ϕ22 x_i2 + ... + ϕp2 x_ip,

with ϕ2 = (ϕ12, ϕ22, ..., ϕp2)' being the second principal component loading vector.
It turns out that constraining Z2 to be uncorrelated with Z1 is equivalent to constraining the direction ϕ2 to be orthogonal (= perpendicular) to the direction ϕ1. To find ϕ2 we solve a similar maximization problem as before, but with the additional constraint that ϕ2 is orthogonal to ϕ1.
Once we have computed the principal components, we can plot them against each other in order to produce low-dimensional views of the data, e.g. Z1 against Z2, Z1 against Z3, etc.
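A minimal, self-contained sketch (again my own illustration with made-up data) that computes the first two components, checks the orthogonality and uncorrelatedness claims above, and then plots Z1 against Z2:

import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))                 # hypothetical data, centred below
X = X - X.mean(axis=0)

eigvals, eigvecs = np.linalg.eigh((X.T @ X) / X.shape[0])
phi1, phi2 = eigvecs[:, -1], eigvecs[:, -2]   # first and second loading vectors
z1, z2 = X @ phi1, X @ phi2                   # first and second principal component scores

print(np.dot(phi1, phi2))                     # ~0: the directions phi1 and phi2 are orthogonal
print(np.corrcoef(z1, z2)[0, 1])              # ~0: the score vectors Z1 and Z2 are uncorrelated

plt.scatter(z1, z2)                           # low-dimensional view: Z1 plotted against Z2
plt.xlabel("Z1")
plt.ylabel("Z2")
plt.show()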
Principal component biplot: displays both the principal component scores and the principal component loadings. In the figure for the US states crime data, the blue state names represent the scores of the first two principal components, and the orange arrows indicate the first two principal component loading vectors.
The first loading vector places approximately equal weight on Assault, Murder and Rape (the orange lines are horizontally close) and much less weight on UrbanPop. Therefore, this component roughly corresponds to a measure of overall rates of serious crimes. The second loading vector places most of its weight on UrbanPop, therefore roughly corresponding to the level of urbanization of the state. Overall, we see that the crime-related variables are located close to each other, indicating that they are correlated with each other, while UrbanPop is less correlated with the rest. This indicates, for example, that states with high murder rates tend to have high assault and rape rates too.
Note: Rape scores 0.54 on the first principal component (horizontal axis) but 0.17 on the second (vertical axis).
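For reference, a hedged sketch of how such loadings could be computed in Python; USArrests.csv is a hypothetical local copy of the crime data shown in the figure, and scikit-learn's StandardScaler and PCA are used here in place of the R code from the book:

import pandas as pd
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA

# "USArrests.csv" is a hypothetical local file with columns
# Murder, Assault, UrbanPop and Rape (one row per US state).
df = pd.read_csv("USArrests.csv", index_col=0)

Z = StandardScaler().fit_transform(df)   # centre and scale each variable
pca = PCA(n_components=2).fit(Z)

# Loadings: one row per component, one column per original variable.
# Reading across the first row shows which variables the first component
# weights heavily (here, the crime variables), as discussed above.
loadings = pd.DataFrame(pca.components_, columns=df.columns, index=["PC1", "PC2"])
print(loadings)

scores = pca.transform(Z)                # the biplot plots these scores together with the loadings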
There is an alternative interpretation of principal components: principal components provide low-dimensional linear surfaces that are closest to the observations.
The first principal component loading vector gives the line in p-dimensional space that is closest to the n observations, using average squared Euclidean distance as a measure of closeness. So we seek the single dimension of the data that lies as close as possible to all of the data points; this will provide a good summary of the data. The first two principal components span the (two-dimensional) plane that is closest to the n observations, the first three principal components span a three-dimensional hyperplane, and so on, all in terms of Euclidean distance.
Using this interpretation, the first M principal component score vectors and the first M principal component loading vectors together provide the best M-dimensional approximation (in terms of Euclidean distance) to the ith observation x_ij; in other words, they can give a good approximation of the data when M is sufficiently large.
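A small sketch of this approximation idea (my own illustration with made-up data): project the observations onto the first M loading vectors, reconstruct, and measure the average squared Euclidean distance to the original data:

import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(50, 5))             # hypothetical centred data: n = 50, p = 5
X = X - X.mean(axis=0)

# Loading vectors from the eigen decomposition of the covariance matrix,
# reordered so that column 0 is the first principal component.
eigvals, eigvecs = np.linalg.eigh((X.T @ X) / X.shape[0])
phi = eigvecs[:, ::-1]

M = 2
scores = X @ phi[:, :M]                  # first M principal component score vectors
X_approx = scores @ phi[:, :M].T         # best M-dimensional approximation of X

# Average squared Euclidean distance between the data and the approximation;
# it shrinks as M grows and is exactly 0 when M = p.
print(np.mean(np.sum((X - X_approx) ** 2, axis=1)))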
We already mentioned that before PCA is performed, the variables should be centred to have mean zero. Furthermore, the results obtained when we perform PCA also depend on whether the variables have been individually scaled: the results are very sensitive to the scaling used. Because it is undesirable for the principal components obtained to depend on an arbitrary choice of scaling, we typically scale each variable to have standard deviation one before we perform PCA. In certain settings, for example when the variables are measured in the same units, we might not wish to scale.
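A short illustration of why scaling matters (made-up data, not from the book): without scaling, the variable with the largest variance dominates the first loading vector; after scaling each variable to standard deviation one, the variables receive comparable weight:

import numpy as np

rng = np.random.default_rng(2)
# Hypothetical data where the second variable has a much larger scale than the first.
X = np.column_stack([rng.normal(size=200), 100 * rng.normal(size=200)])
X = X - X.mean(axis=0)

def first_loading(data):
    # First principal component loading vector of centred data.
    _, eigvecs = np.linalg.eigh((data.T @ data) / data.shape[0])
    return eigvecs[:, -1]

print(first_loading(X))                  # unscaled: dominated by the large-variance variable
print(first_loading(X / X.std(axis=0)))  # scaled to sd 1: both variables get comparable weight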