Book:
ISLRv2_website.pdf (su.domains)
Chapter 1: Introduction
Supervised learning = building a statistical model for predicting an output based on one or more
inputs
Regression = predicting a continuous or quantitative output (price,..)
Classification = predicting a qualitative output (gender, up/down,..)
Unsupervised learning = there is no output variable to supervise the analysis of the inputs
- No outcome variable, just a set of predictors/features measured on a set of samples.
- Objective is more fuzzy: find groups of samples that behave similarly, find features that
behave similarly, find linear combinations of features with the most variation, . . .
- It’s difficult to know how well you are doing.
- Different from supervised learning, but can be useful as a pre-processing step for supervised
learning.
- We lack a response variable that can supervise our analysis
Clustering = grouping individuals according to observed characteristics; here we are not trying to
predict an output variable
Association = determining rules that describe large portions of a dataset
ISL (= An Introduction to Statistical Learning) is based on 4 premises
- Many statistical learning methods are relevant and useful in a wide range of academic and
non-academic disciplines, beyond just the statistical sciences
- Statistical learning should not be viewed as a series of black boxes: no single approach will
perform well in all possible applications
- While it is important to know what job is performed by each cog, it is not necessary to have
the skills to construct the machine inside the box
- We presume that the reader is interested in applying statistical learning methods to real-
world problems
Chapter 2: Statistical learning
= set of tools for making sense of complex datasets
X = input/predictor/independent variable
Y = output/response/dependent variable
Y = f(X) + ε
f represents the systematic information that X provides about Y → statistical learning refers to a set
of approaches for estimating f
- ε = random error term (captures e.g. measurement errors), which is independent of X and has
mean zero
Why estimate f?
- prediction
- inference
1. prediction
Ŷ = f̂(X) → error term averages to zero
- f̂ = estimate for f
- Ŷ = resulting prediction for Y → f̂ is often treated as a black box = one is not typically concerned
with the exact form of f̂, provided that it yields accurate predictions for Y.
Ideal predictor of Y with respect to mean-squared prediction error: the regression function
f(x) = E(Y | X = x) is the function that minimizes E[(Y − g(X))² | X = x] over all functions g(·) at all points X = x
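A small simulation can make the claim about the ideal predictor concrete. This is only a sketch under stated assumptions: the true function `f`, the point `x0`, the noise level and the candidate predictions are all hypothetical choices made for illustration.

```python
import random

random.seed(0)

def f(x):
    # Hypothetical true regression function, chosen only for illustration.
    return 2.0 + 3.0 * x

x0 = 1.5
# Draw many responses at the fixed point X = x0: Y = f(x0) + eps, eps ~ N(0, 1).
ys = [f(x0) + random.gauss(0.0, 1.0) for _ in range(10_000)]

def mse_at_x0(g_value):
    """Average squared error when we predict the constant g_value at X = x0."""
    return sum((y - g_value) ** 2 for y in ys) / len(ys)

# Sample estimate of the conditional mean E(Y | X = x0).
cond_mean = sum(ys) / len(ys)

# Compare the conditional mean with a few other (arbitrary) predictions.
candidates = [cond_mean, cond_mean - 0.5, cond_mean + 0.5, f(x0) + 1.0]
best = min(candidates, key=mse_at_x0)

for g in candidates:
    print(f"prediction {g:.3f} -> mean squared error {mse_at_x0(g):.3f}")
```

Among the candidates, the conditional mean gives the smallest average squared error, and it lies close to f(x0) because the error term has mean zero.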
The accuracy of Ŷ as a prediction for Y depends on 2 quantities
- reducible error = we can potentially improve the accuracy of f̂ by using the most appropriate
statistical learning technique to estimate f
- irreducible error = no matter how well we estimate f, we cannot reduce the error introduced
by ε (because Y is also a function of ε, which cannot be predicted using X)
o The quantity ε may contain unmeasured variables that are useful in predicting Y: if
they are not measured or are unmeasurable, they can’t be used in the prediction
o Expected value: E(Y − Ŷ)² = [f(X) − f̂(X)]² + Var(ε) = reducible + irreducible
o Goal: minimize the reducible error
! The irreducible error will always provide an upper bound on the accuracy of our prediction for Y
Proof: decompose the expected squared error; the cross term (2nd term) has expected value 0
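The decomposition of the expected squared error can be written out as follows. It is a sketch that treats f̂ and X as fixed, so that ε is the only random quantity (an assumption stated here explicitly):

```latex
\begin{aligned}
E\big(Y - \hat{Y}\big)^2
  &= E\big[f(X) + \varepsilon - \hat{f}(X)\big]^2 \\
  &= \big[f(X) - \hat{f}(X)\big]^2
     + 2\,\big[f(X) - \hat{f}(X)\big]\,E(\varepsilon)
     + E(\varepsilon^2) \\
  &= \underbrace{\big[f(X) - \hat{f}(X)\big]^2}_{\text{reducible}}
     + \underbrace{\operatorname{Var}(\varepsilon)}_{\text{irreducible}}
\end{aligned}
```

The second term vanishes because E(ε) = 0, and E(ε²) = Var(ε) for the same reason.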
2. Inference
Understand the relationship between X and Y: in this situation we wish to estimate f, but our goal is
not necessarily to make predictions for Y → f̂ cannot be treated as a black box: we need to know its
exact form
- which predictors are associated with the response variable?
o identifying the important predictors
- what is the relationship between each predictor and the response?
o positive or negative relationship
- what type of model best explains the relationship?
How do we estimate f?
Methods for estimating f
- parametric
- non-parametric
training data = the n observed data points (observations) that we use to fit our model
⇒ goal = apply a statistical learning method to the training data in order to estimate the
unknown function f
parametric
reduces the problem of estimating f down to one of estimating a set of parameters because it
assumes a form for f => it simplifies the problem
1. Make an assumption about the functional form of f (e.g. linear: p+1 parameters)
2. After selecting a model, use training data to fit or train the model (e.g. least squares)
parametric and structured models: the linear model is important:
- specified in terms of p+1 parameters: {β0, β1, β2, ... , βp}
- estimate parameters by fitting the model to training data
- almost never correct, but serves as a good and interpretable approximation to the unknown true
function → good for inference
disadvantages: the model we choose will usually not match the true unknown form of f
⇒ we can choose more flexible models: these estimate a greater number of parameters
⇒ potential to inaccurately estimate f if the assumed form of f is wrong
⇒ more complex model → overfitting: it follows the errors, or noise, too closely
advantages: more interpretable (easier to explain the results)
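The two steps of the parametric approach (assume a form, then fit it) can be sketched for the simplest case of one predictor, so p + 1 = 2 parameters. This is a minimal illustration using the closed-form least squares solution; the function name `fit_simple_linear` and the data are made up for the example.

```python
def fit_simple_linear(xs, ys):
    """Least squares estimates (beta0, beta1) for the assumed form f(X) = beta0 + beta1 * X."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # Slope: sample covariance of x and y divided by sample variance of x.
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    beta1 = sxy / sxx
    # Intercept: the fitted line passes through the point of means.
    beta0 = y_bar - beta1 * x_bar
    return beta0, beta1

# Hypothetical training data (n = 5 observations).
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]

beta0, beta1 = fit_simple_linear(xs, ys)

def f_hat(x):
    """Estimated function: only two numbers had to be learned from the data."""
    return beta0 + beta1 * x

print(f"beta0 = {beta0:.2f}, beta1 = {beta1:.2f}, f_hat(6) = {f_hat(6.0):.2f}")
```

Estimating f has been reduced to estimating two numbers, which is what makes the result easy to interpret: beta1 is the change in the predicted response per unit change in X.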
non-parametric
does not make an explicit assumption about the functional form of f → attempts to get as close to the
data points as possible, without being too rough or wiggly
advantage: has the potential to accurately fit a wider range of possible shapes of f
disadvantage: does not reduce the problem of estimating f to a small number of parameters, so a much
larger number of observations is needed for an accurate estimate of f
non-parametric model:
thin-plate spline: technique that does not impose any pre-specified model on f. It instead attempts
to produce an estimate for f that is as close as possible to the observed data
- importance of level of smoothness
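To illustrate why the level of smoothness matters, here is a small non-parametric sketch. Note that it does not implement a thin-plate spline: it swaps in K-nearest-neighbours averaging, a simpler non-parametric method, because that needs only the standard library. The data values are hypothetical, and K plays the role of the smoothness setting.

```python
def knn_regress(x0, xs, ys, k):
    """Predict at x0 by averaging the responses of the k nearest training points."""
    nearest = sorted(range(len(xs)), key=lambda i: abs(xs[i] - x0))[:k]
    return sum(ys[i] for i in nearest) / k

# Hypothetical training data with a wave-like shape.
xs = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
ys = [0.0, 0.8, 0.9, 0.1, -0.8, -1.0]

rough = knn_regress(2.0, xs, ys, k=1)    # reproduces the nearest observation exactly
smooth = knn_regress(2.0, xs, ys, k=3)   # averages the point with its two neighbours
flat = knn_regress(2.0, xs, ys, k=6)     # averages all points: shape of f is lost

print(rough, smooth, flat)
```

No functional form for f is assumed anywhere. With k = 1 the estimate follows every observation (rough, risk of overfitting); with k equal to the sample size it becomes a constant (too smooth).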
Trade-offs
Restrictive > flexible
- for inference: more interpretable
Flexible > restrictive
- predictions: interpretability not of interest
- wider range of possible shapes
Prediction accuracy vs interpretability
- linear models are easy to interpret
- thin-plate splines are not
Good fit vs over-fit or under-fit
Parsimony vs black-box
- prefer a simpler model involving fewer variables over a black-box predictor involving them all
if they give the same result
The more flexible (and often better-performing) a method → the less interpretable it becomes
Supervised vs unsupervised learning
We can seek to understand the relationships between the variables or between the observations
- using cluster analysis or clustering: check whether observations fall into distinct groups
- sometimes difficult, as observations cannot easily be assigned to groups when the groups overlap
Regression vs classification problems
regression problems: with a quantitative response
- use of least squares
- use of K-nearest-neighbors
classification problems: with a qualitative response
- use of logistic regression (binary response)
- use of K-nearest-neighbors
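K-nearest-neighbors appears under both problem types. For classification, the neighbours vote on a class instead of averaging a number. The following is a minimal sketch with one predictor; the function name `knn_classify`, the data and the labels are hypothetical.

```python
from collections import Counter

def knn_classify(x0, xs, labels, k):
    """Predict the class at x0 by majority vote among the k nearest training points."""
    nearest = sorted(range(len(xs)), key=lambda i: abs(xs[i] - x0))[:k]
    votes = Counter(labels[i] for i in nearest)
    return votes.most_common(1)[0][0]

# Hypothetical training data: one quantitative predictor, qualitative response.
xs = [1.0, 1.5, 2.0, 6.0, 6.5, 7.0]
labels = ["down", "down", "down", "up", "up", "up"]

pred_low = knn_classify(1.8, xs, labels, k=3)
pred_high = knn_classify(6.2, xs, labels, k=3)

print(pred_low, pred_high)
```

An odd k is used here so that a two-class vote cannot end in a tie.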
Assessing model accuracy
No single method is best for every data set → selecting the best approach for a given data set is therefore very important
Measuring the quality of Fit
Mean squared error (MSE) = measures how close the predicted value for a given observation is to the true response
value for that observation → does the model match the observed data?
MSE = (1/n) Σ_i (y_i − f̂(x_i))², summed over the n observations
MSE = small if the predicted responses are very close to the true responses
MSE = large if for some observations the predicted and true responses differ substantially
- we are interested in the accuracy of the predictions that we obtain when we apply our
method to previously unseen test data → not to the training data
In other words, if we had a large number of test observations (x0, y0), we could compute the average squared
prediction error for these observations: test MSE = Ave(y0 − f̂(x0))².
- Select the model for which this is as small as possible
- Fundamental problem: there is no guarantee that the method with the lowest training MSE
will also have the lowest test MSE
o Test MSE is often much larger than training MSE
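The gap between training MSE and test MSE can be seen in a small simulation. This is a sketch under stated assumptions: the true function (a sine curve), the noise level, the grid of x values and the use of K-nearest-neighbours averaging as the fitting method are all hypothetical choices for illustration.

```python
import math
import random

random.seed(1)

def f(x):
    # Hypothetical true function, for illustration only.
    return math.sin(x)

def make_responses(xs):
    """Y = f(X) + eps with eps ~ N(0, 0.3^2)."""
    return [f(x) + random.gauss(0.0, 0.3) for x in xs]

x_train = [i / 10 for i in range(50)]
x_test = [x + 0.05 for x in x_train]      # unseen points between the training points
y_train = make_responses(x_train)
y_test = make_responses(x_test)

def knn_predict(x0, k):
    """Average the responses of the k training points nearest to x0."""
    nearest = sorted(range(len(x_train)), key=lambda i: abs(x_train[i] - x0))[:k]
    return sum(y_train[i] for i in nearest) / k

def mse(xs, ys, k):
    """Average squared prediction error of the k-NN fit on the given observations."""
    return sum((y - knn_predict(x, k)) ** 2 for x, y in zip(xs, ys)) / len(xs)

# k = 1 is the most flexible fit; larger k gives smoother, less flexible fits.
results = {k: (mse(x_train, y_train, k), mse(x_test, y_test, k)) for k in (1, 5, 15)}

for k, (train_mse, test_mse) in results.items():
    print(f"k = {k:2d}: training MSE = {train_mse:.3f}, test MSE = {test_mse:.3f}")
```

With k = 1 each training point is its own nearest neighbour, so the training MSE is exactly zero, while the test MSE is not. Choosing the method by training MSE alone would therefore always pick the most flexible fit, which is the fundamental problem noted above.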