Book:
ISLRv2_website.pdf (su.domains)
Chapter 1: Introduction
Supervised learning = building a statistical model for predicting an output based on one or more inputs
Regression = predicting a continuous or quantitative output (e.g. price)
Classification = predicting a qualitative output (e.g. gender, up/down)
Unsupervised learning = there are inputs, but no output variable that supervises the analysis
- No outcome variable, just a set of predictors/features measured on a set of samples.
- Objective is more fuzzy: find groups of samples that behave similarly, find features that
behave similarly, find linear combinations of features with the most variation, . . .
- It’s difficult to know how well you are doing.
- Different from supervised learning, but can be useful as a pre-processing step for supervised
learning.
- we lack a response variable that can supervise our analysis
Clustering = grouping individuals according to observed characteristics: here we are not trying to predict an output variable
Association = determining rules that describe large portions of a dataset
ISL (= An Introduction to Statistical Learning) is based on 4 premises
- Many statistical learning methods are relevant and useful in a wide range of academic and
non-academic disciplines, beyond just the statistical sciences
- Statistical learning should not be viewed as a series of black boxes: no single approach will perform well in all possible applications
- While it is important to know what job is performed by each cog, it is not necessary to have
the skills to construct the machine inside the box
- We presume that the reader is interested in applying statistical learning methods to real-world problems
Chapter 2: Statistical learning
= set of tools for making sense of complex datasets
X = input/predictor/independent variable
Y = output/response/dependent variable
We assume $Y = f(X) + \varepsilon$, where f represents the systematic information that X provides about Y → statistical learning refers to a set of approaches for estimating f
- ε = random error term (it captures e.g. measurement errors), which is independent of X and has mean zero
Why estimate f?
- prediction
- inference
1. Prediction
$\hat{Y} = \hat{f}(X)$ → ε is left out because the error term averages to zero
- $\hat{f}$ = estimate for f
- $\hat{Y}$ = resulting prediction for Y → $\hat{f}$ is often treated as a black box: one is not typically concerned with the exact form of $\hat{f}$, provided that it yields accurate predictions for Y.
Ideal predictor of Y with regard to mean-squared prediction error: the regression function $f(x) = E(Y \mid X = x)$ is the function that minimizes $E[(Y - g(X))^2 \mid X = x]$ over all functions $g(\cdot)$ at all points $X = x$
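A quick simulation makes the claim that the conditional mean $E(Y \mid X = x)$ is the ideal predictor concrete. This is a minimal sketch under a hypothetical setup (not taken from the book): at one fixed point X = x we assume f(x) = 2 and ε ~ N(0, 1), so E(Y | X = x) = 2, and we compare the average squared error of a few constant guesses g(x). The guess equal to the conditional mean should come out lowest.

```python
import random
import statistics

# Hypothetical setup: at a fixed point X = x, Y = f(x) + eps,
# with f(x) = 2.0 and eps ~ Normal(0, 1), so E(Y | X = x) = 2.0.
random.seed(0)
f_x = 2.0
ys = [f_x + random.gauss(0.0, 1.0) for _ in range(100_000)]

def mean_squared_error(guess, values):
    """Average of (y - guess)^2 over the simulated responses."""
    return statistics.fmean((y - guess) ** 2 for y in values)

candidates = [1.0, 1.5, 2.0, 2.5, 3.0]
errors = {g: mean_squared_error(g, ys) for g in candidates}
best = min(errors, key=errors.get)

for g, err in errors.items():
    print(f"g(x) = {g:.1f}  ->  average squared error {err:.3f}")
print("best constant guess:", best)
```

The minimum error is roughly Var(ε) = 1; every other guess adds its squared distance from the conditional mean on top of that.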
The accuracy of $\hat{Y}$ as a prediction for Y depends on 2 quantities
- reducible error = we can potentially improve the accuracy of $\hat{f}$ by using the most appropriate statistical learning technique to estimate f
- irreducible error = no matter how well we estimate f, we cannot reduce the error introduced by ε (because Y is also a function of ε, which cannot be predicted using X)
o The quantity ε may contain unmeasured variables that are useful in predicting Y; because they are not (or cannot be) measured, they can't be used in the prediction
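o Expected value (for fixed $\hat{f}$ and X): $E(Y - \hat{Y})^2 = [f(X) - \hat{f}(X)]^2 + \mathrm{Var}(\varepsilon)$ = reducible + irreducible error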
o Goal: minimize the reducible error
! The irreducible error will always provide an upper bound on the accuracy of our prediction for Y
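Proof: decompose the expected squared error, with $\hat{f}$ and X fixed and $\hat{Y} = \hat{f}(X)$:
$$E(Y - \hat{Y})^2 = E[f(X) + \varepsilon - \hat{f}(X)]^2 = [f(X) - \hat{f}(X)]^2 + 2[f(X) - \hat{f}(X)]\,E(\varepsilon) + E(\varepsilon^2) = [f(X) - \hat{f}(X)]^2 + \mathrm{Var}(\varepsilon)$$
The expected value of the 2nd (cross) term is 0 because $E(\varepsilon) = 0$; for the same reason $E(\varepsilon^2) = \mathrm{Var}(\varepsilon)$.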
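The decomposition $E(Y - \hat{Y})^2 = [f(X) - \hat{f}(X)]^2 + \mathrm{Var}(\varepsilon)$ can also be checked numerically. The sketch below assumes hypothetical values at a single fixed point x (f(x) = 3, an imperfect estimate $\hat{f}(x) = 2.5$, and ε ~ N(0, 0.5²)), so the Monte Carlo average of $(Y - \hat{Y})^2$ should land close to 0.25 + 0.25 = 0.5.

```python
import random
import statistics

# Hypothetical values at one fixed point x (treated as fixed, as in the proof)
random.seed(1)
f_x = 3.0        # true f(x)
f_hat_x = 2.5    # an imperfect estimate f_hat(x)
sigma = 0.5      # standard deviation of eps, so Var(eps) = 0.25

n = 200_000
ys = [f_x + random.gauss(0.0, sigma) for _ in range(n)]

# Monte Carlo estimate of E(Y - Y_hat)^2 with Y_hat = f_hat(x)
expected_sq_error = statistics.fmean((y - f_hat_x) ** 2 for y in ys)
reducible = (f_x - f_hat_x) ** 2      # [f(x) - f_hat(x)]^2
irreducible = sigma ** 2              # Var(eps)

print(f"Monte Carlo E(Y - Y_hat)^2 : {expected_sq_error:.4f}")
print(f"reducible + irreducible    : {reducible + irreducible:.4f}")
```

Setting `f_hat_x = f_x` removes the reducible part, but the average stays near 0.25: that is the irreducible error.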
2. Inference
Understand the relationship between X and Y: in this situation we wish to estimate f, but our goal is not necessarily to make predictions for Y → $\hat{f}$ cannot be treated as a black box: we need to know its exact form
- which predictors are associated with the response variable?
o identifying the important predictors
- what is the relationship between each predictor and the response?
o positive or negative relationship
- what type of model best explains the relationship?
How do we estimate f?
Models of estimating f
- parametric
- non-parametric
training data = n different data points/observations that we use to train (fit) our model
→ goal = apply a statistical learning method to the training data in order to estimate the unknown function f
parametric
reduces the problem of estimating f down to one of estimating a set of parameters because it
assumes a form for f => it simplifies the problem
1. Make an assumption about the functional form of f (e.g. linear: $f(X) = \beta_0 + \beta_1 X_1 + \dots + \beta_p X_p$, so p+1 parameters)
2. After selecting a model, use the training data to fit or train the model (e.g. least squares)
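The two steps of the parametric approach can be sketched for the simplest case p = 1. This is an illustration, not the book's code: the data are simulated from a hypothetical line with β0 = 1 and β1 = 2 plus N(0, 1) noise, and `fit_simple_linear` is a hypothetical helper that computes the closed-form least-squares estimates.

```python
import random
import statistics

def fit_simple_linear(xs, ys):
    """Least-squares estimates (b0, b1) for the assumed form f(X) = b0 + b1*X."""
    x_bar = statistics.fmean(xs)
    y_bar = statistics.fmean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    return b0, b1

# Step 1 (assumption): f is linear in X, so only p + 1 = 2 parameters are unknown.
# Step 2 (fit): simulate hypothetical training data with beta0 = 1, beta1 = 2
# and estimate the two parameters by least squares.
random.seed(2)
xs = [random.uniform(0, 10) for _ in range(500)]
ys = [1.0 + 2.0 * x + random.gauss(0.0, 1.0) for x in xs]

b0, b1 = fit_simple_linear(xs, ys)
print(f"estimated beta0 = {b0:.3f}, beta1 = {b1:.3f}")
```

With 500 observations the estimates should land close to 1 and 2. Because only two numbers had to be estimated, the parametric assumption is what makes the problem this easy; if the true f were curved, the same two-parameter fit would stay systematically wrong.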
parametric and structured models: the linear model is important:
- specified in terms of p+1 parameters: {β0, β1, β2, ... , βp }
- estimate parameters by fitting the model to training data
- almost never correct, but serves as a good and interpretable approximation to the unknown true function → useful for inference
disadvantages: the model we choose will usually not match the true unknown form of f
→ if the assumed form of f is wrong, our estimate of f will be inaccurate
→ remedy: choose more flexible models, which require estimating a greater number of parameters
→ but more complex models can lead to overfitting: they follow the errors, or noise, too closely
advantages: more interpretable (easier to explain the results)
non-parametric
does not make an explicit assumption about the functional form of f → attempts to get as close to the data points as possible, without being too rough or too smooth
advantage: has the potential to accurately fit a wider range of possible shapes of f
disadvantage: does not reduce the problem to estimating a small number of parameters, so a far larger number of observations is needed for an accurate estimate of f
non-parametric model:
thin-plate spline: technique that does not impose any pre-specified model on f. It instead attempts
to produce an estimate for f that is as close as possible to the observed data
- the level of smoothness is important: a rougher fit can match the training data perfectly, but then it overfits
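Fitting a thin-plate spline needs a linear-system solver, so the sketch below swaps in a simpler non-parametric method, a Gaussian kernel (Nadaraya-Watson) smoother, to illustrate why the level of smoothness matters. Its `bandwidth` plays the role of the smoothness level: too small gives a rough fit that chases the noise, too large gives an over-smooth fit. The data (Y = sin(X) plus noise with standard deviation 0.3) and the three bandwidths are hypothetical choices.

```python
import math
import random
from statistics import fmean

def kernel_smoother(x0, xs, ys, bandwidth):
    """Nadaraya-Watson estimate of f(x0): a Gaussian-weighted average of the ys."""
    weights = [math.exp(-0.5 * ((x - x0) / bandwidth) ** 2) for x in xs]
    return sum(w * y for w, y in zip(weights, ys)) / sum(weights)

# Hypothetical training data: evenly spaced X on [0, 2*pi], Y = sin(X) + noise
random.seed(4)
n = 200
xs = [2 * math.pi * i / (n - 1) for i in range(n)]
ys = [math.sin(x) + random.gauss(0.0, 0.3) for x in xs]

# Interior grid (away from the edges, where kernel smoothers are biased)
grid = [1.0 + 0.05 * j for j in range(86)]   # covers roughly [1.0, 5.25]

results = {}
for bandwidth in (0.02, 0.3, 3.0):            # rough, moderate, over-smooth
    train_err = fmean((y - kernel_smoother(x, xs, ys, bandwidth)) ** 2
                      for x, y in zip(xs, ys))
    true_err = fmean((math.sin(g) - kernel_smoother(g, xs, ys, bandwidth)) ** 2
                     for g in grid)
    results[bandwidth] = (train_err, true_err)
    print(f"bandwidth {bandwidth:4}: training MSE {train_err:.3f}, "
          f"error vs true f {true_err:.3f}")
```

The roughest fit should show the lowest training error but not the lowest error against the true f, which is the overfitting risk described above; the over-smooth fit misses the shape of f altogether.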
Trade-offs
Restrictive > flexible
- for inference: more interpretable
Flexible > restrictive
- predictions: interpretability not of interest
- wider range of possible shapes
Prediction accuracy vs interpretability
- linear models are easy to interpret
- thin-plate splines are not
Good fit vs over-fit or under-fit
Parsimony vs black-box
- prefer a simpler model involving fewer variables over a black-box predictor involving all of them, if both give the same result
The more flexible (performant) a method → the less interpretable it becomes
Supervised vs unsupervised learning
We can seek to understand the relationships between the variables or between the observations
- using cluster analysis or clustering: look whether observations fall into distinct groups
- sometimes difficult, as observations can't easily be put into groups when the groups overlap
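As an illustration of cluster analysis, here is a minimal sketch of one common clustering method, K-means (Lloyd's algorithm), which the notes do not name explicitly. The two well-separated groups of 2-D points are hypothetical; no response variable is used, only the observed features.

```python
import math
import random

def k_means(points, k, n_iter=100, seed=0):
    """Lloyd's algorithm: alternate between assigning points to the nearest
    centroid and moving each centroid to the mean of its assigned points."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)          # start from k random observations
    labels = [0] * len(points)
    for _ in range(n_iter):
        # Assignment step: nearest centroid (Euclidean distance)
        labels = [min(range(k), key=lambda j: math.dist(p, centroids[j]))
                  for p in points]
        # Update step: mean of each cluster (keep the old centroid if a cluster is empty)
        new_centroids = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                new_centroids.append(tuple(sum(c) / len(members) for c in zip(*members)))
            else:
                new_centroids.append(centroids[j])
        if new_centroids == centroids:          # assignments have stabilised
            break
        centroids = new_centroids
    return labels, centroids

# Hypothetical data: two groups, around (0, 0) and (5, 5)
random.seed(5)
group_a = [(random.gauss(0, 0.5), random.gauss(0, 0.5)) for _ in range(50)]
group_b = [(random.gauss(5, 0.5), random.gauss(5, 0.5)) for _ in range(50)]
points = group_a + group_b

labels, centroids = k_means(points, k=2)
print("cluster sizes:", [labels.count(j) for j in range(2)])
print("centroids:", [tuple(round(c, 2) for c in cen) for cen in centroids])
```

With groups this far apart the algorithm recovers them; when groups overlap, the assignments become much less clear-cut, which is the difficulty mentioned above.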
Regression vs classification problems
regression problems: with a quantitative response
- use of least squares
- use of K-nearest-neighbors
classification problems: with a qualitative response
- use of logistic regression: binary response
- use of K-nearest-neighbors
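K-nearest neighbours appears under both headings because the same neighbourhood idea works for either kind of response: average the neighbours' values for regression, take a majority vote for classification. A minimal sketch with a tiny hypothetical training set (one predictor; `knn_regress` and `knn_classify` are illustrative helper names, not a library API):

```python
import math
from collections import Counter
from statistics import fmean

def k_nearest(x0, xs, k):
    """Indices of the k training points closest to x0 (Euclidean distance)."""
    order = sorted(range(len(xs)), key=lambda i: math.dist(xs[i], x0))
    return order[:k]

def knn_regress(x0, xs, ys, k):
    """Regression: average the quantitative responses of the k nearest neighbours."""
    return fmean(ys[i] for i in k_nearest(x0, xs, k))

def knn_classify(x0, xs, labels, k):
    """Classification: majority vote among the labels of the k nearest neighbours."""
    votes = Counter(labels[i] for i in k_nearest(x0, xs, k))
    return votes.most_common(1)[0][0]

# Tiny hypothetical training set with one predictor (points stored as 1-tuples)
xs = [(1.0,), (2.0,), (3.0,), (10.0,), (11.0,), (12.0,)]
prices = [100.0, 110.0, 120.0, 300.0, 310.0, 320.0]    # quantitative response
direction = ["down", "down", "down", "up", "up", "up"]  # qualitative response

print(knn_regress((2.5,), xs, prices, k=2))       # 115.0 (neighbours at x = 2 and 3)
print(knn_classify((10.5,), xs, direction, k=3))  # up (neighbours at x = 10, 11, 12)
```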
Assessing model accuracy
No single method is best for every data set → selecting the best approach for a given data set is therefore very important
Measuring the quality of Fit
Mean squared error (MSE) = measures how close the predicted response for a given observation is to the true response for that observation → does the fit match the observed data?
$\mathrm{MSE} = \frac{1}{n}\sum_{i=1}^{n} (y_i - \hat{f}(x_i))^2$
MSE = small if the predicted responses are very close to the true responses
MSE = large if, for some observations, the predicted and true responses differ substantially
- we are interested in the accuracy of the predictions that we obtain when we apply our method to previously unseen test data → not to the training data
In other words, if we had a large number of test observations $(x_0, y_0)$, we could compute the average squared prediction error $\mathrm{Ave}(y_0 - \hat{f}(x_0))^2$ for these test observations.
- Select the model for which this is as small as possible
- Fundamental problem: there is no guarantee that the method with the lowest training MSE
will also have the lowest test MSE
o Test MSE is often much larger than training MSE
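The gap between training and test MSE can be reproduced with a small simulation. This sketch assumes hypothetical data Y = sin(X) + ε with ε ~ N(0, 0.3²) and uses K-nearest-neighbours regression as a stand-in flexible method, with k = 1 (very flexible) and k = 15 (smoother). With k = 1 every training point is its own nearest neighbour, so the training MSE is exactly 0, yet its test MSE should be clearly worse than for k = 15.

```python
import math
import random
from statistics import fmean

def knn_predict(x0, xs, ys, k):
    """K-nearest-neighbours regression with a single predictor."""
    nearest = sorted(range(len(xs)), key=lambda i: abs(xs[i] - x0))[:k]
    return fmean(ys[i] for i in nearest)

def mse(xs, ys, train_x, train_y, k):
    """MSE = (1/n) * sum of (y - f_hat(x))^2, with f_hat fitted on the training data."""
    return fmean((y - knn_predict(x, train_x, train_y, k)) ** 2
                 for x, y in zip(xs, ys))

# Hypothetical data: Y = sin(X) + eps, eps ~ Normal(0, 0.3^2)
random.seed(3)

def simulate(n):
    xs = [random.uniform(0, 2 * math.pi) for _ in range(n)]
    ys = [math.sin(x) + random.gauss(0.0, 0.3) for x in xs]
    return xs, ys

train_x, train_y = simulate(200)
test_x, test_y = simulate(1000)   # previously unseen observations (x0, y0)

results = {}
for k in (1, 15):
    results[k] = (mse(train_x, train_y, train_x, train_y, k),
                  mse(test_x, test_y, train_x, train_y, k))
    print(f"k = {k:2d}: training MSE = {results[k][0]:.3f}, "
          f"test MSE = {results[k][1]:.3f}")
```

Choosing k by training MSE alone would pick k = 1, the worse model on new data; this is why the test MSE, not the training MSE, is the quantity to minimise.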