Summary Introduction to Statistical Learning


Summary of Introduction to Statistical Learning. Includes graphs and examples from the book. Covers Chapters 2, 3, 4, 5 and 7.

Preview: 2 of 15 pages

  • Chapters 2, 3, 4, 5, 7
  • 5 April 2019
  • 15 pages
  • 2018/2019
  • Summary
lindawijnhoven
ISL

Chapter 2
2.2 Assessing model accuracy
No one method dominates all others over all possible data sets.

2.2.1 Measuring the quality of fit
We need a way to measure how well a model’s predictions actually match the observed
data. The most commonly-used measure in the regression setting is mean squared error
(MSE).
MSE = (1/n) Σ (yi – f^(xi))², where the sum runs over the n training observations i = 1, ..., n.
The MSE will be small if the predicted responses are very close to the true responses, and
will be large if for some of the observations, the predicted and true responses differ
substantially.
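
As a minimal sketch of the formula above, the MSE can be computed in a few lines of Python. The function name and the small data set are hypothetical, made up only to show that one badly predicted observation inflates the MSE.

```python
import numpy as np

def mean_squared_error(y_true, y_pred):
    """Average of the squared differences between observed and predicted responses."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.mean((y_true - y_pred) ** 2))

# Hypothetical observed responses and two sets of predictions (illustration only).
y_obs = [3.0, 5.0, 7.0, 9.0]
close_pred = [3.1, 4.9, 7.2, 8.8]   # all predictions close to the observations
far_pred = [3.1, 4.9, 7.2, 14.0]    # same, except one prediction is far off

mse_close = mean_squared_error(y_obs, close_pred)
mse_far = mean_squared_error(y_obs, far_pred)
print(mse_close, mse_far)
```

The second MSE is much larger even though only one of the four predictions changed, because the differences are squared before averaging.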

But we are not really interested in how well the method predicts Y on the training data. For example, we don’t care
whether or not the method accurately predicts diabetes risk for the patients used to train the
model, since we already know whether they have diabetes. We want to apply it to new patients in
the future.
Thus we are interested in knowing whether f^(x0) is approximately equal to y0, where (x0,
y0) is a previously unseen test observation not used to train the statistical learning method.
We want to choose the method that gives the lowest test MSE, as opposed to the lowest
training MSE. If we have a large number of test observations, we could compute:
Ave(y0 – f^(x0))²
This is the average squared prediction error for these test observations (x0,y0). We want to
select the model for which the average of this quantity is as small as possible.

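When no test observations are available, one might imagine simply selecting a statistical
learning method that minimizes the training MSE. However, there is no guarantee that the
method with the lowest training MSE will also have the lowest test MSE.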

The degrees of freedom is a quantity that summarizes the flexibility of a curve.

When a given method yields a small training MSE but a large test MSE, we are said to be
overfitting the data. This happens because our statistical learning procedure is working too
hard to find patterns in the training data, and may be picking up some patterns that are just
caused by random chance rather than by true properties of the unknown function f.
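
The gap between training and test MSE can be illustrated with a small simulation. This is only a sketch under stated assumptions: the "true" function (a sine curve), the noise level, the sample sizes and the polynomial degrees are all made up for illustration and do not come from the book.

```python
import numpy as np
from numpy.polynomial import Polynomial

rng = np.random.default_rng(0)

def true_f(x):
    # Assumed true function f; in practice f is unknown.
    return np.sin(2 * np.pi * x)

def simulate(n, noise_sd=0.3):
    # Draw n observations Y = f(X) + error.
    x = rng.uniform(0.0, 1.0, n)
    y = true_f(x) + rng.normal(0.0, noise_sd, n)
    return x, y

x_train, y_train = simulate(20)     # small training set
x_test, y_test = simulate(2000)     # large set of previously unseen observations

train_mse, test_mse = {}, {}
for degree in (1, 3, 10):           # higher degree = more flexible method
    f_hat = Polynomial.fit(x_train, y_train, degree)
    train_mse[degree] = np.mean((y_train - f_hat(x_train)) ** 2)
    test_mse[degree] = np.mean((y_test - f_hat(x_test)) ** 2)

for degree in train_mse:
    print(f"degree {degree:2d}: training MSE {train_mse[degree]:.3f}, "
          f"test MSE {test_mse[degree]:.3f}")
```

The training MSE can only go down as the degree increases, because each higher-degree polynomial contains the lower-degree ones as special cases. The test MSE carries no such guarantee, which is why a small training MSE on its own says little about overfitting.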

The bias-variance trade-off
When we plot test MSE against flexibility, the curve is often U-shaped. This U-shape is the
result of two competing properties of statistical learning methods. The expected test MSE
for a given value x0 can be decomposed into the sum of three fundamental quantities: the
variance of f^(x0), the squared bias of f^(x0), and the variance of the error term ε:
E(y0 – f^(x0))² = Var(f^(x0)) + [Bias(f^(x0))]² + Var(ε)
In order to minimize the expected test error, we need to select a statistical learning method
that simultaneously achieves low variance and low bias. Since the variance and the squared bias
are both non-negative, the expected test MSE can never lie below Var(ε), which is the
irreducible error.

Here variance means the amount by which f^ would change if we estimated it using a
different training data set. Ideally the estimate f^ should not vary too much between training sets. If a
method has a high variance, then small changes in the training data can result in large
changes in f^. In general, more flexible statistical methods have higher variance.

Bias refers to the error that is introduced by approximating a real-life problem, which may be
extremely complicated, by a much simpler model. E.g. it is unlikely that any real-life problem
truly has a simple linear relationship, and so performing linear regression will undoubtedly
result in some bias in the estimate of f. Generally, more flexible methods result in less bias.

As we increase the flexibility of our methods, the bias tends to initially decrease faster than
the variance increases. Consequently, the expected test MSE declines. However, at some
point increasing flexibility has little impact on the bias, but starts to significantly increase
the variance. Then the test MSE increases.

The relationship between bias, variance, and test set MSE is referred to as the bias-variance
trade-off.
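
The trade-off can be made concrete by estimating the variance and squared bias of f^(x0) over many simulated training sets. The sketch below is an illustration only: the sine-shaped true function, the noise level, the point x0 = 0.25 and the helper name bias_variance_at are assumptions, not part of the book.

```python
import numpy as np
from numpy.polynomial import Polynomial

def true_f(x):
    # Assumed true function f (unknown in practice).
    return np.sin(2 * np.pi * x)

def bias_variance_at(x0, degree, n_train=25, n_sets=500, noise_sd=0.3, seed=0):
    """Estimate Var(f^(x0)) and Bias(f^(x0))^2 by refitting on many training sets."""
    rng = np.random.default_rng(seed)
    preds = np.empty(n_sets)
    for s in range(n_sets):
        x = rng.uniform(0.0, 1.0, n_train)
        y = true_f(x) + rng.normal(0.0, noise_sd, n_train)
        preds[s] = Polynomial.fit(x, y, degree)(x0)
    variance = preds.var()                        # how much f^(x0) moves between training sets
    bias_sq = (preds.mean() - true_f(x0)) ** 2    # systematic error of the average fit
    return variance, bias_sq

noise_sd = 0.3
results = {d: bias_variance_at(0.25, d, noise_sd=noise_sd) for d in (1, 3, 9)}
for degree, (variance, bias_sq) in results.items():
    expected_test_mse = variance + bias_sq + noise_sd ** 2   # the three-term decomposition
    print(f"degree {degree}: variance {variance:.4f}, bias^2 {bias_sq:.4f}, "
          f"expected test MSE {expected_test_mse:.4f}")
```

In this setup the straight line (degree 1) should show the largest squared bias and the degree-9 polynomial the largest variance, while the last term, noise_sd², is the irreducible error that no choice of degree removes.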

Chapter 3

When answering statistical problems:
1. Find out if there is evidence of an association between the variables (e.g. advertising
expenditure and sales).
2. Check whether the evidence is weak or strong.
3. Try to separate the individual effects of the variables (e.g. TV, radio or newspaper
advertising).
4. Try to find the accuracy of each estimated effect.
5. Try to predict future values (e.g. how many future sales do we predict?).
6. Check whether the relationship is linear.
7. Check for an interaction effect (e.g. does spending 50,000 on both television and radio
lead to more sales than spending 100,000 on only one of them?).

3.1 Simple linear regression
Simple linear regression is a straightforward approach for predicting a quantitative response Y
on the basis of a single predictor variable X. We say that we are regressing Y on X.
Y ≈ β0 + β1X (fitted model: y^ = β^0 + β^1x)
β0 represents the intercept and is unknown. β1 represents the slope and is unknown.
Together they are known as the coefficients/parameters. The hat (^) marks a value estimated
from the training data.

3.1.1 Estimating the Coefficients
The goal is to find estimates β^0 and β^1 so that the linear model fits the data well, i.e. so
that the line is as close as possible to the n observations (x1, y1), ..., (xn, yn). The most
common approach involves minimizing the least squares criterion: the residual sum of squares
RSS = Σ (yi – β^0 – β^1xi)².
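
For simple linear regression the least squares estimates have a closed form: the slope is Σ(xi – x̄)(yi – ȳ) / Σ(xi – x̄)² and the intercept is ȳ minus the slope times x̄. A minimal sketch, using a small hypothetical data set (not the Advertising data from the book) and a made-up function name:

```python
import numpy as np

def least_squares_fit(x, y):
    """Closed-form least squares estimates for the model y = b0 + b1 * x."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    x_bar, y_bar = x.mean(), y.mean()
    b1 = np.sum((x - x_bar) * (y - y_bar)) / np.sum((x - x_bar) ** 2)  # slope
    b0 = y_bar - b1 * x_bar                                            # intercept
    return b0, b1

# Hypothetical observations, chosen only for illustration.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

b0, b1 = least_squares_fit(x, y)
rss = float(np.sum((np.asarray(y) - (b0 + b1 * np.asarray(x))) ** 2))
print(f"intercept {b0:.2f}, slope {b1:.2f}, RSS {rss:.3f}")
```

Any other choice of intercept and slope gives a larger RSS on these data, which is what "minimizing the least squares criterion" means.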

3.1.2 Assessing the Accuracy of the Coefficient Estimates
Y = β0 + β1X + ε
The error term ε is a catch-all for what we miss (e.g. other variables that cause variation in Y).
We assume that the error term is independent of X. The above formula defines the
population regression line, which is the best linear approximation to the true relationship
between X and Y.

We mostly use the sample mean μ^ to estimate the population mean μ. On average we expect
the two to be equal, so this estimate is unbiased. This holds for β0 and β1 as well: if we
estimate them on a particular data set, our estimates won’t be exactly equal to the true
values, but if we could average the estimates obtained over a huge number of data sets, the
average would equal the true values.
To see how far off a single estimate is likely to be (as an underestimate or overestimate),
we compute the standard error (SE).
Var(μ^) = SE(μ^)² = σ²/n
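
Both unbiasedness and the relationship Var(μ^) = SE(μ^)² = σ²/n can be checked by simulation. The sketch below assumes normally distributed data with made-up values for μ, σ and n; it draws many data sets and compares the spread of the sample means with σ²/n.

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumed population parameters and sample size (illustration only).
mu, sigma, n = 10.0, 2.0, 25
n_datasets = 20000

# Each row is one data set of n observations; each row mean is one estimate of mu.
samples = rng.normal(mu, sigma, size=(n_datasets, n))
mu_hats = samples.mean(axis=1)

average_estimate = mu_hats.mean()   # should be close to mu (unbiasedness)
empirical_var = mu_hats.var()       # spread of the estimates across data sets
theoretical_var = sigma ** 2 / n    # sigma^2 / n

print(f"average of the estimates: {average_estimate:.3f} (true mean {mu})")
print(f"variance of the estimates: {empirical_var:.4f} (sigma^2/n = {theoretical_var:.4f})")
```

A single sample mean can be above or below μ, but the estimates average out to μ, and their variance shrinks as n grows.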
