Summary Introduction to Statistical Learning


Summary of Introduction to Statistical Learning. Includes graphs examples from the book. Chapter 2, 3, 4, 5, 7

  • April 5, 2019
ISL

Chapter 2
2.2 Assessing model accuracy
No one method dominates all others over all possible data sets.

2.2.1 Measuring the quality of fit
We need a way to measure how well a model’s predictions actually match the observed
data. The most commonly-used measure in the regression setting is mean squared error
(MSE).
$$\mathrm{MSE} = \frac{1}{n} \sum_{i=1}^{n} \bigl(y_i - \hat{f}(x_i)\bigr)^2$$
The MSE will be small if the predicted responses are very close to the true responses, and
will be large if for some of the observations, the predicted and true responses differ
substantially.
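
As a minimal sketch of the formula above, the MSE can be computed directly from paired observed and predicted responses. The function name and the small data set are hypothetical and only illustrate how a single badly missed observation inflates the MSE.

```python
def mean_squared_error(y_true, y_pred):
    """Average of the squared differences between observed and predicted responses."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    n = len(y_true)
    return sum((y - p) ** 2 for y, p in zip(y_true, y_pred)) / n


# Hypothetical observed responses and two sets of predictions.
y_observed = [3.0, 5.0, 7.0, 9.0]
close_predictions = [3.1, 4.9, 7.2, 8.8]   # every prediction is near the truth
poor_predictions = [3.0, 5.0, 7.0, 1.0]    # one observation is badly missed

mse_close = mean_squared_error(y_observed, close_predictions)
mse_poor = mean_squared_error(y_observed, poor_predictions)
print(mse_close, mse_poor)
```

Three of the four poor predictions are exact, yet the squared error of the fourth dominates the average.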

But we are not really interested in how well the method predicts Y for the observations used
to train it. For example, we don't care whether the method accurately predicts diabetes risk
for the patients used to train the model, since we already know whether they have diabetes.
We want to apply it to new patients in the future.
Thus we are interested in knowing whether f^(x0) is approximately equal to y0, where (x0,
y0) is a previously unseen test observation not used to train the statistical learning method.
We want to choose the method that gives the lowest test MSE, as opposed to the lowest
training MSE. If we have a large number of test observations, we could compute:
Ave(y0 – f^(x0))²
This is the average squared prediction error for these test observations (x0,y0). We want to
select the model for which the average of this quantity is as small as possible.

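When no test observations are available, one might imagine simply selecting the statistical
learning method that minimizes the training MSE. However, there is no guarantee that the
method with the lowest training MSE will also have the lowest test MSE.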

The degrees of freedom is a quantity that summarizes the flexibility of a curve.

When a given method yields a small training MSE but a large test MSE, we are said to be
overfitting the data. This happens because our statistical learning procedure is working too
hard to find patterns in the training data, and may be picking up some patterns that are just
caused by random chance rather than by true properties of the unknown function f.
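
The gap between training and test MSE can be sketched with a small simulation. Everything here is an assumption made for illustration: the true function sin(2πx), the noise level, and the use of k-nearest-neighbour averaging as the learning method (a small k means a more flexible fit). It is not an example taken from the summary itself.

```python
import math
import random


def knn_predict(x_train, y_train, x_new, k):
    """Predict at x_new by averaging the responses of the k nearest training points."""
    nearest = sorted(range(len(x_train)), key=lambda i: abs(x_train[i] - x_new))[:k]
    return sum(y_train[i] for i in nearest) / k


def mse(x_train, y_train, x_eval, y_eval, k):
    """MSE of the k-NN fit (trained on x_train, y_train), evaluated on x_eval, y_eval."""
    errors = [(y - knn_predict(x_train, y_train, x, k)) ** 2
              for x, y in zip(x_eval, y_eval)]
    return sum(errors) / len(errors)


def simulate(n, rng):
    """Draw n observations from the assumed model y = sin(2*pi*x) + noise."""
    xs = [rng.random() for _ in range(n)]
    ys = [math.sin(2 * math.pi * x) + rng.gauss(0, 0.5) for x in xs]
    return xs, ys


rng = random.Random(0)
x_train, y_train = simulate(200, rng)
x_test, y_test = simulate(500, rng)   # unseen observations, not used for fitting

results = {}
for k in (1, 9):
    results[k] = (mse(x_train, y_train, x_train, y_train, k),
                  mse(x_train, y_train, x_test, y_test, k))
    print(f"k={k}: training MSE={results[k][0]:.3f}, test MSE={results[k][1]:.3f}")
```

With k = 1 each training point is its own nearest neighbour, so the training MSE is zero, but the fit is chasing noise and its test MSE is worse than that of the smoother k = 9 fit.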

The bias-variance trade-off
When we plot test MSE curves against flexibility, a U-shape often shows up. It turns out to be
the result of two competing properties of statistical learning methods. The expected test MSE,
for a given value x0, can be decomposed into the sum of three fundamental quantities: the
variance of f^(x0), the squared bias of f^(x0), and the variance of the error term.
In order to minimize the expected test error, we need to select a statistical learning method
that achieves both low variance and low bias. Since variance and squared bias are both
non-negative, the expected test MSE can never lie below the variance of the error term,
which is the irreducible error.
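
Written out, the decomposition takes the following standard form. The symbol ε for the error term is notation assumed here, and the expectation is taken over repeated training sets.

```latex
E\bigl(y_0 - \hat{f}(x_0)\bigr)^2
  = \mathrm{Var}\bigl(\hat{f}(x_0)\bigr)
  + \bigl[\mathrm{Bias}\bigl(\hat{f}(x_0)\bigr)\bigr]^2
  + \mathrm{Var}(\epsilon)
```

The first two terms depend on the chosen learning method; the last one does not.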

Here variance means the amount by which f^ would change if we estimated it using a
different training data set. Ideally the estimate f^ should not vary too much between
training sets. If a method has a high variance, then small changes in the training data can
result in large changes in f^. In general, more flexible statistical methods have higher variance.

Bias refers to the error that is introduced by approximating a real-life problem, which may
be extremely complicated, with a much simpler model. E.g. it is unlikely that any real-life
problem truly has a simple linear relationship, and so performing linear regression will
undoubtedly result in some bias in the estimate of f. Generally, more flexible methods
result in less bias.

As we increase the flexibility of our methods, the bias tends to initially decrease faster than
the variance increases. Consequently, the expected test MSE declines. However, at some
point increasing flexibility has little impact on the bias, but starts to significantly increase
the variance. Then the test MSE increases.

The relationship between bias, variance, and test set MSE is referred to as the bias-variance
trade-off.
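
A rough simulation can make the two quantities concrete. The sketch below refits a k-nearest-neighbour average on many simulated training sets and measures how the prediction at a single point x0 behaves. The true function, noise level, design points and choice of k values are all assumptions made for illustration.

```python
import math
import random


def true_f(x):
    """Assumed true regression function for this illustration."""
    return math.sin(2 * math.pi * x)


def knn_predict(x_train, y_train, x_new, k):
    """Average the responses of the k training points nearest to x_new."""
    nearest = sorted(range(len(x_train)), key=lambda i: abs(x_train[i] - x_new))[:k]
    return sum(y_train[i] for i in nearest) / k


def bias_variance_at(x0, k, n_train=100, sigma=0.5, n_sets=2000, seed=0):
    """Estimate squared bias and variance of the k-NN fit at x0 over many training sets."""
    rng = random.Random(seed)
    x_train = [(i + 0.5) / n_train for i in range(n_train)]  # fixed design points
    predictions = []
    for _ in range(n_sets):
        # Each training set shares the x values but has fresh noise in y.
        y_train = [true_f(x) + rng.gauss(0, sigma) for x in x_train]
        predictions.append(knn_predict(x_train, y_train, x0, k))
    mean_prediction = sum(predictions) / n_sets
    variance = sum((p - mean_prediction) ** 2 for p in predictions) / (n_sets - 1)
    squared_bias = (mean_prediction - true_f(x0)) ** 2
    return squared_bias, variance


for k in (1, 25):
    squared_bias, variance = bias_variance_at(x0=0.25, k=k)
    print(f"k={k}: squared bias={squared_bias:.4f}, variance={variance:.4f}")
```

Under these assumptions the flexible fit (k = 1) shows the larger variance, while the smoother fit (k = 25) averages over the peak of the curve and picks up the larger bias.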

Chapter 3

When answering statistical problems:
1. Find out if there is evidence of an association between the variables (e.g. advertising
expenditure and sales).
2. Check whether the evidence is weak or strong.
3. Try to separate the individual effects of each variable (e.g. TV, radio or newspaper
advertising)
4. Try to find the accuracy of each effect.
5. Try to predict future values (e.g. how many future sales do we predict)
6. Check whether the relationship is linear
7. Check for an interaction effect (e.g. does spending 50,000 on each of television and
radio lead to more sales than spending 100,000 on only one of them?)

3.1 Simple linear regression
Simple linear regression is a straightforward approach for predicting a quantitative response
Y on the basis of a single predictor variable X. We say that we are regressing Y on X.
$$Y \approx \beta_0 + \beta_1 X, \qquad \hat{y} = \hat{\beta}_0 + \hat{\beta}_1 x$$
β0 represents the intercept and β1 the slope; both are unknown.
Together they are known as the model coefficients or parameters.

3.1.1 Estimating the Coefficients
The goal is to find estimates of β0 and β1 such that the linear model fits the data well (so
the line is as close as possible to the n observations). To do this we use the observed pairs
(xi, yi) for all n observations. The most common approach involves minimizing the least
squares criterion, i.e. the residual sum of squares (RSS).
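
For a single predictor the least squares estimates have a closed form: the slope is the sum of cross-deviations divided by the sum of squared deviations of x, and the intercept follows from the means. The sketch below applies it to a small hypothetical data set; the function names and the numbers are illustrative only.

```python
def fit_simple_linear_regression(x, y):
    """Return (intercept, slope) minimizing the residual sum of squares."""
    n = len(x)
    x_mean = sum(x) / n
    y_mean = sum(y) / n
    s_xy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_mean) ** 2 for xi in x)
    slope = s_xy / s_xx
    intercept = y_mean - slope * x_mean
    return intercept, slope


def residual_sum_of_squares(x, y, intercept, slope):
    """Sum of squared vertical distances between the observations and the line."""
    return sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))


# Hypothetical observations (not from the book).
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.2, 4.1, 6.1, 7.9, 10.2]

b0, b1 = fit_simple_linear_regression(x, y)
rss_fit = residual_sum_of_squares(x, y, b0, b1)
rss_other = residual_sum_of_squares(x, y, b0 + 0.1, b1)  # a slightly shifted line
print(f"intercept={b0:.2f}, slope={b1:.2f}, RSS={rss_fit:.3f}, shifted RSS={rss_other:.3f}")
```

Any other line, such as the shifted one, gives a larger RSS than the least squares line.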

3.1.2 Assessing the Accuracy of the Coefficient Estimates
$$Y = \beta_0 + \beta_1 X + \epsilon$$
The error term ε is a catch-all for what we miss with this simple model (e.g. other variables that cause variation in Y).
We assume that the error term is independent of X. The above formula defines the
population regression line, which is the best linear approximation to the true relationship
between X and Y.

We typically use the sample mean μ̂ to estimate the population mean μ. On average we expect
the two to be equal, so this estimate is unbiased. The same holds for the estimates of β0 and
β1: if we estimate them on a particular data set, the estimates won't be exactly equal to the
true values, but if we could average the estimates obtained over a huge number of data sets,
the average would equal the true values.
To see how far off a single estimate is likely to be, we compute the standard
error (SE).
$$\mathrm{Var}(\hat{\mu}) = \mathrm{SE}(\hat{\mu})^2 = \frac{\sigma^2}{n}$$
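
Both unbiasedness and the σ/√n form of the standard error can be checked with a short simulation. The population values (μ = 10, σ = 2), the sample size and the number of simulated data sets below are assumptions chosen for illustration.

```python
import math
import random
import statistics


def simulate_sample_means(mu, sigma, n, n_datasets, seed=0):
    """Draw n_datasets samples of size n from N(mu, sigma^2); return each sample mean."""
    rng = random.Random(seed)
    means = []
    for _ in range(n_datasets):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        means.append(statistics.fmean(sample))
    return means


mu, sigma, n = 10.0, 2.0, 25
sample_means = simulate_sample_means(mu, sigma, n, n_datasets=5000)

average_of_estimates = statistics.fmean(sample_means)  # should be close to mu
empirical_se = statistics.stdev(sample_means)           # spread of the estimates
theoretical_se = sigma / math.sqrt(n)                   # sigma / sqrt(n)

print(f"average of estimates: {average_of_estimates:.3f} (mu = {mu})")
print(f"empirical SE: {empirical_se:.3f}, theoretical SE: {theoretical_se:.3f}")
```

Individual sample means miss μ, but their average sits close to it, and their spread is close to σ/√n.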

Who am I buying these notes from?

Stuvia is a marketplace, so you are not buying this document from us, but from seller lindawijnhoven. Stuvia facilitates payment to the seller.
