Summary An Introduction to Statistical Learning, ISBN: 9781461471370 Big Data Analysis (7204MM17XY)
Course: Big Data Analysis (7204MM17XY)
Institution: Universiteit van Amsterdam (UvA)
Book: An Introduction to Statistical Learning
Summary of An Introduction to Statistical Learning (ISBN 9781461471370) for Big Data Analysis (7204MM17XY), Chapters 2-10. The algorithms themselves are excluded; function descriptions and visualisations are included for clarity.
Written for: Master Behavioural Data Science, Universiteit van Amsterdam (UvA)
Seller: Jonnez
Chapter 2
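Two reasons to estimate f:
- Prediction: we want accurate predictions Ŷ = f̂(X); f̂ can be treated as a black box
- Inference: we want to understand how Y is affected by changes in X₁, …, Xₚ, so the exact form of f̂ matters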
Parametric methods
- Assume a functional form for f (e.g. linear), so estimating f reduces to estimating a small set of parameters, which is easy
- The chosen model will usually not match the true unknown form of f
Non-parametric methods
- Avoid the (possibly wrong) assumption of a particular functional form for f
- A very large number of observations is required to obtain an accurate estimate of f
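To make the trade-off concrete, here is a minimal sketch comparing one parametric method (least-squares linear regression) with one non-parametric method (KNN regression). The "true" f(x) = sin(2πx), the noise level 0.3, the sample sizes and K = 10 are all assumptions chosen for illustration; with plenty of data, the non-parametric fit should track the curved f better than a straight line can.

```python
import numpy as np

def true_f(x):
    # Assumed "unknown" nonlinear truth, used only for this illustration
    return np.sin(2 * np.pi * x)

def make_data(n, rng, noise_sd=0.3):
    x = rng.uniform(0, 1, n)
    y = true_f(x) + rng.normal(0, noise_sd, n)
    return x, y

def fit_linear(x, y):
    # Parametric: assume f(x) = b0 + b1*x, so estimating f means estimating two numbers
    X = np.column_stack([np.ones_like(x), x])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return lambda x_new: beta[0] + beta[1] * x_new

def fit_knn(x, y, k=10):
    # Non-parametric: no functional form; average the k nearest training responses
    def predict(x_new):
        x_new = np.atleast_1d(x_new)
        dists = np.abs(x_new[:, None] - x[None, :])
        idx = np.argsort(dists, axis=1)[:, :k]
        return y[idx].mean(axis=1)
    return predict

rng = np.random.default_rng(0)
x_train, y_train = make_data(500, rng)
x_test, y_test = make_data(1000, rng)

lin_mse = np.mean((fit_linear(x_train, y_train)(x_test) - y_test) ** 2)
knn_mse = np.mean((fit_knn(x_train, y_train)(x_test) - y_test) ** 2)
print(f"linear test MSE: {lin_mse:.3f}, KNN test MSE: {knn_mse:.3f}")
```

Rerunning with only a handful of training points is a quick way to see the other side of the trade-off, since KNN then has too few neighbours to average over.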
Variance refers to the amount by which f̂ would change if we estimated it using a different training data set.
Bias refers to the error that is introduced by approximating a real-life problem, which may be extremely complicated, by a much simpler model.
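Both definitions can be estimated by simulation: draw many training sets, refit f̂ on each, and look at f̂(x₀) at one test point. The sketch below assumes a fixed design of 30 equally spaced x values, f(x) = sin(2πx), noise sd 0.3 and x₀ = 0.25, all chosen for illustration. A straight line (degree 1) should show large bias and small variance; a degree-9 polynomial should show the reverse.

```python
import numpy as np
from numpy.polynomial import Polynomial

def true_f(x):
    return np.sin(2 * np.pi * x)  # assumed truth for the illustration

def bias_variance_at(x0, degree, n_sims=2000, n=30, noise_sd=0.3, seed=1):
    rng = np.random.default_rng(seed)
    x = np.linspace(0, 1, n)  # fixed design; only the noise changes between training sets
    preds = np.empty(n_sims)
    for s in range(n_sims):
        y = true_f(x) + rng.normal(0, noise_sd, n)   # a new training set
        preds[s] = Polynomial.fit(x, y, degree)(x0)  # f-hat(x0) for this training set
    variance = preds.var()                      # how much f-hat(x0) moves across training sets
    bias_sq = (preds.mean() - true_f(x0)) ** 2  # error from the simplifying model
    return bias_sq, variance

for d in (1, 9):
    b2, v = bias_variance_at(0.25, d)
    print(f"degree {d}: bias^2 = {b2:.4f}, variance = {v:.4f}")
```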
K-nearest neighbours (KNN)
When K = 1, the decision boundary is overly flexible and finds patterns in the data that don't correspond to the Bayes decision boundary. This corresponds to a classifier that has low bias but very high variance.
The Bayes classifier produces the lowest possible test error rate, called the Bayes error rate.
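A quick way to see why K = 1 is overly flexible: every training point is its own nearest neighbour, so the training error is always zero, even when the classes overlap heavily. The sketch below is a plain NumPy KNN classifier on assumed simulated data (two overlapping Gaussian classes, 0/1 labels); zero training error here says nothing about the test error.

```python
import numpy as np

def knn_predict(X_train, y_train, X_new, k):
    # Euclidean distance from every new point to every training point
    d = np.linalg.norm(X_new[:, None, :] - X_train[None, :, :], axis=2)
    nearest = np.argsort(d, axis=1)[:, :k]
    # Majority vote among the k nearest labels (labels assumed to be 0/1, k odd)
    return (y_train[nearest].mean(axis=1) > 0.5).astype(int)

rng = np.random.default_rng(2)
n = 200
y = rng.integers(0, 2, n)
X = rng.normal(size=(n, 2)) + y[:, None]  # class 1 shifted by +1 in both features; classes overlap

for k in (1, 25):
    train_err = np.mean(knn_predict(X, y, X, k) != y)
    print(f"K={k}: training error = {train_err:.3f}")
```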
Chapter 3
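Curse of dimensionality: as the number of features (dimensions) grows, the amount of data needed to generalize accurately grows exponentially. Non-parametric methods such as KNN suffer most, because when p is large the K observations nearest to a test observation may be very far away from it.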
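Two back-of-the-envelope numbers illustrate this, assuming data spread uniformly over the unit hypercube [0, 1]^p. First, a sub-cube that captures a fraction r of the data needs edge length r^(1/p) along every axis, so a "local" neighbourhood holding 10% of the points covers most of each axis once p is moderate. Second, keeping a grid spacing of 0.1 in every dimension needs 10^p points, which grows exponentially in p.

```python
def edge_length(fraction, p):
    # Side of a sub-cube that contains `fraction` of points uniform on [0, 1]^p
    return fraction ** (1 / p)

def points_for_spacing(spacing, p):
    # Points on a regular grid with the given spacing in every one of p dimensions
    return round(1 / spacing) ** p

for p in (1, 2, 10):
    print(f"p={p}: edge for 10% of data = {edge_length(0.1, p):.3f}, "
          f"grid points at spacing 0.1 = {points_for_spacing(0.1, p)}")
```

For p = 10, the 10% neighbourhood already spans about 79% of each axis, so it is hardly local any more.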
Chapter 4
Linear discriminant analysis (LDA) handles more than two response classes directly.
Why do we need another method when we have logistic regression?
There are several reasons:
- When the classes are well-separated, the parameter estimates for the logistic regression
model are surprisingly unstable. Linear discriminant analysis does not suffer from this
problem.
- If n is small and the distribution of the predictors X is approximately normal in each of the
classes, the linear discriminant model is again more stable than the logistic regression model.
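The following is a minimal from-scratch sketch of LDA for K classes, assuming Gaussian classes with a shared covariance matrix Σ. It estimates class means, priors πₖ and a pooled Σ, then assigns x to the class with the largest discriminant δₖ(x) = xᵀΣ⁻¹μₖ − ½μₖᵀΣ⁻¹μₖ + log πₖ, which is linear in x. The three-class simulated data are an assumption for illustration; in practice a library implementation such as scikit-learn's `LinearDiscriminantAnalysis` would be the usual choice.

```python
import numpy as np

def lda_fit(X, y):
    classes = np.unique(y)
    n, p = X.shape
    means = np.array([X[y == k].mean(axis=0) for k in classes])
    priors = np.array([np.mean(y == k) for k in classes])
    # Pooled (shared) covariance matrix: the key LDA assumption
    centered = X - means[np.searchsorted(classes, y)]
    sigma = centered.T @ centered / (n - len(classes))
    return classes, means, priors, np.linalg.inv(sigma)

def lda_predict(model, X):
    classes, means, priors, sigma_inv = model
    # delta_k(x) = x' S^-1 mu_k - 0.5 mu_k' S^-1 mu_k + log(pi_k), linear in x
    linear = X @ sigma_inv @ means.T
    const = -0.5 * np.sum(means @ sigma_inv * means, axis=1) + np.log(priors)
    return classes[np.argmax(linear + const, axis=1)]

# Assumed example: three classes with different centres and identity covariance
rng = np.random.default_rng(3)
centres = np.array([[0, 0], [4, 0], [0, 4]])
y = np.repeat([0, 1, 2], 100)
X = centres[y] + rng.normal(size=(300, 2))
model = lda_fit(X, y)
acc = np.mean(lda_predict(model, X) == y)
print(f"training accuracy: {acc:.2f}")
```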
Check the LDA/QDA videos.
In the Default example, sensitivity is the percentage of true defaulters that are correctly identified (the true positive rate).
Specificity is the percentage of non-defaulters that are correctly identified (the true negative rate).
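Both rates come straight from the confusion matrix. The sketch below uses hypothetical labels (1 = defaulter, 0 = non-defaulter), not the book's Default results.

```python
def sensitivity_specificity(y_true, y_pred):
    # 1 = defaulter (positive class), 0 = non-defaulter
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    tn = sum(t == 0 and p == 0 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    return tp / (tp + fn), tn / (tn + fp)

# Hypothetical labels: 4 true defaulters, 6 non-defaulters
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 0, 1]
sens, spec = sensitivity_specificity(y_true, y_pred)
print(sens, spec)  # sensitivity = 3/4, specificity = 5/6
```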
LDA is a much less flexible classifier than QDA, and so has substantially lower variance.
LDA tends to be a better bet than QDA if there are relatively few training observations and so
reducing variance is crucial.
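The variance gap follows from a parameter count: a symmetric p × p covariance matrix has p(p+1)/2 free entries. LDA estimates one shared matrix, while QDA estimates a separate one for each of the K classes. The sketch below counts only covariance parameters (means and priors are ignored) and uses K = 3 as an arbitrary example.

```python
def covariance_params(p, n_classes, shared):
    # One symmetric p x p covariance matrix has p(p+1)/2 free entries
    per_matrix = p * (p + 1) // 2
    return per_matrix if shared else n_classes * per_matrix

for p in (2, 10, 50):
    lda = covariance_params(p, 3, shared=True)
    qda = covariance_params(p, 3, shared=False)
    print(f"p={p}: LDA {lda} covariance parameters, QDA {qda}")
```

With 50 predictors and 3 classes, QDA must estimate 3825 covariance parameters against LDA's 1275, which is why it needs many more training observations.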