
Summary Data Science Methods EOR


Summary of the Data Science Methods (DSM) course, taught in the EOR master's programme at Tilburg University.


April 2, 2024 · 85 pages · Academic year 2023/2024 · Summary


Summary DSM

Author: Rick Smeets
Supervisor: Boldea, O.

April 2, 2024

Table of Contents
1 Small and Large Order Probabilities

2 Unsupervised Learning
2.1 Principal Component Analysis (PCA)
2.1.1 Finding Principal Components (dimensions)
2.1.2 Example: US Arrests Data
2.1.3 Numerical Computation PCA
2.1.4 NIPALS
2.1.5 Screeplot PCA

3 Clustering
3.1 K-Means Clustering
3.2 Hierarchical Clustering
3.2.1 Interpreting a Dendrogram
3.2.2 The Hierarchical Clustering Algorithm
3.2.3 Choice of Dissimilarity Measure
3.3 Practical Issues in Clustering

4 Supervised (statistical) Learning
4.1 Why Estimate f?
4.1.1 Prediction
4.1.2 Inference
4.2 How to Estimate f?
4.2.1 Parametric Methods
4.2.2 Non-Parametric Models
4.3 Assessing Model Accuracy
4.3.1 Measuring the Quality of Fit
4.3.2 The Bias-Variance Trade-Off
4.4 The Classification Setting
4.4.1 The Bayes Classifier
4.4.2 K-Nearest Neighbours

5 Classification
5.1 Why Not Linear Regression?
5.2 Logistic Regression
5.2.1 The Logistic Model
5.2.2 Estimating the Regression Coefficients
5.2.3 Multinomial Logistic Regression
5.3 Generative Models for Classification
5.3.1 Linear Discriminant Analysis for p = 1
5.3.2 Linear Discriminant Analysis for p > 1
5.3.3 Quadratic Discriminant Analysis
5.4 A Comparison of Classification Methods

6 Resampling Methods
6.1 Cross-Validation
6.1.1 The Validation Set Approach
6.1.2 Leave-One-Out Cross-Validation
6.1.3 k-Fold Cross-Validation
6.1.4 Bias-Variance Trade-Off for k-Fold Cross-Validation
6.1.5 Cross-Validation for Classification
6.2 The Bootstrap

7 Linear Model Selection and Regularization
7.1 Subset Selection
7.1.1 Best Subset Selection
7.1.2 Stepwise Selection
7.2 Choosing the Optimal Model
7.2.1 Cp, AIC, BIC and Adjusted R²
7.2.2 Validation and Cross-Validation
7.3 Shrinkage Methods
7.3.1 Ridge Regression
7.3.2 The Lasso
7.3.3 The Variable Selection Property of the Lasso
7.3.4 Comparing the Lasso and Ridge Regression
7.3.5 Selecting the Tuning Parameter λ
7.4 Dimension Reduction Methods
7.4.1 Principal Components Regression
7.4.2 Partial Least Squares

8 Considerations in High Dimensions

9 Tree-Based Methods
9.1 The Basics of Decision Trees
9.1.1 Regression Trees
9.1.2 Prediction via Stratification of the Feature Space
9.1.3 Tree Pruning
9.2 Classification Trees
9.2.1 Advantages and Disadvantages of Trees
9.3 Bagging, Random Forests, and Boosting
9.3.1 Bagging
9.3.2 Out-of-Bag Error Estimation
9.3.3 Variable Importance Measures
9.4 Random Forests
9.5 Boosting

10 Double Machine Learning for Treatment and Structural Parameters
10.1 Partially Linear Regression - Double Machine Learning
