Summary: Data Science Methods (EOR)
Institution: Tilburg University (UVT)
A summary of the Data Science Methods (DSM) course, taught in the Econometrics and Operations Research (EOR) master's programme at Tilburg University.
Uploaded on: April 2, 2024
Number of pages: 85
Written in: 2023/2024
Type: Summary
Topics: unsupervised learning, clustering, supervised learning, classification, resampling methods, linear model selection, regularization, trees, tree-based methods, double machine learning
Education: Econometrics and Operations Research
Course: Data Science Methods (35M2C6M6)
Tilburg University
QFAS
Summary DSM
Author: Rick Smeets
Supervisor: Boldea, O.
April 2, 2024
Table of Contents

1 Small and Large Order Probabilities .... 4
2 Unsupervised Learning .... 4
  2.1 Principal Component Analysis (PCA) .... 4
    2.1.1 Finding Principal Components (Dimensions) .... 5
    2.1.2 Example: US Arrests Data .... 6
    2.1.3 Numerical Computation of PCA .... 8
    2.1.4 NIPALS .... 8
    2.1.5 Scree Plot for PCA .... 10
3 Clustering .... 11
  3.1 K-Means Clustering .... 12
  3.2 Hierarchical Clustering .... 14
    3.2.1 Interpreting a Dendrogram .... 14
    3.2.2 The Hierarchical Clustering Algorithm .... 15
    3.2.3 Choice of Dissimilarity Measure .... 17
  3.3 Practical Issues in Clustering .... 17
4 Supervised (Statistical) Learning .... 17
  4.1 Why Estimate f? .... 18
    4.1.1 Prediction .... 19
    4.1.2 Inference .... 19
  4.2 How to Estimate f? .... 20
    4.2.1 Parametric Methods .... 20
    4.2.2 Non-Parametric Methods .... 21
  4.3 Assessing Model Accuracy .... 21
    4.3.1 Measuring the Quality of Fit .... 21
    4.3.2 The Bias-Variance Trade-Off .... 25
  4.4 The Classification Setting .... 27
    4.4.1 The Bayes Classifier .... 28
    4.4.2 K-Nearest Neighbours .... 30
5 Classification .... 33
  5.1 Why Not Linear Regression? .... 34
  5.2 Logistic Regression .... 35
    5.2.1 The Logistic Model .... 35
    5.2.2 Estimating the Regression Coefficients .... 36
    5.2.3 Multinomial Logistic Regression .... 37
  5.3 Generative Models for Classification .... 37
    5.3.1 Linear Discriminant Analysis for p = 1 .... 38
    5.3.2 Linear Discriminant Analysis for p > 1 .... 40
    5.3.3 Quadratic Discriminant Analysis .... 42
  5.4 A Comparison of Classification Methods .... 44
6 Resampling Methods .... 47
  6.1 Cross-Validation .... 47
    6.1.1 The Validation Set Approach .... 47
    6.1.2 Leave-One-Out Cross-Validation .... 48
    6.1.3 k-Fold Cross-Validation .... 49
    6.1.4 Bias-Variance Trade-Off for k-Fold Cross-Validation .... 51
    6.1.5 Cross-Validation for Classification .... 51
  6.2 The Bootstrap .... 52
7 Linear Model Selection and Regularization .... 54
  7.1 Subset Selection .... 54
    7.1.1 Best Subset Selection .... 54
    7.1.2 Stepwise Selection .... 55
  7.2 Choosing the Optimal Model .... 57
    7.2.1 Cp, AIC, BIC, and Adjusted R2 .... 58
    7.2.2 Validation and Cross-Validation .... 59
  7.3 Shrinkage Methods .... 60
    7.3.1 Ridge Regression .... 60
    7.3.2 The Lasso .... 63
    7.3.3 The Variable Selection Property of the Lasso .... 64
    7.3.4 Comparing the Lasso and Ridge Regression .... 65
    7.3.5 Selecting the Tuning Parameter λ .... 67
  7.4 Dimension Reduction Methods .... 67
    7.4.1 Principal Components Regression .... 67
    7.4.2 Partial Least Squares .... 69
8 Considerations in High Dimensions .... 70
9 Tree-Based Methods .... 72
  9.1 The Basics of Decision Trees .... 72
    9.1.1 Regression Trees .... 72
    9.1.2 Prediction via Stratification of the Feature Space .... 73
    9.1.3 Tree Pruning .... 75
  9.2 Classification Trees .... 77
    9.2.1 Advantages and Disadvantages of Trees .... 78
  9.3 Bagging, Random Forests, and Boosting .... 79
    9.3.1 Bagging .... 79
    9.3.2 Out-of-Bag Error Estimation .... 79
    9.3.3 Variable Importance Measures .... 81
  9.4 Random Forests .... 81
  9.5 Boosting .... 82
10 Double Machine Learning for Treatment and Structural Parameters .... 82
  10.1 Partially Linear Regression - Double Machine Learning .... 82