Summary Data Science Methods
Contents
Week 1..................................................................................................................................................... 5
Notations ............................................................................................................................................. 5
Supervised learning ............................................................................................................................. 6
Why estimate f? Prediction vs Inference ........................................................................................ 7
Parametric vs Non-parametric Methods ......................................................................................... 8
Smooth Spline.................................................................................................................................. 8
Rough Spline .................................................................................................................................... 8
Prediction Accuracy and Interpretability......................................................................................... 8
Numerical Optimization .................................................................................................................. 9
Supervised Learning: Regression vs Classification .............................................................................. 9
Unsupervised learning......................................................................................................................... 9
Assessing Model Accuracy............................................................................................................. 10
Measuring Model Accuracy ........................................................................................................... 10
Training, Validation and Test Data ................................................................................................ 10
Test Data and Model Evaluation ................................................................................................... 11
Example 1: Training and Test MSE for Non-Linear Model ............................................................ 11
Example 2: Training and Test MSE for Linear Model .................................................................... 11
Flexibility and Overfitting .............................................................................................................. 12
Bias-Variance Trade-Off ................................................................................................................ 12
Bias-Variance Trade-Off Example 1 (bias-variance.R) ................................................................... 12
Bias-Variance Trade-Off: Example 2, Flexible method .................................................................. 13
Variance of Inflexible Method ....................................................................................................... 13
Example: Bias of Inflexible Method ............................................................................................... 13
Example: Bias of Flexible Method ................................................................................................. 14
Classification Setting ..................................................................................................................... 14
Week 2 – Supervised Learning .............................................................................................................. 15
Classification ...................................................................................................................................... 15
Linear Regression recap ................................................................................................................ 15
Logistic Regression ........................................................................................................................ 15
Maximum Likelihood ......................................................................................................................... 16
Logistic regression with several variables ..................................................................................... 17
Multiclass Logistic Regression ....................................................................................................... 17
Resampling Methods ......................................................................................................................... 17
Recap: Bias-Variance Trade-Off ..................................................................................................... 17
Training Error vs. Test Error........................................................................................................... 18
Validation-set Approach ................................................................................................................ 18
Example validation: automobile data............................................................................................ 18
Test MSE: Automobile data ........................................................................................................... 19
K-fold Cross-Validation .................................................................................................................. 19
Leave-One-Out Cross-Validation (LOOCV) .................................................................................... 20
Cross-Validation for Classification Problems................................................................................. 20
Some remarks on (standard) Cross-Validation.............................................................................. 21
Validation with Time-series Data ...................................................................................................... 21
Basic Supervised Learning ................................................................................................................. 22
Supervised vs unsupervised learning ............................................................................................ 22
Prediction versus Inference ........................................................................................................... 22
Three classes of methods .............................................................................................................. 22
Method 1: Best Subset Selection .................................................................................................. 22
Remarks: Best subset selection ..................................................................................................... 23
Forward / Backward Stepwise Selection ....................................................................................... 23
Shrinkage Methods........................................................................................................................ 24
Ridge Regression ........................................................................................................................... 25
Ridge Regression: Scaling of predictors......................................................................................... 25
Normalization ................................................................................................................................ 25
The Lasso ....................................................................................................................................... 27
Selecting the Tuning Parameter .................................................................................................... 27
Comparing the Lasso and Ridge Regression .................................................................................. 27
Week 3 – Tree-based Methods ................................................................................................................ 28
Big Picture...................................................................................................................................... 28
Intuition ......................................................................................................................................... 28
Terminology................................................................................................................................... 29
The tree-building process .............................................................................................................. 29
An intuitive algorithm.................................................................................................................... 30
Find the right tree size by Pruning ................................................................................................ 30
Choosing the best subtree............................................................................................................. 30
Tree algorithm ............................................................................................................................... 30
Classification Trees ........................................................................................................................ 31
Gini Index and Deviance ................................................................................................................ 31
Tree R Code example:.................................................................................................................... 31
Prediction model ........................................................................................................................... 32
Summary of trees .......................................................................................................................... 33
Bagging .............................................................................................................................................. 33
Estimate the MSE .......................................................................................................................... 33
Intuition: Classification .................................................................................................................. 34
R Code: Bagging ............................................................................................................................. 34
Random Forest .............................................................................................................................. 35
R Code: Random Forest ................................................................................................................. 35
Confusion matrix & Random Forest: OOB Test Error .................................................................... 36
Tuning Random Forest .................................................................................................................. 36
R Code: Tuning Random Forest ..................................................................................................... 36
Week 4 – PCA ........................................................................................................................................ 38
Introduction ................................................................................................................................... 38
Principal Component Analysis (PCA) .............................................................................................. 38
Basic Idea ....................................................................................................................................... 38
PCA Details .................................................................................................................................... 39
PCA Notations................................................................................................................................ 40
Further PCs .................................................................................................................................... 40
Proportion Variance Explained ...................................................................................................... 40
Week – Deep Learning .................................................................................................................. 41
Introduction to deep learning ........................................................................................................... 41
What is deep learning? .................................................................................................................. 41
Why deep learning? ...................................................................................................................... 41
Applications of Deep Learning....................................................................................................... 41
Feedforward neural networks ........................................................................................................... 42
Key Building Block: The Perceptron .............................................................................................. 42
The Activation Function................................................................................................................. 42
Purple nodes combine two steps: ................................................................................................. 42
Compute Output Ŷ using Neurons Z_k .......................................................................................... 43
Building a Single Layer Neural Network in R using Keras (one hidden layer) ............................... 43
Dense Layer ................................................................................................................................... 44
A simple Example of a Feedforward NN........................................................................................ 44
R-Code: A simple Example of a Feedforward NN .......................................................................... 45
A simple Example: Training the NN ............................................................................................... 45
Example: Hitters Data .................................................................................................................... 45
Feedforward NN for Classification Problems ................................................................................ 46
Training Neural Networks ................................................................................................................. 47
Training neural networks: Loss minimization................................................................................ 47
Numerical Optimization: Gradient Descent .................................................................................... 47
Computing Gradients: Backpropagation ......................................................................................... 47
Loss Function of a Deep Neural Network ...................................................................................... 48
Minimization of the Loss Function ................................................................................................ 48
Choosing the Learning Rate........................................................................................................... 49
Training Neural Nets in Practice: Mini-Batches............................................................................. 49
Mini-Batches.................................................................................................................................. 50
Epochs ........................................................................................................................................... 50
A Feedforward Neural Network in R ............................................................................................. 50
Processing Text Data ..................................................................................................................... 51
Regularization for Neural Networks .................................................................................................. 51
Overfitting and Regularization ...................................................................................................... 51
1) Weight Regularization ........................................................................................................... 52
2) Dropout ................................................................................................................................. 52
3) Early Stopping ........................................................................................................................ 53
Task of improving the Neural Network, how? .............................................................................. 53
Neural Nets in Practice ...................................................................................................................... 54
Multi-Output Neural Nets ............................................................................................................. 54
Choosing the Last-Layer Activation and Loss Function ................................................................. 54
Initializing the Weights .................................................................................................................. 54
Exploding and Vanishing Gradients ............................................................................................... 55
Possible Solutions .......................................................................................................................... 55
Network Architecture in Practice .................................................................................................. 55
Outlook .......................................................................................................................................... 55
4
Contents
Week 1..................................................................................................................................................... 5
Notations ............................................................................................................................................. 5
Supervised learning ............................................................................................................................. 6
Why estimate f? Prediction vs Inference ........................................................................................ 7
Parametric vs Non-parametric Methods ......................................................................................... 8
Smooth Spline.................................................................................................................................. 8
Rough Spline .................................................................................................................................... 8
Prediction Accuracy and Interpretability......................................................................................... 8
Numerical Optimization .................................................................................................................. 9
Supervised Learning: Regression vs Classification .............................................................................. 9
Unsupervised learning......................................................................................................................... 9
Assessing Model Accuracy............................................................................................................. 10
Measuring Model Accuracy ........................................................................................................... 10
Training, Validation and Test Data ................................................................................................ 10
Test Data and Model Evaluation ................................................................................................... 11
Example 1: Training and Test MSE for Non-Linear Model ............................................................ 11
Example 2: Training and Test MSE for Linear Model .................................................................... 11
Flexibility and Overfitting .............................................................................................................. 12
Bias-Variance Trade-Off ................................................................................................................ 12
Bias-Variance Trade-Off Example 1 (bias-variance.R) ................................................................... 12
Bias-Variance Trade-Off: Example 2, Flexible method .................................................................. 13
Variance of Inflexible Method ....................................................................................................... 13
Example: Bias of Inflexible Method ............................................................................................... 13
Example: Bias of Flexible Method ................................................................................................. 14
Classification Setting ..................................................................................................................... 14
Week 2 – Supervised Learning .............................................................................................................. 15
Classification ...................................................................................................................................... 15
Linear Regression recap ................................................................................................................ 15
Logistic Regression ........................................................................................................................ 15
Maximum Likelihood ......................................................................................................................... 16
Logistic regression with several variables ..................................................................................... 17
Multiclass Logistic Regression ....................................................................................................... 17
1
, Resampling Methods ......................................................................................................................... 17
Recap: Bias-Variance Trade-Off ..................................................................................................... 17
Training Error vs. Test Error........................................................................................................... 18
Validation-set Approach ................................................................................................................ 18
Example validation: automobile data............................................................................................ 18
Test MSE: Automobile data ........................................................................................................... 19
K-fold Cross-Validation .................................................................................................................. 19
Leave-One-Out Cross-Validation (LOOVC) .................................................................................... 20
Cross-Validation for Classification Problems................................................................................. 20
Some remarks on (standard) Cross-Validation.............................................................................. 21
Validation with Time-series Data ...................................................................................................... 21
Basic Supervised Learning ................................................................................................................. 22
Supervised vs unsupervised learning ............................................................................................ 22
Prediction versus Inference ........................................................................................................... 22
Three classes of methods .............................................................................................................. 22
Method 1: Best Subset Selection .................................................................................................. 22
Remarks: Best subset selection ..................................................................................................... 23
Forward / Backward Stepwise Selection ....................................................................................... 23
Shrinkage Methods........................................................................................................................ 24
Ridge Regression ........................................................................................................................... 25
Ridge Regression: Scaling of predictors......................................................................................... 25
Normalization ................................................................................................................................ 25
The Lasso ....................................................................................................................................... 27
Selecting the Tuning Parameter .................................................................................................... 27
Comparing the Lasso and Ridge Regression .................................................................................. 27
Week 3 Tree-based Methods ................................................................................................................ 28
Big Picture...................................................................................................................................... 28
Intuition ......................................................................................................................................... 28
Terminology................................................................................................................................... 29
The tree-building process .............................................................................................................. 29
An intuitive algorithm.................................................................................................................... 30
Find the right tree size by Pruning ................................................................................................ 30
Choosing the best subtree............................................................................................................. 30
Tree algorithm ............................................................................................................................... 30
Classification Trees ........................................................................................................................ 31
Gini Index and Deviance ................................................................................................................ 31
2
, Tree R Code example:.................................................................................................................... 31
Prediction model ........................................................................................................................... 32
Summary of trees .......................................................................................................................... 33
Bagging .............................................................................................................................................. 33
Estimate the MSE .......................................................................................................................... 33
Intuition: Classification .................................................................................................................. 34
R Code: Bagging ............................................................................................................................. 34
Random Forest .............................................................................................................................. 35
R Code: Random Forest ................................................................................................................. 35
Confusion matrix & Random Forest: OOB Test Error .................................................................... 36
Tuning Random Forest .................................................................................................................. 36
R Code: Tuning Random Forest ..................................................................................................... 36
Week 4 – PCA ........................................................................................................................................ 38
Introduction ................................................................................................................................... 38
Principal Component Analysis(PCA) .............................................................................................. 38
Basic Idea ....................................................................................................................................... 38
PCA Details .................................................................................................................................... 39
PCA Notations................................................................................................................................ 40
Further PCs .................................................................................................................................... 40
Proportion Variance Explained ...................................................................................................... 40
Week – Deep Learning .................................................................................................................. 41
Introduction to deep learning ........................................................................................................... 41
What is deep learning? .................................................................................................................. 41
Why deep learning? ...................................................................................................................... 41
Applications of Deep Learning....................................................................................................... 41
Feedforward neural networks ........................................................................................................... 42
Key Building Block: The Perceptron .............................................................................................. 42
The Activation Function................................................................................................................. 42
Purple nodes combine two steps: ................................................................................................. 42
Compute Output Y^ using Neurons Zk .......................................................................................... 43
Building a Single Layer Neural Network in R using Keras (one hidden layer) ............................... 43
Dense Layer ................................................................................................................................... 44
A simple Example of a Feedforward NN........................................................................................ 44
R-Code: A simple Example of a Feedforward NN .......................................................................... 45
A simple Example: Training the NN ............................................................................................... 45
Example: Hitters Data .................................................................................................................... 45
Feedforward NN for Classification Problems ................................................................................ 46
Training Neural Networks ................................................................................................................. 47
Training neural networks: Loss minimization................................................................................ 47
Numerical Optimization: Gradient Descent ................................................................................... 47
Computing Gradients: Backpropagation ........................................................................................ 47
Loss Function of a Deep Neural Network ...................................................................................... 48
Minimization of the Loss Function ................................................................................................ 48
Choosing the Learning Rate........................................................................................................... 49
Training Neural Nets in Practice: Mini-Batches............................................................................. 49
Mini-Batches.................................................................................................................................. 50
Epochs ........................................................................................................................................... 50
A Feedforward Neural Network in R ............................................................................................. 50
Processing Text Data ..................................................................................................................... 51
Regularization for Neural Networks .................................................................................................. 51
Overfitting and Regularization ...................................................................................................... 51
1) Weight Regularization ........................................................................................................... 52
2) Dropout ................................................................................................................................. 52
3) Early Stopping ........................................................................................................................ 53
Task of improving the Neural Network, how? .............................................................................. 53
Neural Nets in Practice ...................................................................................................................... 54
Multi-Output Neural Nets ............................................................................................................. 54
Choosing the Last-Layer Activation and Loss Function ................................................................. 54
Initializing the Weights .................................................................................................................. 54
Exploding and Vanishing Gradients ............................................................................................... 55
Possible Solutions .......................................................................................................................... 55
Network Architecture in Practice .................................................................................................. 55
Outlook .......................................................................................................................................... 55