Machine Learning (Data Mining) - Summary (slides and textbook)
Course
Machine Learning (Data Mining)
Institution
Universiteit Antwerpen (UA)
Book
Data Science for Business
Score obtained: 17/20. A high-quality, extensive, clear, and comprehensive summary (128 pages, in English) of the Data Mining course, covering all chapters treated and based on the textbook, the author's own notes, and the slides. Recently written and used (2022).
Table of contents:
0. General Introduction
1. Introduction: Data-Analytic Thinking
1.1 The Ubiquity of Data Opportunities
1.2 Example: Hurricane Frances
1.3 Example: Predicting Customer Churn
1.4 Data Science, Engineering and Data-Driven Decision Making
1.5 Data Processing and ‘Big Data’
1.6 From Big Data 1.0 to Big Data 2.0
1.7 Data and Data Science Capability as a Strategic Asset
1.8 Data-Analytic Thinking
1.9 This Book
1.10 Data Mining and Data Science, Revisited (fundamental concepts)
1.11 Chemistry Is Not About Test Tubes: Data Science Versus the Work of the Data Scientist
1.12 Summary
2. Business Problems and Data Science Solutions
2.1 From Business Problems to Data Mining Tasks
2.2 Supervised Versus Unsupervised Methods
2.3 Data Mining and Its Results
2.4 The Data Mining Process
2.4.1 Business Understanding
2.4.2 Data Understanding
2.4.3 Data Preparation
2.4.4 Modeling
2.4.5 Evaluation
2.4.6 Deployment
2.5 Implications for Managing the Data Science Team
2.6 Other Analytics Techniques and Technologies
2.6.1 Statistics
2.6.2 Database Querying
2.6.3 Data Warehousing
2.6.4 Regression Analysis
2.6.5 Machine Learning and Data Mining
2.6.6 Answering Business Questions with These Techniques
2.7 Summary
3. Introduction to Predictive Modeling: From Correlation to Supervised Segmentation
3.1 Models, Induction, Deduction
3.2 Supervised Segmentation
3.2.1 Selecting Informative Attributes
3.2.2 Example: Attribute Selection with Information Gain (read)
3.2.3 Supervised Segmentation with Tree-Structured Models
3.3 Visualizing Segmentations
3.4 Trees as Sets of Rules
3.5 Probability Estimation
3.6 Example: Addressing the Churn Problem with Tree Induction (read)
3.7 Summary
4. Fitting a Model to Data
4.1 Classification via Mathematical Functions
4.1.1 Linear Discriminant Functions
4.1.2 Optimizing an Objective Function
4.1.3 An Example of Mining a Linear Discriminant from Data (read)
4.1.4 Linear Discriminant Functions for Scoring and Ranking Instances
4.1.5 Support Vector Machines, Briefly
4.2 Regression via Mathematical Functions
4.3 Class Probability Estimation and Logistic “Regression”
4.3.1 Logistic Regression: Some Technical Details (read)
4.4 Example: Logistic Regression versus Tree Induction (read)
4.5 Nonlinear Functions, Support Vector Machines, and Neural Networks
4.6 Summary
5. Overfitting and Its Avoidance
5.1 Generalization
5.2 Overfitting
5.3 Overfitting Examined
5.3.1 Holdout Data and Fitting Graphs
5.3.2 Overfitting in Tree Induction
5.3.3 Overfitting in Mathematical Functions
5.4 Example: Overfitting Linear Functions (read)
5.5 Example: Why Is Overfitting Bad? (read)
5.6 From Holdout Evaluation to Cross-Validation
5.7 Example: The Churn Dataset Revisited (read)
5.8 Learning Curves
5.9 Overfitting Avoidance and Complexity Control
5.9.1 Avoiding Overfitting with Tree Induction
5.9.2 A General Method for Avoiding Overfitting
5.9.3 Avoiding Overfitting for Parameter Optimization (read)
5.10 Summary
6. Similarity, Neighbors, and Clusters
6.1 Similarity and Distance
6.2 Nearest-Neighbor Reasoning
6.2.1 Example: Whiskey Analytics (read)
6.3 Nearest Neighbors for Predictive Modeling
6.3.1 How Many Neighbors and How Much Influence?
6.3.2 Geometric Interpretation, Overfitting, and Complexity Control
6.3.3 Issues with Nearest-Neighbor Methods
6.4 Some Important Technical Details Relating to Similarities and Neighbors
6.4.1 Heterogeneous Attributes
6.4.2 Other Distance Functions (read)
6.4.3 Combining Functions: Calculating Scores from Neighbors (read)
6.5 Clustering
6.5.1 Example: Whiskey Analytics Revisited (read)
6.5.2 Hierarchical Clustering
6.5.3 Nearest Neighbors Revisited: Clustering Around Centroids
6.5.4 Example: Clustering Business News Stories (read)
6.5.5 Understanding the Results of Clustering
6.5.6 Using Supervised Learning to Generate Cluster Descriptions (read)
6.6 Stepping Back: Solving a Business Problem Versus Data Exploration
6.7 Summary
7. Decision Analytic Thinking I: What Is a Good Model?
7.1 Evaluating Classifiers
7.1.1 Plain Accuracy and Its Problems
7.1.2 The Confusion Matrix
7.1.3 Problems with Unbalanced Classes
7.1.4 Problems with Unequal Costs and Benefits
7.1.5 Generalizing Beyond Classification
7.2 A Key Analytical Framework: Expected Value
7.2.1 Using Expected Value to Frame Classifier Use
7.2.2 Using Expected Value to Frame Classifier Evaluation
7.3 Evaluation, Baseline Performance, and Implications for Investments in Data
7.4 Summary
8. Visualizing Model Performance
8.1 Ranking Instead of Classifying
8.2 Profit Curves
8.3 ROC Graphs and Curves
Who am I buying this summary from?
Stuvia is a marketplace, so you are not buying this document from us but from the seller studentua2001. Stuvia facilitates the payment to the seller.
Am I immediately tied to a subscription?
No, you only buy this summary for €8.99. After that you have no further obligations.