A detailed summary of the Machine Learning course taught by David Martens at the University of Antwerp. It is based on my own notes, the slides and the book Data Science for Business.
1. Introduction
2. Predictive modeling
2.1. Explaining versus predicting
2.2. Data preprocessing
2.3. Terminology
2.4. Finding informative variables from the data
2.5. Decision trees
2.6. Mathematical models
2.6.1. Logistic regression
2.6.2. Support vector machines (SVM)
2.7. Overfitting and its avoidance
3. Assessing model performance
3.1. Evaluating classifiers
3.2. Expected value
3.3. Evaluation and baseline performance
4. Visualizing model performance
4.1. Profit curves
4.2. ROC curve
4.3. Cumulative response and lift curves
5. Naive Bayes
5.1. Bayes
5.2. A model of evidence lift
6. Descriptive modeling
6.1. Nearest-neighbor
6.2. Clustering
6.3. Frequent itemsets and association rules
6.4. Recommender systems
6.5. Conclusion and exercises
7. Ensemble methods and artificial neural networks
7.1. Ensemble methods
7.2. Artificial neural networks
7.3. Deep learning
8. Text mining
8.1. Why text mining?
8.2. Text processing
8.3. Document classification and clustering
8.4. Topic modeling and word embeddings
8.5. Case study in politics
9. Data science ethics
9.1. Data gathering: privacy, A/B testing and bias
9.2. Data preprocessing: proxies, government backdoors
9.3. Modeling: ZK proofs, discrimination
9.4. Model evaluation: explain
1. Introduction
Data science = set of fundamental principles that guide extraction of knowledge from data
Data mining = the process of extracting knowledge from data
AI = methods for improving the knowledge of an agent over time through experience
Generative AI: generates text by predicting the next word from the prompt and the previous words
ML = automatic extraction of patterns from large amounts of data
Ex: Wal-Mart learned which products sell more before hurricanes
Ex: recommendation system → “frequently bought together”
Ex: market basket analysis → give a coupon for milk if bread and butter are bought together
→ the algorithm needs labels for the target variable to learn distinctions
⇒ data mining builds a classification model from the data ⇒ the model is used for predictions
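The market basket example above (coupon for milk if bread and butter are bought together) can be sketched with two simple counts: how often the three items occur together (support) and how often milk is in the basket given bread and butter (confidence). The baskets and the helper names `support` and `confidence` are made up for illustration; this is a minimal sketch, not the method used in the course.

```python
# Hypothetical toy transactions; each set is one shopping basket.
baskets = [
    {"bread", "butter", "milk"},
    {"bread", "butter", "milk", "eggs"},
    {"bread", "butter"},
    {"bread", "milk"},
    {"butter", "eggs"},
    {"milk", "eggs"},
]


def support(itemset, baskets):
    """Fraction of baskets that contain every item in `itemset`."""
    hits = sum(1 for basket in baskets if itemset <= basket)
    return hits / len(baskets)


def confidence(antecedent, consequent, baskets):
    """Estimate of P(consequent in basket | antecedent in basket)."""
    return support(antecedent | consequent, baskets) / support(antecedent, baskets)


# Rule under study: {bread, butter} -> {milk}
rule_support = support({"bread", "butter", "milk"}, baskets)
rule_confidence = confidence({"bread", "butter"}, {"milk"}, baskets)
print(f"support={rule_support:.2f}, confidence={rule_confidence:.2f}")
```

A rule with high confidence but very low support is based on few baskets, so both numbers are usually looked at together.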
End-user is engine of discovery
- You know what you are looking for
- Querying = request for a subset of data or for statistics, ex: average, graphs, …
- Tools: SQL (Structured Query Language) + GUI (Graphical User Interface)
- OLAP (Online Analytical Processing) = advanced query and reporting
Business intelligence = getting the right info to the right person at the right time
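Querying as described above (a subset of the data, or a statistic such as an average) can be sketched with SQL run from Python's built-in `sqlite3` module. The `sales` table, its columns and its rows are hypothetical and exist only for this example.

```python
import sqlite3

# In-memory database with a hypothetical `sales` table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (customer TEXT, region TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)",
    [
        ("ann", "north", 120.0),
        ("bob", "north", 80.0),
        ("cem", "south", 50.0),
        ("dee", "south", 150.0),
        ("eli", "south", 130.0),
    ],
)

# Query 1: request a subset of the data (all customers in one region).
north_rows = conn.execute(
    "SELECT customer, amount FROM sales WHERE region = 'north' ORDER BY customer"
).fetchall()

# Query 2: request a statistic (average amount per region).
avg_by_region = dict(
    conn.execute("SELECT region, AVG(amount) FROM sales GROUP BY region").fetchall()
)

conn.close()
print(north_rows)
print(avg_by_region)
```

In both queries the end-user decides what to ask; the database only answers the question, it does not discover patterns by itself.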
End-user isn’t engine of discovery
- You don’t know what you are looking for ⇒ new knowledge
- Computer finding patterns → ML
AI
● A computer interacts through data
● Learning from data ⇒ intelligence
● Big data + ML = AI
● Mainly used for predictions, ex: Facebook likes ⇒ political preference
CRISP-DM (Cross-Industry Standard Process for Data Mining)
DDD (data-driven decision-making): has proven value ⇒ automated decisions
Data science roles
● Computer science: Python, database creation, …
● Domain knowledge
● Communication skills
Data + ability to extract knowledge = key strategic assets
Ex: Facebook’s value stems from its data
Ex: Robinhood’s income: selling trading data to hedge funds
Big data = datasets that are too large for traditional data processing systems
Data warehouse: collect and combine data from across an enterprise
Fundamental concepts of data science
● CRISP-DM
● Find informative descriptive attributes of entities of interest based on large mass of data using information tech
○ Finding variables that correlate with target
○ Recursively: predict target based on attributes
● Overfitting: finding patterns that don’t generalize
● Formulating solutions and evaluating them relies on the context of use
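The overfitting concept in the list above (finding patterns that don’t generalize) can be illustrated with a model that simply memorizes its training data. The customer IDs, the churn labels and the names `table_model` and `accuracy` are hypothetical; this is a minimal sketch under those assumptions.

```python
from collections import Counter

# Hypothetical toy data: (customer_id, churned) pairs.
train = [(1, True), (2, False), (3, False), (4, True), (5, False), (6, False)]
holdout = [(7, True), (8, False), (9, True), (10, False)]

# Most common label in the training data, used as fallback for unseen customers.
majority_label = Counter(label for _, label in train).most_common(1)[0][0]

# "Table model": memorize the label of every customer seen during training.
lookup = dict(train)


def table_model(customer_id):
    """Return the memorized label, or the majority label for unseen customers."""
    return lookup.get(customer_id, majority_label)


def accuracy(model, data):
    """Fraction of examples for which the model predicts the correct label."""
    return sum(model(x) == y for x, y in data) / len(data)


train_accuracy = accuracy(table_model, train)
holdout_accuracy = accuracy(table_model, holdout)
print(f"train accuracy:   {train_accuracy:.2f}")
print(f"holdout accuracy: {holdout_accuracy:.2f}")
```

The model is perfect on the data it memorized but no better than always guessing the majority label on new customers, which is why performance is measured on data that was not used for training.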