A detailed summary of the lessons of Machine Learning taught by David Martens at the University of Antwerp. This is a summary of my own notes, the slides and the book Data Science for Business.
Written for
Universiteit Antwerpen (UA)
Handelsingenieur
Machine Learning
MACHINE LEARNING
SUMMARY
1. Introduction
2. Predictive modeling
2.1. Explaining versus predicting
2.2. Data preprocessing
2.3. Terminology
2.4. Finding informative variables from the data
2.5. Decision trees
2.6. Mathematical models
2.6.4. Logistic regression
2.6.2. Support vector machines (SVM)
2.7. Overfitting and its avoidance
3. Assessing model performance
3.1. Evaluating classifiers
3.2. Expected value
3.3. Evaluation and baseline performance
4. Visualizing model performance
4.1. Profit curves
4.2. ROC curve
4.3. Cumulative response and lift curves
5. Naive Bayes
5.1. Bayes
5.2. A model of evidence lift
6. Descriptive modeling
6.1. Nearest-neighbor
6.2. Clustering
6.3. Frequent itemsets and association rules
6.4. Recommender systems
6.5. Conclusion and exercises
7. Ensemble methods and artificial neural networks
7.1. Ensemble methods
7.2. Artificial neural networks
7.3. Deep learning
8. Text mining
8.1. Why text mining?
8.2. Text processing
8.3. Document classification and clustering
8.4. Topic modeling and word embeddings
8.5. Case study in politics
9. Data science ethics
9.1. Data gathering: privacy, A/B testing and bias
9.2. Data preprocessing: proxies, government backdoors
9.3. Modeling: ZK proofs, discrimination
9.4. Model evaluation: explain
1. Introduction
Data science = set of fundamental principles that guide extraction of knowledge from data
Data mining = the process of extracting knowledge from data
AI = methods for improving knowledge of an agent over time due to experience
Generative AI: generates text by predicting the next word from the prompt and the previous words
ML = automatic extraction of patterns from large amounts of data
Ex: Wal-Mart learned which products sell more before hurricanes
Ex: recommendation system → “frequently bought together”
Ex: market basket analysis → give coupon for milk if bread and butter bought together
→ labels for the target variable are needed for the algorithm to learn distinctions
⇒ based on data (data mining): classification model ⇒ used for predictions
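The market basket example above (coupon for milk if bread and butter are bought together) can be sketched with two standard measures for association rules, support and confidence. This is a minimal illustration, not the method from the course: the baskets are made-up toy data and the helper names `support` and `confidence` are hypothetical.

```python
# Hypothetical toy data: each set is one shopping basket.
transactions = [
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"bread", "milk"},
    {"bread", "butter", "milk", "eggs"},
    {"butter", "eggs"},
]


def support(itemset, baskets):
    """Fraction of baskets that contain every item in `itemset`."""
    return sum(itemset <= basket for basket in baskets) / len(baskets)


def confidence(antecedent, consequent, baskets):
    """Estimate P(consequent | antecedent) from basket counts."""
    return support(antecedent | consequent, baskets) / support(antecedent, baskets)


# Rule: {bread, butter} -> {milk}
rule_support = support({"bread", "butter", "milk"}, transactions)
rule_confidence = confidence({"bread", "butter"}, {"milk"}, transactions)
print(f"support={rule_support:.2f}, confidence={rule_confidence:.2f}")
```

A rule with high confidence but very low support rests on only a few baskets, so both numbers are usually checked before acting on it.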
End-user is the engine of discovery
- You know what you are looking for
- Querying = request for a subset of data or for statistics, ex: averages, graphs, …
- Tools: SQL (Structured Query Language) + GUI (Graphical User Interface)
- OLAP (Online Analytical Processing) = advanced query and reporting
Business intelligence = getting the right info to the right person at the right time
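As a small illustration of querying, the sketch below uses Python's built-in `sqlite3` module to ask for a subset of the data and for a statistic. The table `sales`, its columns and its rows are assumptions invented for this example.

```python
import sqlite3

# In-memory database with a hypothetical sales table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (customer TEXT, region TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)",
    [
        ("ann", "north", 120.0),
        ("bob", "north", 80.0),
        ("cem", "south", 50.0),
        ("dee", "south", 130.0),
    ],
)

# Query 1: a subset of the data (only customers in the north region).
north_rows = conn.execute(
    "SELECT customer, amount FROM sales WHERE region = 'north' ORDER BY customer"
).fetchall()

# Query 2: a statistic (average amount per region).
avg_by_region = dict(
    conn.execute("SELECT region, AVG(amount) FROM sales GROUP BY region").fetchall()
)
conn.close()

print(north_rows)
print(avg_by_region)
```

In both queries the user already knows which question to ask; the database only retrieves or aggregates, it does not discover new patterns.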
End-user isn’t the engine of discovery
- You don’t know what you are looking for ⇒ new knowledge
- Computer finding patterns → ML
AI
● A computer interacts through data
● Learning from data ⇒ intelligence
● Big data + ML = AI
● Mainly used for predictions, ex: Facebook likes ⇒ political preference
CRISP-DM (Cross-Industry Standard Process for Data Mining)
DDD (data-driven decision making): has proven value ⇒ automated decisions
Data science roles
● Computer science: python, database creation, …
● Domain knowledge
● Communication skills
Data + ability to extract knowledge = key strategic assets
Ex: Facebook’s value stems from its data
Ex: Robinhood’s income: selling training data to hedge funds
Big data = datasets that are too large for traditional data processing systems
Data warehouse: collect and combine data from across an enterprise
Fundamental concepts of data science
● CRISP-DM
● Find informative descriptive attributes of entities of interest, based on a large mass of data, using information technology
○ Finding variables that correlate with target
○ Recursively: predict target based on attributes
● Overfitting: finding patterns that don’t generalize
● Formulating solutions and evaluating them relies on the context of use
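The idea of finding variables that correlate with the target can be sketched by ranking attributes on the absolute value of their Pearson correlation with the target. This is only one possible measure of informativeness, chosen here for brevity; the customer data, the attribute names and the `pearson` helper are assumptions made up for the example.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equally long lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Hypothetical toy data: target is 1 if the customer churned, else 0.
churned = [1, 1, 0, 0, 1, 0]
attributes = {
    "complaints": [3, 4, 0, 1, 5, 0],
    "age": [30, 45, 41, 29, 38, 36],
}

# Rank attributes from most to least correlated with the target.
ranking = sorted(
    attributes,
    key=lambda name: abs(pearson(attributes[name], churned)),
    reverse=True,
)
print(ranking)
```

On six rows a strong correlation can easily be a coincidence, which is the overfitting risk named above: a pattern found in a small sample may not generalize to new data.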