Summary Machine Learning for business

Course
Machine learning for Business (2103TEWDAS)

Institution
Universiteit Antwerpen (UA)

Book
Data Science for Business

This summary consists of the slides, supplemented by my own notes and clarifications.

Entire book summarized? Yes
Uploaded on 29 December 2024
Number of pages: 75
Written in 2024/2025
Type: Summary

dsc
machine
learning
machinelearning
david martens
lien michiels

Book title: Data Science for Business

Author(s): Foster Provost, Tom Fawcett

Published: 2013
ISBN: 9781449374297
Edition: Unknown

Study programme
Handelsingenieur

josefienj03

Machine learning
BUSINESS ENGINEERING – FINANCIAL ENGINEERING
2024-2025

Data Science Challenge: 5
Exam: 15
- Multiple choice
➢ No correction for guessing
➢ Pass mark: with 5 answer options per question, you need to answer 60% correctly to pass the
multiple-choice part
➢ So answer all questions
➢ Details on the exam
- Open question

Table of contents
LECTURE 1 – WHAT IS DATA SCIENCE AND MACHINE LEARNING?
1 INTRODUCTION
2 TERMINOLOGY
2.1 Artificial Intelligence
2.2 Machine Learning
3 DATA SCIENTIST
3.1 Importance
3.2 Roles and skills
LECTURE 2 – MACHINE LEARNING
1 EXPLAINING VERSUS PREDICTING MODELLING
2 DATA PREPROCESSING
2.1 Sampling
2.2 Encoding
2.3 Missing values
2.4 Outliers
2.5 Normalizing
2.6 Discretization
3 SOME NOTES ABOUT CHATGPT
LECTURE 3 – INTRODUCTION TO PREDICTIVE MODELING
1. TERMINOLOGY
2. FINDING INFORMATIVE VARIABLES FROM THE DATA
3. DECISION TREES
4. METHODOLOGY OF DECISION TREES IN MORE DETAIL
5. OVERFITTING AND ITS AVOIDANCE
5.1 Overfitting
5.2 Avoidance
5.3 Bias/variance trade-off
LECTURE 4 – ASSESSING AND VISUALIZING MODEL PERFORMANCE
1. EVALUATING CLASSIFIERS
1.1 Accuracy
1.2 Confusion matrix
1.3 Problems: unbalanced classes
1.4 Problems: unequal costs and benefits
2. EXPECTED VALUE
2.1 Expected Value for classifier evaluation
3. EVALUATION AND BASELINE PERFORMANCE
LECTURE 5 – LEVERAGING DATA SCIENCE: BUSINESS INSIGHTS, MODEL PERFORMANCE, AND EVIDENCE-BASED DECISION MAKING
1 DECISION ANALYTIC THINKING I: WHAT IS A GOOD MODEL? (RECAP)
1.1 What is a Good Model?
1.2 Key concepts
1.2.1 Positives and negatives
1.2.2 Accuracy
1.2.3 The confusion matrix
1.2.4 Cost-Benefit matrix
1.2.5 Expected Profit/Value
1.2.6 What is a good baseline?
2 VISUALIZING MODEL PERFORMANCE
2.1 Ranking Classifier
2.2 Profit Curve
2.3 ROC Curve
2.4 AUC (Area Under Curve)
2.5 Lift Curve (Cumulative Response Curve)
3 EVIDENCE AND PROBABILITIES
3.1 Evidence
3.2 Joint Probabilities and Independence
3.3 Bayes Rule
3.4 Naive Bayes and Conditional Independence
3.5 Evidence Lift
LECTURE 6 – SIMILARITY, NEIGHBORS, AND CLUSTERS
1. SIMILARITY (BETWEEN INSTANCES)
2. DISTANCE AND SIMILARITY MEASURES
2.1 Euclidean distance
2.2 Manhattan distance
2.3 Cosine similarity
2.4 Jaccard similarity
2.5 Hamming distance
2.6 Levenshtein distance
3. K-NEAREST NEIGHBORS
4. HIERARCHICAL CLUSTERING (DENDROGRAMS)
5. K-MEANS CLUSTERING
LECTURE 7 – RECOMMENDER SYSTEMS
1. WHAT IS A RECOMMENDER SYSTEM?
2. PROBLEM DEFINITION: EVALUATING A RECOMMENDATION ALGORITHM
3. RECOMMENDATION ALGORITHMS: TWO PERSPECTIVES
3.1 Baselines, Content-Based, Collaborative Filtering and Hybrid Algorithms (Data perspective)
3.1.1 Baseline
3.1.2 Content-based
3.1.3 Collaborative filtering
3.2 Pointwise, Pairwise and Listwise Learning-to-Rank (Learning perspective)
3.2.1 Pointwise: learning-to-rank
3.2.2 Pairwise: learning-to-rank
3.2.3 Listwise: learning-to-rank
4. UNDER THE HOOD: BUILDING A PERSONALIZED RECOMMENDER SYSTEM
LECTURE 8 – TEXT MINING
1. TEXT MINING APPLICATIONS
1.1 Unstructured vs. Structured Data
1.2 Text Preprocessing
1.3 Terminology: Documents, Tokens and Terms, Corpus
1.4 Bag of Words
1.5 TF-IDF (Term Frequency - Inverse Document Frequency)
1.6 N-gram
1.7 Named Entity Recognition
1.8 Topic Model
1.9 Word Embedding
2. ASSOCIATION RULE MINING
2.1 Item sets
2.2 Frequent Item sets
2.3 Association Rules
2.4 Support
2.5 Confidence
2.6 Association Rule Mining: Apriori Algorithm
2.7 Lift
LECTURE 9 – NEURAL NETWORKS AND DEEP LEARNING
1. NEURAL NETWORKS
1.1 The Perceptron
1.2 Activation Function
1.3 Multi-Layer Perceptron
1.4 Forward Pass
1.5 Loss Function
1.6 Backpropagation (Backward Pass)
1.7 Gradient Descent Algorithm
1.8 Stochastic Gradient Descent
2. DEEP LEARNING
2.1 Convolutional Neural Networks (CNNs)
2.2 Recurrent Neural Networks (RNNs)
2.3 Autoencoders
2.4 Transformers
2.5 Foundation Models (Large Language Models)
LECTURE 10 – ENSEMBLE METHODS SVM
1. ENSEMBLE METHODS
1.1 Combine by consensus
1.1.1 Bagging
1.1.2 Random Forests
1.2 Combine by learning
1.2.1 Boosting
1.2.2 Stacking
2. A BRIEF INTRO TO THE SUPPORT VECTOR MACHINE
3. DATA SCIENCE ETHICS
3.1 Data gathering: Privacy, A/B Testing and Bias
3.1.1 Privacy
3.1.2 Experimentation
3.1.3 Bias
3.2 Data preprocessing: Proxies, Government Backdoors
3.3 Modeling: ZK Proofs, Discrimination
3.4 Model evaluation: explain
3.5 Deployment: Unintended consequences

Lecture 1 – What is Data Science and Machine Learning?
What is machine learning? The automatic extraction of patterns from large amounts of data (done by machines).
Most of what people call AI = machine learning

1 INTRODUCTION
What is the goal of data-driven prediction? To find non-obvious patterns
= we use the patterns to improve our business (e.g. stocking more of a product so it does not sell out)

Machine learning: automatic extraction of knowledge from data
➢ Setting the scene with a credit scoring example
o Banks: should I grant credit to this loan applicant?
o Predict the creditworthiness, based on historical data
➢ Data → Machine learning technique → pattern
BUT note: an initial set of data instances with a known target variable is needed, otherwise we cannot build a
predictive model!

Data instance xi
- A vector of size m (m = number of input variables)
- i = 1, 2, …, n (n = number of data instances)
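
The flow "data → machine learning technique → pattern" from the credit scoring example can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions: the data values, the two input variables (income and debt) and the helper names `learn_threshold_rule` and `predict` are all invented here, and a single threshold rule stands in for whatever technique the course actually uses.

```python
# Toy historical data (all values invented for illustration).
# Each instance x_i is a vector of m = 2 input variables: [income, existing debt].
# y_i is the known target variable: 1 = loan repaid, 0 = default.
X = [
    [20, 15],
    [35, 5],
    [50, 10],
    [28, 20],
    [60, 2],
    [22, 12],
]
y = [0, 1, 1, 0, 1, 0]


def learn_threshold_rule(X, y, feature):
    """Learn a very simple pattern: the threshold t on one input variable for which
    the rule 'predict 1 if x[feature] >= t' makes the fewest errors on the labeled data."""
    best_t, best_errors = None, len(y) + 1
    for t in sorted({row[feature] for row in X}):
        errors = sum((row[feature] >= t) != bool(label) for row, label in zip(X, y))
        if errors < best_errors:
            best_t, best_errors = t, errors
    return best_t, best_errors


# "Machine learning technique": search for the best threshold on income (feature 0).
threshold, errors = learn_threshold_rule(X, y, feature=0)


def predict(applicant):
    """Apply the learned pattern to a new loan applicant."""
    return int(applicant[0] >= threshold)
```

Without the known targets in `y` there would be nothing to count errors against, which is why labeled historical instances are needed before any predictive model can be built.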

2 TERMINOLOGY
Machine learning: automatic extraction of patterns from data
Data science: a set of fundamental principles that guide the extraction of
knowledge from data
AI: methods for improving the knowledge or performance of an intelligent agent
over time, in response to the agent's experience in the world
Big data: data that is so large that traditional data processing systems are unable
to deal with it (both storage and analysis component).
= only a few companies have this (e.g. Twitter)

Querying and reporting (=displaying the data)
- You know exactly what you are looking for – targeted data extraction
- SQL – structured query language
Querying and Visualization (= graphical representation of data)
- Multidimensional analysis – exploring data across multiple dimensions (attributes and factors) to
uncover relationships, patterns and trends
- OLAP (online analytical processing) – a technology and methodology used to perform multidimensional
analysis of large datasets quickly and interactively.
! The end-user is still the engine of discovery
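
To make the difference between a targeted query and a multidimensional view concrete, here is a small sketch using Python's built-in `sqlite3` module. The `sales` table, its columns and its rows are hypothetical and exist only for this illustration; a real data warehouse or OLAP tool would look different.

```python
import sqlite3

# In-memory database with a made-up sales table (illustrative data only).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, product TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)",
    [
        ("North", "A", 100.0),
        ("North", "B", 50.0),
        ("South", "A", 70.0),
        ("South", "B", 30.0),
        ("North", "A", 20.0),
    ],
)

# Querying and reporting: we know exactly what we are looking for.
north_total = conn.execute(
    "SELECT SUM(amount) FROM sales WHERE region = ?", ("North",)
).fetchone()[0]

# Multidimensional analysis: aggregate across two dimensions (region x product).
cube = conn.execute(
    "SELECT region, product, SUM(amount) FROM sales "
    "GROUP BY region, product ORDER BY region, product"
).fetchall()

conn.close()
```

In both cases a person decides which question to ask and which dimensions to inspect; nothing here discovers a pattern on its own.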

Business intelligence – getting the right information to the right person at the right time
It always includes these components:
➢ Data is collected in a data warehouse, which tries to bring all the data together in
one place
➢ Reporting uses data from the warehouse to generate visual
dashboards, tables, and summaries for business insights.
➢ Machine learning uses the data in the warehouse as input for
training predictive models and uncovering patterns.

2.1 Artificial Intelligence
• A computer interacts through data
• Learning from data leads to intelligence
• Big Data + Machine Learning = Artificial Intelligence (Theodoros Evgeniou, 2019)
• Renewed interest from Deep Learning (large artificial neural networks)
• Most work in AI is on Machine Learning
• "The automatic extraction of patterns from large amounts of data"
➢ The separation between the fields has blurred

General definition: an AI system is a machine-based system that is designed to operate with varying levels of
autonomy and that may exhibit adaptiveness after deployment, and that, for explicit or implicit objectives, infers,
from the input it receives, how to generate outputs such as decisions, recommendations, predictions, classifications,
actions, or content creation (e.g., text, images, videos, or audio) to fulfill the intended purpose.
= The automatic extraction of patterns from large amounts of data

2.2 Machine Learning
Note: an initial set of data instances with a known target variable is needed!

Example: Facebook likes predict personality traits
We can investigate the variables (likes) with the highest and lowest
coefficients in the linear model.
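
Inspecting the highest and lowest coefficients of a linear model can be sketched as follows. The coefficient values and the variable names (`like_A` to `like_E`) are hypothetical placeholders, not results from the study; the helper `rank_coefficients` is likewise introduced only for this illustration.

```python
# Hypothetical coefficients of an already-fitted linear model,
# one per binary "like" variable (values invented for illustration).
coefficients = {
    "like_A": 0.82,
    "like_B": -0.45,
    "like_C": 0.10,
    "like_D": -0.91,
    "like_E": 0.37,
}


def rank_coefficients(coefs, k=2):
    """Return the k variables with the highest and the k with the lowest coefficients."""
    ordered = sorted(coefs.items(), key=lambda item: item[1])  # ascending by value
    lowest = ordered[:k]
    highest = ordered[::-1][:k]
    return highest, lowest


highest, lowest = rank_coefficients(coefficients)
```

Variables in `highest` push the predicted trait score up the most when present, and those in `lowest` pull it down the most, which is what makes a linear model comparatively easy to interpret.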

Example: default prediction with Facebook data for micro-finance
▪ In collaboration with NY-based LenddoEFL
▪ Micro-finance in developing countries
▪ Loans of $1,000 – $5,000
▪ Most citizens have limited or no credit history
