Machine learning
BUSINESS ENGINEERING – FINANCIAL ENGINEERING
2024-2025
Data Science Challenge: 5
Exam: 15
- Multiple choice
➢ No guessing correction
➢ Pass mark: with 5 answer options per question, you need to answer 60% correctly to pass the multiple-choice part
➢ So answer all questions
➢ Details on the exam
- Open question
Table of Contents
LECTURE 1 – WHAT IS DATA SCIENCE AND MACHINE LEARNING? ....................................................................... 4
1 INTRODUCTION ............................................................................................................................................................................. 4
2 TERMINOLOGY.............................................................................................................................................................................. 4
2.1 Artificial Intelligence ........................................................................................................................................................... 5
2.2 Machine Learning ............................................................................................................................................................... 5
3 DATA SCIENTIST ........................................................................................................................................................................... 7
3.1 Importance ........................................................................................................................................................................... 7
3.2 Roles and skills .................................................................................................................................................................... 7
LECTURE 2 – MACHINE LEARNING .......................................................................................................................... 8
1 EXPLAINING VERSUS PREDICTING MODELLING ............................................................................................................................... 8
2 DATA PREPROCESSING ................................................................................................................................................................. 10
2.1 Sampling .............................................................................................................................................................................. 10
2.2 Encoding .............................................................................................................................................................................. 11
2.3 Missing values .................................................................................................................................................................... 12
2.4 Outliers ............................................................................................................................................................................... 12
2.5 Normalizing ........................................................................................................................................................................ 13
2.6 Discretization .................................................................................................................................................................... 14
3 SOME NOTES ABOUT CHATGPT ................................................................................................................................................... 14
LECTURE 3 – INTRODUCTION TO PREDICTIVE MODELING ............................................................................... 16
1. TERMINOLOGY ............................................................................................................................................................................ 16
2. FINDING INFORMATIVE VARIABLES FROM THE DATA ................................................................................................................... 17
3. DECISION TREES.......................................................................................................................................................................... 17
4. METHODOLOGY OF DECISION TREES IN MORE DETAIL .................................................................................................... 18
5. OVERFITTING AND ITS AVOIDANCE ........................................................................................................................................... 20
5.1 Overfitting .......................................................................................................................................................................... 20
5.2 Avoidance ........................................................................................................................................................................... 21
5.3 Bias/variance trade-off .....................................................................................................22
LECTURE 4 – ASSESSING AND VISUALIZING MODEL PERFORMANCE ................................................................ 23
1. EVALUATING CLASSIFIERS ............................................................................................................................................................23
1.1 Accuracy .............................................................................................................................................................................. 23
1.2 Confusion matrix ................................................................................................................................................................ 23
1.3 Problems: unbalanced classes ............................................................................................................................................ 23
1.4 Problems: unequal costs and benefits .............................................................................................................................. 24
2. EXPECTED VALUE ....................................................................................................................................................................... 24
2.1 Expected Value for classifier evaluation .......................................................................................................................... 24
3. EVALUATION AND BASELINE PERFORMANCE .............................................................................................................................. 26
LECTURE 5 - LEVERAGING DATA SCIENCE: BUSINESS INSIGHTS, MODEL PERFORMANCE, AND EVIDENCE-BASED DECISION MAKING .................................................................................................................................. 28
1 DECISION ANALYTIC THINKING I: WHAT IS A GOOD MODEL? (RECAP) ...................................................................................... 28
1.1 What is a Good Model? ...................................................................................................................................................... 28
1.2 Key concepts ...................................................................................................................................................................... 28
1.2.1 Positive and negatives ..................................................................................................................................................... 28
1.2.2 Accuracy .......................................................................................................................................................................... 28
1.2.3 The confusion matrix...................................................................................................................................................... 29
1.2.4 Cost-Benefit matrix ....................................................................................................... 29
1.2.5 Expected Profit/Value .................................................................................................... 29
1.2.6 What is a good baseline?................................................................................................................................................ 29
2 VISUALIZING MODEL PERFORMANCE ........................................................................................................................................... 29
2.1 Ranking Classifier .............................................................................................................................................................. 30
2.2 Profit Curve........................................................................................................................................................................ 30
2.3 ROC Curve ......................................................................................................................... 30
2.4 AUC (Area Under Curve)................................................................................................................................................... 31
2.5 Lift Curve (Cumulative Response Curve) ......................................................................................................................... 31
3 EVIDENCE AND PROBABILITIES.................................................................................................................................................... 34
3.1 Evidence .............................................................................................................................................................................. 34
3.2 Joint Probabilities and Independence............................................................................................................................... 34
3.3 Bayes Rule .......................................................................................................................................................................... 34
3.4 Naive Bayes and Conditional Independence ................................................................................................................... 34
3.5 Evidence Lift .......................................................................................................................................................................35
LECTURE 6 – SIMILARITY, NEIGHBORS, AND CLUSTERS .................................................................................... 37
1. SIMILARITY (BETWEEN INSTANCES) .............................................................................................................................................37
2. DISTANCE AND SIMILARITY MEASURES .......................................................................................................................................37
2.1 Euclidean distance ..............................................................................................................................................................37
2.2 Manhattan distance ...........................................................................................................................................................37
2.3 Cosine similarity ................................................................................................................................................................37
2.4 Jaccard similarity .............................................................................................................................................................. 38
2.5 Hamming distance ............................................................................................................................................................ 38
2.6 Levenshtein distance ........................................................................................................................................................ 38
3. K-NEAREST NEIGHBORS ............................................................................................................................................................. 38
4. HIERARCHICAL CLUSTERING (DENDROGRAMS) ......................................................................................................................... 40
5. K-MEANS CLUSTERING .............................................................................................................................................................. 40
LECTURE 7 – RECOMMENDER SYSTEMS ............................................................................................................... 41
1. WHAT IS A RECOMMENDER SYSTEM? ........................................................................................................................................... 41
2. PROBLEM DEFINITION: EVALUATING A RECOMMENDATION ALGORITHM .................................................................................. 41
3. RECOMMENDATION ALGORITHMS: TWO PERSPECTIVES ........................................................................................................... 42
3.1 Baselines, Content-Based, Collaborative Filtering and Hybrid Algorithms (Data perspective) .............................. 42
3.1.1 Baseline ............................................................................................................................................................................ 42
3.1.2 Content-based ................................................................................................................................................................. 42
3.1.3 Collaborative filtering ..................................................................................................................................................... 43
3.2 Pointwise, Pairwise and Listwise Learning-to-Rank (Learning perspective) ............................................................ 44
3.2.1 Pointwise: learning-to-rank ........................................................................................................................................... 44
3.2.2 Pairwise: learning-to-rank ............................................................................................................................................. 44
3.2.3 Listwise: learning-to-rank.............................................................................................................................................. 44
4. UNDER THE HOOD: BUILDING A PERSONALIZED RECOMMENDER SYSTEM ................................................................................ 44
LECTURE 8 – TEXT MINING .................................................................................................................................... 45
1. TEXT MINING APPLICATIONS ..................................................................................................................................................... 45
1.1 Unstructured vs. Structured Data ..................................................................................................................................... 45
1.2 Text Preprocessing............................................................................................................................................................. 45
1.3 Terminology: Documents, Tokens and Terms, Corpus ................................................................................................... 46
1.4 Bag of Words ..................................................................................................................................................................... 46
1.5 TF-IDF (Term Frequency - Inverse Document Frequency) ............................................................................................. 47
1.6 N-gram ............................................................................................................................................................................... 48
1.7 Named Entity Recognition ................................................................................................................................................ 48
1.8 Topic Model ....................................................................................................................................................................... 48
1.9 Word Embedding ............................................................................................................................................................... 49
2. ASSOCIATION RULE MINING...................................................................................................................................................... 49
2.1 Item sets ............................................................................................................................................................................. 49
2.2 Frequent Item sets............................................................................................................................................................. 49
2.3 Association Rules .............................................................................................................................................................. 50
2.4 Support .............................................................................................................................................................................. 50
2.5 Confidence ......................................................................................................................................................................... 50
2.6 Association Rule Mining: Apriori Algorithm .................................................................................................................. 50
2.7 Lift .......................................................................................................................................................................................52
LECTURE 9 – NEURAL NETWORKS AND DEEP LEARNING ................................................................................... 53
1. NEURAL NETWORKS....................................................................................................................................................................53
1.1 The Perceptron ....................................................................................................................................................................53
1.2 Activation Function ............................................................................................................................................................53
1.3 Multi-Layer Perceptron ..................................................................................................................................................... 54
1.4 Forward Pass .......................................................................................................................................................................55
1.5 Loss Function ..................................................................................................................................................................... 56
1.6 Backpropagation (Backward Pass) .................................................................................................................................. 56
1.7 Gradient Descent Algorithm ............................................................................................................................................. 56
1.8 Stochastic Gradient Descent ............................................................................................................................................ 56
2. DEEP LEARNING......................................................................................................................................................................... 57
2.1 Convolutional Neural Networks (CNNs) ..........................................................................................................................57
2.2 Recurrent Neural Networks (RNNs) ................................................................................................................................ 58
2.3 Autoencoders ..................................................................................................................................................................... 58
2.4 Transformers ..................................................................................................................................................................... 59
2.5 Foundation Models (Large Language Models) ............................................................................................................... 59
LECTURE 10 – ENSEMBLE METHODS AND SVM ............................................................................................................ 61
1. ENSEMBLE METHODS ................................................................................................................................................................... 61
1.1 Combine by consensus ........................................................................................................................................................ 61
1.1.1 Bagging .............................................................................................................................................................................. 61
1.1.2 Random Forests ............................................................................................................................................................... 62
1.2 Combine by learning .......................................................................................................................................................... 62
1.2.1 Boosting ........................................................................................................................................................................... 62
1.2.2 Stacking ........................................................................................................................................................................... 63
2. A BRIEF INTRO TO THE SUPPORT VECTOR MACHINE ................................................................................................................. 64
3. DATA SCIENCE ETHICS ............................................................................................................................................................... 66
3.1 Data gathering: Privacy, A/B Testing and Bias ............................................................................................................... 67
3.1.1 Privacy .............................................................................................................................................................................. 67
3.1.2 Experimentation ............................................................................................................................................................. 67
3.1.3 Bias................................................................................................................................................................................... 68
3.2 Data preprocessing: Proxies, Government Backdoors ................................................................................................... 68
3.3 Modeling: ZK Proofs, Discrimination .............................................................................................................................. 69
3.4 Model evaluation: explain ................................................................................................................................................ 69
3.5 Deployment: Unintended consequences .......................................................................................................................... 70