Machine learning
BUSINESS ENGINEERING – FINANCIAL ENGINEERING
2024-2025
Data Science Challenge: 5
Exam: 15
- Multiple choice
➢ No guessing correction
➢ Pass mark (cut-off): with 5 answer options per question, you need 60% correct to pass the multiple-choice part
➢ So answer all questions
➢ Details on the exam
- Open question
Table of contents
LECTURE 1 – WHAT IS DATA SCIENCE AND MACHINE LEARNING?
1 INTRODUCTION
2 TERMINOLOGY
2.1 Artificial Intelligence
2.2 Machine Learning
3 DATA SCIENTIST
3.1 Importance
3.2 Roles and skills
LECTURE 2 – MACHINE LEARNING
1 EXPLAINING VERSUS PREDICTING MODELLING
2 DATA PREPROCESSING
2.1 Sampling
2.2 Encoding
2.3 Missing values
2.4 Outliers
2.5 Normalizing
2.6 Discretization
3 SOME NOTES ABOUT CHATGPT
LECTURE 3 – INTRODUCTION TO PREDICTIVE MODELING
1. TERMINOLOGY
2. FINDING INFORMATIVE VARIABLES FROM THE DATA
3. DECISION TREES
4. METHODOLOGY OF DECISION TREES IN MORE DETAIL
5. OVERFITTING AND ITS AVOIDANCE
5.1 Overfitting
5.2 Avoidance
5.3 Bias/variance trade-off
LECTURE 4 – ASSESSING AND VISUALIZING MODEL PERFORMANCE
1. EVALUATING CLASSIFIERS
1.1 Accuracy
1.2 Confusion matrix
1.3 Problems: unbalanced classes
1.4 Problems: unequal costs and benefits
2. EXPECTED VALUE
2.1 Expected Value for classifier evaluation
3. EVALUATION AND BASELINE PERFORMANCE
LECTURE 5 – LEVERAGING DATA SCIENCE: BUSINESS INSIGHTS, MODEL PERFORMANCE, AND EVIDENCE-BASED DECISION MAKING
1 DECISION ANALYTIC THINKING I: WHAT IS A GOOD MODEL? (RECAP)
1.1 What is a Good Model?
1.2 Key concepts
1.2.1 Positive and negatives
1.2.2 Accuracy
1.2.3 The confusion matrix
1.2.4 Cost-Benefit matrix
1.2.5 Expected Profit/Value
1.2.6 What is a good baseline?
2 VISUALIZING MODEL PERFORMANCE
2.1 Ranking Classifier
2.2 Profit Curve
2.3 ROC Curve
2.4 AUC (Area Under Curve)
2.5 Lift Curve (Cumulative Response Curve)
3 EVIDENCE AND PROBABILITIES
3.1 Evidence
3.2 Joint Probabilities and Independence
3.3 Bayes Rule
3.4 Naive Bayes and Conditional Independence
3.5 Evidence Lift
LECTURE 6 – SIMILARITY, NEIGHBORS, AND CLUSTERS
1. SIMILARITY (BETWEEN INSTANCES)
2. DISTANCE AND SIMILARITY MEASURES
2.1 Euclidean distance
2.2 Manhattan distance
2.3 Cosine similarity
2.4 Jaccard similarity
2.5 Hamming distance
2.6 Levenshtein distance
3. K-NEAREST NEIGHBORS
4. HIERARCHICAL CLUSTERING (DENDROGRAMS)
5. K-MEANS CLUSTERING
LECTURE 7 – RECOMMENDER SYSTEMS
1. WHAT IS A RECOMMENDER SYSTEM?
2. PROBLEM DEFINITION: EVALUATING A RECOMMENDATION ALGORITHM
3. RECOMMENDATION ALGORITHMS: TWO PERSPECTIVES
3.1 Baselines, Content-Based, Collaborative Filtering and Hybrid Algorithms (Data perspective)
3.1.1 Baseline
3.1.2 Content-based
3.1.3 Collaborative filtering
3.2 Pointwise, Pairwise and Listwise Learning-to-Rank (Learning perspective)
3.2.1 Pointwise: learning-to-rank
3.2.2 Pairwise: learning-to-rank
3.2.3 Listwise: learning-to-rank
4. UNDER THE HOOD: BUILDING A PERSONALIZED RECOMMENDER SYSTEM
LECTURE 8 – TEXT MINING
1. TEXT MINING APPLICATIONS
1.1 Unstructured vs. Structured Data
1.2 Text Preprocessing
1.3 Terminology: Documents, Tokens and Terms, Corpus
1.4 Bag of Words
1.5 TF-IDF (Term Frequency - Inverse Document Frequency)
1.6 N-gram
1.7 Named Entity Recognition
1.8 Topic Model
1.9 Word Embedding
2. ASSOCIATION RULE MINING
2.1 Item sets
2.2 Frequent Item sets
2.3 Association Rules
2.4 Support
2.5 Confidence
2.6 Association Rule Mining: Apriori Algorithm
2.7 Lift
LECTURE 9 – NEURAL NETWORKS AND DEEP LEARNING
1. NEURAL NETWORKS
1.1 The Perceptron
1.2 Activation Function
1.3 Multi-Layer Perceptron
1.4 Forward Pass
1.5 Loss Function
1.6 Backpropagation (Backward Pass)
1.7 Gradient Descent Algorithm
1.8 Stochastic Gradient Descent
2. DEEP LEARNING
2.1 Convolutional Neural Networks (CNNs)
2.2 Recurrent Neural Networks (RNNs)
2.3 Autoencoders
2.4 Transformers
2.5 Foundation Models (Large Language Models)
LECTURE 10 – ENSEMBLE METHODS AND SVM
1. ENSEMBLE METHODS
1.1 Combine by consensus
1.1.1 Bagging
1.1.2 Random Forests
1.2 Combine by learning
1.2.1 Boosting
1.2.2 Stacking
2. A BRIEF INTRO TO THE SUPPORT VECTOR MACHINE
3. DATA SCIENCE ETHICS
3.1 Data gathering: Privacy, A/B Testing and Bias
3.1.1 Privacy
3.1.2 Experimentation
3.1.3 Bias
3.2 Data preprocessing: Proxies, Government Backdoors
3.3 Modeling: ZK Proofs, Discrimination
3.4 Model evaluation: explain
3.5 Deployment: Unintended consequences