Very useful summary for the Machine Learning course at Tilburg University (Data Science and Society). The summary covers all parts discussed in the course (2019), with extra examples, explanations and notes, and it contains many useful visualizations.
1. What is the difference between multiclass classification and multilabel classification?
Answer: Multiclass classification makes the assumption that each sample is assigned to exactly ONE label. Multilabel classification assigns a set of target labels to each sample.
2. What is the mean absolute error (MAE)?
Answer: The average (absolute) difference between the true value and the predicted value.
3. What is the disadvantage of using the 'error rate' for classification problems?
Answer: The different types of errors are not taken into account.
4. What is the difference between precision and recall?
Answer: Precision refers to the fraction of relevant instances among the retrieved instances. Recall refers to the fraction of the total amount of relevant instances that were actually retrieved.
5. What is the F-score?
Answer: The F1 score can be interpreted as a weighted average of precision and recall, where the F1 score reaches its best value at 1 and its worst value at 0.
6. What is the difference between 'micro-average' and 'macro-average'?
Answer: A macro-average computes the metric independently for each class and then takes the average (hence treating all classes equally), whereas a micro-average aggregates the contributions of all classes to compute the average metric.
7. True or False? The number of possible decision trees grows exponentially with the number of features.
Answer: True.
8. What is the relationship between 'the base case' and 'the recursive call'? (decision trees)
Answer: The base case returns a value without making any subsequent recursive calls (= leaf node). If the base case is false, you enter the recursive case, which is when the function calls itself.
9. What is 'entropy'?
Answer: Entropy is a measure of uncertainty, where more uniform distributions have more uncertainty.
10. What is Gini impurity?
Answer: How often a random element would be labelled incorrectly if labels were assigned at random from a given distribution. It reaches its minimum (zero) when all cases in the node fall into a single target category.
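A minimal Python sketch of the entropy and Gini impurity definitions from questions 9 and 10 (the function names and the example label lists are my own, not from the course):

from collections import Counter
import math

def entropy(labels):
    # Measure of uncertainty: more uniform label distributions give higher values.
    counts = Counter(labels)
    total = len(labels)
    return sum(-(c / total) * math.log2(c / total) for c in counts.values())

def gini_impurity(labels):
    # Probability that a randomly chosen element is labelled incorrectly
    # when labels are assigned at random from the observed distribution.
    counts = Counter(labels)
    total = len(labels)
    return 1 - sum((c / total) ** 2 for c in counts.values())

print(entropy(["spam"] * 5 + ["ham"] * 5))        # 1.0 (maximal uncertainty)
print(entropy(["spam"] * 10))                     # 0.0 (no uncertainty)
print(gini_impurity(["spam"] * 10))               # 0.0 (pure node)
print(gini_impurity(["spam"] * 5 + ["ham"] * 5))  # 0.5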
Content preview
Machine Learning Tilburg University 2019-2020
MACHINE LEARNING
LECTURE 1: EVALUATION
Machine Learning: Machine learning studies algorithms which can learn to solve problems from examples. It enables automated problem solving, like flagging spam in your e-mail or recognizing faces in photos.
Rules: You need rules to solve these problems. Rules are learned from our examples, but we must
provide the format for these rules (decision tree or linear model for example).
• Example: If (A or B or C) and not D, then SPAM
Learning from examples (Machine Learning Approach):
• Find examples of SPAM and non-SPAM
• Come up with a learning algorithm
• A learning algorithm infers rules from examples
• These rules can then be applied to new data (e-mails in this specific case)
Learning Algorithms: Machine learning uses programmed algorithms that receive and analyze input
data to predict output values. As new data is fed to these algorithms, they learn and optimize their
operations to improve performance, developing 'intelligence' over time.
Types of Machine Learning problems:
• Regression: response will be a (real) number
o Predict person’s age, price of stock, score on an exam
• Binary classification: response will be Yes/No answer (classifying instances into one of two
classes)
o SPAM detection (yes / no), polarity of product review (positive / negative), predict
gender (male / female)
o Each sample is assigned to only ONE label
• Multiclass classification: response will be one of a finite set of options (classification task with
more than two classes, but each sample is assigned to only ONE label)
o Classify articles as politics, science, health, finance, sports etc.
• Multilabel classification: a finite set of Yes/No answers. There is no constraint on how many
of the classes the instance can be assigned.
o Assign songs to one or more genres. Each song may belong to several genres
simultaneously.
▪ Examples: {rock - pop - metal} OR {jazz - blues} OR {rock - punk}
• Ranking: Order objects according to relevance (ranking models for information retrieval
systems)
o Rank web pages in response to user query (like using Google to find information)
• Sequence Labeling: the algorithmic assignment of a categorical label to each member of a
sequence
o Input: sequence of elements (e.g. words)
o Response: a corresponding sequence of labels
▪ Label words in a sentence with their syntactic category (determiner, verb, noun, etc.)
▪ Label frames in speech signal with corresponding phonemes
• Sequence-to-sequence modeling: maps an input sequence to an output sequence
o Input: a sequence of elements
o Response: another sequence of elements
▪ Possibly different length
▪ Possibly element from different sets
o Examples: translate between languages or summarizing texts
How to evaluate the success of our machine learning solutions: It depends on the type of problem
• Regression: Mean Absolute Error (average (absolute) difference between true value and
predicted value) or the Mean Squared Error (average square of the difference between true
value and predicted value)
• Classification: Error rate (the proportion of mistakes)
o Disadvantage: the kind of mistake is not considered, and there is a risk with imbalanced classes
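A minimal sketch of these three metrics (the example values are made up, not from the course), assuming the true and predicted values are stored in plain Python lists:

def mean_absolute_error(y_true, y_pred):
    # Average absolute difference between true and predicted values (regression).
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

def mean_squared_error(y_true, y_pred):
    # Average squared difference between true and predicted values (regression).
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

def error_rate(y_true, y_pred):
    # Proportion of mistakes (classification); ignores the kind of mistake.
    return sum(t != p for t, p in zip(y_true, y_pred)) / len(y_true)

print(mean_absolute_error([3.0, 5.0, 2.0], [2.5, 5.0, 4.0]))        # 0.833...
print(mean_squared_error([3.0, 5.0, 2.0], [2.5, 5.0, 4.0]))         # 1.416...
print(error_rate(["spam", "ham", "ham"], ["spam", "spam", "ham"]))  # 0.333...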
Kinds of mistakes:
• False positive – Flagged as SPAM, but non-SPAM
• False negative – Not flagged, but is SPAM
• False positives are usually bigger problems!!!
Precision and Recall: Metrics which focus on one kind of mistake
• Precision – What fraction of flagged emails were real SPAMs?
• Recall – What fraction of real SPAMs were flagged?
F-score: Harmonic mean of precision (p) and recall (r): F1 = 2pr / (p + r). It considers both the precision and the recall of the test to compute the score. The F-score reaches its best value at 1 (perfect precision and recall) and its worst at 0.
Two other commonly used F measures are the F2 measure, which weighs recall higher than precision
(by placing more emphasis on false negatives), and the F0.5 measure, which weighs recall lower than
precision (by attenuating the influence of false negatives). → Fβ-score: Parameter β quantifies how
much more we care about recall than precision.
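The general formula is Fβ = (1 + β²) · p · r / (β² · p + r). A minimal Python sketch of precision, recall and the Fβ-score (the TP/FP/FN counts are my own illustrative numbers, not course data):

def precision(tp, fp):
    # Fraction of flagged items that were actually positive.
    return tp / (tp + fp)

def recall(tp, fn):
    # Fraction of actual positives that were flagged.
    return tp / (tp + fn)

def f_beta(p, r, beta=1.0):
    # F-beta score: beta > 1 weighs recall higher, beta < 1 weighs precision higher.
    # With beta = 1 this is the harmonic mean of precision and recall (F1).
    return (1 + beta ** 2) * p * r / (beta ** 2 * p + r)

p = precision(tp=8, fp=2)   # 0.8
r = recall(tp=8, fn=4)      # 0.666...
print(f_beta(p, r))             # F1   ≈ 0.727
print(f_beta(p, r, beta=2.0))   # F2   ≈ 0.690 (recall weighted more)
print(f_beta(p, r, beta=0.5))   # F0.5 ≈ 0.769 (precision weighted more)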
Macro-average: Compute precision and recall per-class, and average:
• Rare classes have the same impact as frequent ones. So, you must look at the scores per
class as well!
o Macro-average precision with two different sets = (P1+P2)/2
o Macro-average recall with two different sets = (R1+R2)/2
Micro-average: Only useful if we treat one class as a default class
• Treat each correct prediction as TP
• Treat each missing classification as FN
• Treat each incorrect prediction as FP
Micro-average of precision with two different sets = (TP1+TP2)/(TP1+TP2+FP1+FP2)
Micro-average of recall with two different sets = (TP1+TP2)/(TP1+TP2+FN1+FN2)
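A minimal sketch contrasting the two averages on made-up counts for one frequent and one rare class (the numbers are illustrative, not from the lecture):

# Hypothetical per-class confusion counts for two classes.
tp1, fp1, fn1 = 90, 10, 10   # frequent class
tp2, fp2, fn2 = 1, 9, 9      # rare class

# Macro-average: compute the metric per class, then average (classes weighted equally).
p1, p2 = tp1 / (tp1 + fp1), tp2 / (tp2 + fp2)
macro_precision = (p1 + p2) / 2                           # (0.9 + 0.1) / 2 = 0.5

# Micro-average: pool the counts of all classes, then compute the metric once.
micro_precision = (tp1 + tp2) / (tp1 + tp2 + fp1 + fp2)   # 91 / 110 ≈ 0.827

print(macro_precision, micro_precision)

The rare class drags the macro-average down, while the micro-average is dominated by the frequent class.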
Using examples: Always use disjoint sets of examples (training set, development set, and test set)
• Always use the same evaluation metrics for development set and test set
• Important for evaluation to be close to true (real world) objective
LECTURE 2: DECISION TREES AND FORESTS
Learning Rules: nested if-then-else rules
Classification: The problem of identifying to which of a set of categories a new observation belongs.
Example: when classifying fruit, looking at the shape and color to figure out what type of fruit it is.
Decision tree: Model that predicts the value of a target variable based on several input variables.
Decision trees classify instances by sorting them down the tree from the root to some leaf node,
which provides the classification of the instance. Example:
• The very top of the tree is called the Root Node
• Internal Nodes have arrows pointing to them and away from them
• Leaf Nodes have arrows pointing to them, but NOT away from them (the last node in a path)
Using a decision tree:
• If the answer to the question is NO → apply the algorithm to the left branch
• If the answer is YES → apply the algorithm to the right branch
• At a leaf node, return the prediction
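A minimal Python sketch of this recursive prediction procedure, assuming a tree is represented as either a leaf label or a (question, left_branch, right_branch) tuple; the representation and the fruit example are illustrative, not the course's code:

def predict(tree, example):
    # Walk the tree from the root: NO -> left branch, YES -> right branch.
    if not isinstance(tree, tuple):      # base case: a leaf node holds the label
        return tree
    question, left, right = tree
    if question(example):                # YES -> recurse into the right branch
        return predict(right, example)
    return predict(left, example)        # NO  -> recurse into the left branch

# Illustrative fruit tree: root question "is it red?";
# NO -> ask "is it long?" (pear vs banana), YES -> apple.
tree = (lambda x: x["red"],
        (lambda x: x["long"], "pear", "banana"),   # left (NO) branch
        "apple")                                   # right (YES) branch: leaf
print(predict(tree, {"red": False, "long": True}))  # -> banana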
Building a decision tree: A decision tree is drawn upside down with its root at the top. The number
of possible trees grows exponentially with number of features.
• You can’t check them all and see which one works best
• So, we need to build the tree incrementally: select one particular question (node), treat it as fixed if it is useful, and then continue with the next node.
Which question to ask first?
• It is best to ask important questions first
• This is the one which helps us classify the best → If we had to classify data based on only
ONE question, which question would do best?
Step by step decision tree building:
1. Choose one question you could ask
2. Check how many correct answers you get by classifying based on this particular question only
3. Do this for each possible question
4. The first question in the model (decision tree) is the question that produced the most correct
answers
5. Go to next level of the tree and repeat this procedure (excl. the question you already asked)
Building a decision tree:
• If all examples have same label → Create leaf node with label
Otherwise:
• Choose most important question
• Split data into two parts (NO and YES) according to question
• Remove question from question set
• Left branch ← apply the algorithm to the NO examples
• Right branch ← apply the algorithm to the YES examples
• Create node with (question, left branch, right branch)
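A minimal sketch of this recursive building algorithm, assuming examples are (features, label) pairs and each question is a boolean function of the features; the scoring simply counts the correct answers when each side of a split predicts its majority label, as in the step-by-step procedure above. All names are illustrative, not course code:

from collections import Counter

def majority(examples):
    # Most common label among the examples.
    return Counter(label for _, label in examples).most_common(1)[0][0]

def split(examples, question):
    no  = [(x, y) for x, y in examples if not question(x)]
    yes = [(x, y) for x, y in examples if question(x)]
    return no, yes

def score(examples, question):
    # Correct answers when each side of the split predicts its majority label.
    no, yes = split(examples, question)
    correct = 0
    for part in (no, yes):
        if part:
            label = majority(part)
            correct += sum(y == label for _, y in part)
    return correct

def build(examples, questions):
    labels = {y for _, y in examples}
    if len(labels) == 1 or not questions:       # base case: create a leaf node
        return majority(examples)
    best = max(questions, key=lambda q: score(examples, q))  # most important question
    no, yes = split(examples, best)
    if not no or not yes:                       # question does not split the data
        return majority(examples)
    remaining = [q for q in questions if q is not best]       # remove used question
    return (best, build(no, remaining), build(yes, remaining))  # (question, left, right)

The tuples this returns can be fed straight into the predict sketch shown above under "Using a decision tree".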
Digression: Recursion
• We build and use decision trees with recursive functions
• Recursive function: a function that calls itself during its execution. This enables the function to repeat itself several times, outputting the result at the end of each iteration
• A recursive function is a function that calls itself until a “base condition” is true, and
execution stops
Trees: Recursively defined data structures
• Base case = Leaf node (Returns a value without making any following recursive calls)
• Recursive case = Branch node (if the base case is false, enter the recursive case, in which the function calls itself)
Decision tree efficiency: Speed depends on the number of questions needed to get to a leaf node. This depends on the depth of the tree. The depth of a decision tree is the length of the longest path from the root to a leaf (count the number of branches!)
• Unbalanced trees: The left and right subtree have different number of nodes and depth
• Balanced trees: The left and right subtree have equal number of nodes and depth
Depth of a balanced tree: In a balanced binary tree, each question you ask halves the number of remaining possibilities, so depth = log2(N) → Example: N = 8 leaves gives depth = log2(8) = log(8) / log(2) = 3