Machine Learning Tilburg University 2019-2020
MACHINE LEARNING
LECTURE 1: EVALUATION
Machine Learning: Machine learning studies algorithms that can learn to solve problems from
examples. It enables automated problem solving, such as flagging spam in your e-mail or recognizing
faces in photos.
Rules: You need rules to solve these problems. Rules are learned from our examples, but we must
provide the format for these rules (decision tree or linear model for example).
• Example: If (A or B or C) and not D, then SPAM
Learning from examples (Machine Learning Approach):
• Find examples of SPAM and non-SPAM
• Come up with a learning algorithm
• A learning algorithm infers rules from examples
• These rules can then be applied to new data (e-mails in this specific case)
Learning Algorithms: Machine learning uses programmed algorithms that receive and analyze input
data to predict output values. As new data is fed to these algorithms, they learn and optimize their
operations to improve performance, developing 'intelligence' over time.
Types of Machine Learning problems:
• Regression: response will be a (real) number
o Predict person’s age, price of stock, score on an exam
• Binary classification: response will be Yes/No answer (classifying instances into one of two
classes)
o SPAM detection (yes / no), polarity of product review (positive / negative), predict
gender (male / female)
o Each sample is assigned to only ONE label
• Multiclass classification: response will be one of a finite set of options (classification task with
more than two classes, but each sample is assigned to only ONE label)
o Classify articles as politics, science, health, finance, sports etc.
• Multilabel classification: a finite set of Yes/No answers. There is no constraint on how many
of the classes the instance can be assigned.
o Assign songs to one or more genres. Each song may belong to several genres
simultaneously.
▪ Examples: {rock - pop - metal} OR {jazz - blues} OR {rock - punk}
• Ranking: Order objects according to relevance (ranking models for information retrieval
systems)
o Rank web pages in response to user query (like using Google to find information)
• Sequence Labeling: the algorithmic assignment of a categorical label to each member of a
sequence
o Input: sequence of elements (e.g. words)
o Response: a corresponding sequence of labels
▪ Label words in a sentence with their syntactic category (determiner, verb,
noun, etc.)
▪ Label frames in speech signal with corresponding phonemes
• Sequence-to-sequence modeling aims to map an input sequence to an output sequence
o Input: a sequence of elements
o Response: another sequence of elements
▪ Possibly different length
▪ Possibly element from different sets
o Examples: translate between languages or summarizing texts
How to evaluate the success of our machine learning solutions: It depends on the type of problem
• Regression: Mean Absolute Error (average (absolute) difference between true value and
predicted value) or the Mean Squared Error (average square of the difference between true
value and predicted value)
• Classification: Error rate (the proportion of mistakes)
o Disadvantage: the kind of mistake is not considered, and the metric is misleading
with imbalanced classes
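A minimal sketch of these metrics, computed by hand on toy data (the numbers are illustrative, not from the lecture):

```python
# Regression metrics on toy true/predicted values
y_true = [3.0, -0.5, 2.0, 7.0]
y_pred = [2.5, 0.0, 2.0, 8.0]

# Mean Absolute Error: average absolute difference
mae = sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)
# Mean Squared Error: average squared difference
mse = sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

# Classification error rate: proportion of wrong predictions
gold = ["spam", "ham", "spam", "ham"]
pred = ["spam", "spam", "spam", "ham"]
error_rate = sum(g != p for g, p in zip(gold, pred)) / len(gold)

print(mae, mse, error_rate)  # 0.5 0.375 0.25
```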
Kinds of mistakes:
• False positive – Flagged as SPAM, but non-SPAM
• False negative – Not flagged, but is SPAM
• In spam filtering, false positives are usually the bigger problem: a legitimate e-mail is lost in
the spam folder!
Precision and Recall: Metrics which focus on one kind of mistake
• Precision – What fraction of flagged emails were real SPAMs?
• Recall – What fraction of real SPAMs were flagged?
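Both metrics can be computed by hand from the counts of true positives, false positives, and false negatives. The labels below are toy data (1 = SPAM, 0 = non-SPAM), chosen only to illustrate the definitions:

```python
gold = [1, 1, 0, 0, 1, 0]   # true labels
pred = [1, 0, 1, 0, 1, 0]   # classifier output

tp = sum(g == 1 and p == 1 for g, p in zip(gold, pred))  # real SPAM, flagged
fp = sum(g == 0 and p == 1 for g, p in zip(gold, pred))  # non-SPAM, but flagged
fn = sum(g == 1 and p == 0 for g, p in zip(gold, pred))  # real SPAM, missed

precision = tp / (tp + fp)  # fraction of flagged e-mails that were real SPAM
recall = tp / (tp + fn)     # fraction of real SPAM that was flagged
print(precision, recall)
```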
F-score: Harmonic mean between precision and recall. It considers both the precision
(p) and the recall (r) of the test to compute the score. F-score reaches its best value at
1 (perfect precision and recall) and worst at 0.
Two other commonly used F measures are the F2 measure, which weighs recall higher than precision
(by placing more emphasis on false negatives), and the F0.5 measure, which weighs recall lower than
precision (by attenuating the influence of false negatives). → Fβ-score: Parameter β quantifies how
much more we care about recall than precision.
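The general formula is Fβ = (1 + β²)·p·r / (β²·p + r), with β = 1 giving the ordinary F-score. A small sketch (the function name `f_beta` is illustrative):

```python
def f_beta(p, r, beta=1.0):
    # F-beta = (1 + beta^2) * p * r / (beta^2 * p + r)
    # beta > 1 weighs recall higher; beta < 1 weighs precision higher.
    if p == 0 and r == 0:
        return 0.0
    return (1 + beta ** 2) * p * r / (beta ** 2 * p + r)

p, r = 0.5, 1.0              # low precision, perfect recall
print(f_beta(p, r, 1))       # F1: harmonic mean
print(f_beta(p, r, 2))       # F2: rewards the high recall
print(f_beta(p, r, 0.5))     # F0.5: penalizes the low precision
```

With these inputs, F2 > F1 > F0.5, matching the description above.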
Macro-average: Compute precision and recall per-class, and average:
• Rare classes have the same impact as frequent ones. So, you must look at the scores per
class as well!
o Macro-average precision with two different sets = (P1+P2)/2
o Macro-average recall with two different sets = (R1+R2)/2
Micro-average: Only useful if we treat one class as a default class
• Treat each correct prediction as TP
• Treat each missing classification as FN
• Treat each incorrect prediction as FP
Micro-average of precision with two different sets = (TP1+TP2)/(TP1+TP2+FP1+FP2)
Micro-average of recall with two different sets = (TP1+TP2)/(TP1+TP2+FN1+FN2)
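The two averaging schemes on toy per-class counts (the numbers are illustrative; note how the rare second class drags the macro scores down while barely affecting the micro scores):

```python
# Per-class confusion counts for two classes
tp = [8, 1]
fp = [2, 1]
fn = [2, 3]

# Macro-average: compute precision/recall per class, then average
macro_p = sum(tp[i] / (tp[i] + fp[i]) for i in range(2)) / 2
macro_r = sum(tp[i] / (tp[i] + fn[i]) for i in range(2)) / 2

# Micro-average: pool the counts across classes first
micro_p = sum(tp) / (sum(tp) + sum(fp))
micro_r = sum(tp) / (sum(tp) + sum(fn))

print(macro_p, micro_p)  # 0.65 0.75
```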
Using examples: Always use disjoint sets of examples (training set, development set, and test set)
• Always use the same evaluation metrics for development set and test set
• Important for evaluation to be close to true (real world) objective
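A sketch of such a disjoint split; the 80/10/10 ratio and the dummy data are illustrative choices, not prescribed by the lecture:

```python
import random

# Stand-in (example, label) pairs; in practice these would be e-mails
data = [(f"email_{i}", i % 2) for i in range(100)]
random.seed(0)          # reproducible shuffle
random.shuffle(data)

n = len(data)
train = data[: int(0.8 * n)]
dev = data[int(0.8 * n): int(0.9 * n)]
test = data[int(0.9 * n):]

# The three sets are pairwise disjoint
assert not set(train) & set(dev)
assert not set(dev) & set(test)
assert not set(train) & set(test)
```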
LECTURE 2: DECISION TREES AND FORESTS
Learning Rules: nested if-then-else rules
Classification: The problem of identifying to which of a set of categories a new observation belongs.
Example: when classifying fruit, looking at the shape and color to figure out what type of fruit it is.
Decision tree: Model that predicts the value of a target variable based on several input variables.
Decision trees classify instances by sorting them down the tree from the root to some leaf node,
which provides the classification of the instance. Example:
• The very top of the tree is called the Root Node
• Internal Nodes have arrows pointing to them and away from them
• Leaf Nodes have arrows pointing to them, but NOT away from them (the last node in a path)
Using a decision tree: Start at the root node and answer its question
• If the answer is NO, apply the algorithm to the left branch
• If the answer is YES, apply the algorithm to the right branch
• Repeat until you reach a leaf node, which gives the prediction
Building a decision tree: A decision tree is drawn upside down with its root at the top. The number
of possible trees grows exponentially with the number of features.
• You can’t check them all and see which one works best
• So, we build the tree incrementally: select one particular question, treat it as fixed if it is
useful, and then continue with the next node
Which question to ask first?
• It is best to ask important questions first
• This is the one which helps us classify the best → If we had to classify data based on only
ONE question, which question would do best?
Step by step decision tree building:
1. Choose one question you could ask
2. Check how many correct answers you get by classifying based on this particular question only
3. Do this for each possible question
4. The first question in the model (decision tree) is the question that produced the most correct
answers
5. Go to next level of the tree and repeat this procedure (excl. the question you already asked)
Building a decision tree:
• If all examples have the same label → create a leaf node with that label
Otherwise:
• Choose the most important question
• Split the data into two parts (NO and YES) according to the question
• Remove the question from the question set
• Left branch ← apply the algorithm to the NO examples
• Right branch ← apply the algorithm to the YES examples
• Create a node with (question, left branch, right branch)
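The procedure above can be sketched in Python. Everything here is an assumption for illustration, not the lecture's exact implementation: the feature names (`has_link`, `known_sender`), the scoring rule (how many examples a one-question classifier gets right), and the tuple representation `(question, left, right)` for branch nodes.

```python
from collections import Counter

def majority(labels):
    # Most frequent label among the examples
    return Counter(labels).most_common(1)[0][0]

def score(examples, q):
    # How many answers a one-question classifier using q gets right:
    # the majority count on the NO side plus on the YES side
    yes = [lab for x, lab in examples if x[q]]
    no = [lab for x, lab in examples if not x[q]]
    return sum(Counter(side).most_common(1)[0][1] for side in (yes, no) if side)

def build_tree(examples, questions):
    labels = [lab for _, lab in examples]
    if len(set(labels)) == 1 or not questions:
        return majority(labels)                           # leaf node
    q = max(questions, key=lambda c: score(examples, c))  # most important question
    no = [(x, lab) for x, lab in examples if not x[q]]
    yes = [(x, lab) for x, lab in examples if x[q]]
    if not no or not yes:                                 # question does not split the data
        return majority(labels)
    rest = [r for r in questions if r != q]               # remove question from question set
    return (q, build_tree(no, rest), build_tree(yes, rest))

def classify(tree, x):
    if not isinstance(tree, tuple):
        return tree                                       # leaf: return its label
    q, no_branch, yes_branch = tree
    return classify(yes_branch if x[q] else no_branch, x)

examples = [
    ({"has_link": True,  "known_sender": False}, "spam"),
    ({"has_link": True,  "known_sender": True},  "ham"),
    ({"has_link": False, "known_sender": False}, "ham"),
    ({"has_link": False, "known_sender": True},  "ham"),
]
tree = build_tree(examples, ["has_link", "known_sender"])
print(classify(tree, {"has_link": True, "known_sender": False}))  # spam
```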
Digression: Recursion
• We build and use decision trees with recursive functions
• Recursive function: A function that calls itself during its execution. This enables the function
to repeat itself several times, returning a result at the end of each call
• Recursion continues until a “base condition” is true, at which point execution stops
Trees: Recursively defined data structures
• Base case = Leaf node (returns a value without making any further recursive calls)
• Recursive case = Branch node (when the base case is false, the function calls itself on the
subtrees)
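As a sketch of the base case and recursive case on a tree (the tuple node format `(question, left, right)` is an assumed representation):

```python
def depth(tree):
    # Base case: a leaf node returns without further recursive calls
    if not isinstance(tree, tuple):
        return 0
    # Recursive case: a branch node calls depth() on both subtrees
    _, left, right = tree
    return 1 + max(depth(left), depth(right))

tree = ("q1", "ham", ("q2", "spam", "ham"))
print(depth(tree))  # 2 branches on the longest root-to-leaf path
```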
Decision tree efficiency: Speed depends on number of questions needed to get to a leaf node. This
depends on the depth of the tree. The depth of a decision tree is the length of the longest path from
a root to a leaf (count the number of branches!)
• Unbalanced trees: The left and right subtree have different number of nodes and depth
• Balanced trees: The left and right subtree have equal number of nodes and depth
Depth of balanced tree: In a balanced binary tree, each question you ask halves the number of
remaining possibilities → Example: N = 8 leaves gives depth = log(8) / log(2) = log₂(8) = 3
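The worked example can be checked directly; each doubling of N adds one level of depth:

```python
import math

# Depth of a balanced binary tree with N leaf nodes is log2(N)
for n in (2, 4, 8, 16):
    print(n, math.log2(n))  # math.log2(n) == math.log(n) / math.log(2)
```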