100% tevredenheidsgarantie Direct beschikbaar na betaling Zowel online als in PDF Je zit nergens aan vast
logo-home
Machine Learning: Comprehensive Summary €4,49   In winkelwagen

Samenvatting

Machine Learning: Comprehensive Summary

5 beoordelingen
 314 keer bekeken  26 keer verkocht

Very useful summary for the Machine Learning course at Tilburg University (Data Science and Society). Summary contains all parts that are discussed in the course (2019) extra examples, explanation and notes. Summary also contains many useful visualizations.

Laatste update van het document: 4 jaar geleden

Voorbeeld 5 van de 24  pagina's

  • 5 december 2019
  • 14 december 2019
  • 24
  • 2019/2020
  • Samenvatting
Alle documenten voor dit vak (14)

5  beoordelingen

review-writer-avatar

Door: mdegal • 1 jaar geleden

review-writer-avatar

Door: Mathias • 1 jaar geleden

review-writer-avatar

Door: Chris2021 • 2 jaar geleden

review-writer-avatar

Door: JoseOrozco77 • 4 jaar geleden

review-writer-avatar

Door: martintielemans1 • 4 jaar geleden

avatar-seller
dc070498

Beschikbare oefenvragen

Oefenvragen 37 Oefenvragen
€3,59 2 verkocht

Enkele voorbeelden uit deze set oefenvragen

1.

What is the difference between Multiclass classification and Multilabel classification?

Antwoord: Multiclass classification makes the assumption that each sample is assigned to ONE label. Multilabel classification assigns to each sample a set of target labels.

2.

What is the Mean absolute error? (MAE)

Antwoord: The average (absolute) difference between true value and predicted value

3.

What is the disadvantage of using the \'error rate\' for classification problems?

Antwoord: The different types of errors are not taken into account

4.

What is the difference between precision and recall?

Antwoord: Precision refers to the fraction of relevant instances among the retrieved instances. Recall refers to the fraction of the total amount of relevant instances that were actually retrieved.

5.

What is the F-score?

Antwoord: The F1 score can be interpreted as a weighted average of the precision and recall, where an F1 score reaches its best value at 1 and worst score at 0.

6.

What is the difference between \'micro-average\' and \'macro-average\'?

Antwoord: A macro-average will compute the metric independently for each class and then take the average (hence treating all classes equally), whereas a micro-average will aggregate the contributions of all classes to compute the average metric.

7.

True or False? The number of possible decision trees grows exponentially with number of features.

Antwoord: True

8.

What is the relationship between \'the base case\' and \'the recursive call\'? (decision trees)

Antwoord: The base case returns a value without making any subsequent recursive calls (= leaf node). If the base case is false, you will enter the recursive condition, which is the case when the function calls itself.

9.

What is \'Entropy\'?

Antwoord: Entropy is a measure of uncertainty, where more uniform distributions have more uncertainty.

10.

What is Gini impurity?

Antwoord: How often a random element would be labelled incorrectly if labels were assigned at random from a given distribution. It reaches its minimum (zero) when all cases in the node fall into a single target category.

Machine Learning Tilburg University 2019-2020



MACHINE LEARNING
LECTURE 1: EVALUATION

Machine Learning: Machine learning studies algorithms which can learn to solve problems from
examples. It enables automate problem solving, like flagging spam in your e-mail or recognizing faces
in photos.

Rules: You need rules to solve these problems. Rules are learned from our examples, but we must
provide the format for these rules (decision tree or linear model for example).
• Example: If (A or B or C) and not D, then SPAM

Learning from examples (Machine Learning Approach):
• Find examples of SPAM and non-SPAM
• Come up with a learning algorithm
• A learning algorithm infers rules from examples
• These rules can then be applied to new data (e-mails in this specific case)

Learning Algorithms: Machine learning uses programmed algorithms that receive and analyze input
data to predict output values. As new data is fed to these algorithms, they learn and optimize their
operations to improve performance, developing 'intelligence' over time.

Types of Machine Learning problems:
• Regression: response will be a (real) number
o Predict person’s age, price of stock, score on an exam
• Binary classification: response will be Yes/No answer (classifying instances into one of two
classes)
o SPAM detection (yes / no), polarity of product review (positive / negative), predict
gender (male / female)
o Each sample is assigned to only ONE label
• Multiclass classification: response will be one of a finite set of options (classification task with
more than two classes, but each sample is assigned to only ONE label)
o Classify articles as politics, science, health, finance, sports etc.
• Multilabel classification: a finite set of Yes/No answers. There is no constraint on how many
of the classes the instance can be assigned.
o Assign songs to one or more genres. Each song may belong to several genres
simultaneously.
▪ Examples: {rock - pop - metal} OR {jazz - blues} OR {rock - punk}
• Ranking: Order objects according to relevance (ranking models for information retrieval
systems)
o Rank web pages in response to user query (like using Google to find information)
• Sequence Labeling: the algorithmic assignment of a categorical label to each member of a
sequence
o Input: sequence of elements (e.g. words)
o Response: a corresponding sequence of labels

1

,Machine Learning Tilburg University 2019-2020


▪Label words in a sentence with their syntactic category (determiner, verb,
noun etc.)
▪ Label frames in speech signal with corresponding phonemes
• Sequence-to-sequence modeling aims to map a fixed length input with a fixed length
o Input: a sequence of elements
o Response: another sequence of elements
▪ Possibly different length
▪ Possibly element from different sets
o Examples: translate between languages or summarizing texts

How to evaluate the success of our machine learning solutions: It depends on the type of problem
• Regression: Mean Absolute Error (average (absolute) difference between true value and
predicted value) or the Mean Squared Error (average square of the difference between true
value and predicted value)
• Classification: Error rate (the proportion of mistakes)
o Disadvantage: kind of mistake not considered and risk of disbalanced classes

Kinds of mistakes:
• False positive – Flagged as SPAM, but non-SPAM
• False negative – Not flagged, but is SPAM
• False positives are usually bigger problems!!!

Precision and Recall: Metrics which focus on one kind of mistake
• Precision – What fraction of flagged emails were real SPAMs?
• Recall – What fraction of real SPAMs were flagged?




F-score: Harmonic mean between precision and recall. It considers both the precision
(p) and the recall (r) of the test to compute the score. F-score reaches its best value at
1 (perfect precision and recall) and worst at 0.

2

,Machine Learning Tilburg University 2019-2020


Two other commonly used F measures are the F2 measure, which weighs recall higher than precision
(by placing more emphasis on false negatives), and the F0.5 measure, which weighs recall lower than
precision (by attenuating the influence of false negatives). → Fβ-score: Parameter β quantifies how
much more we care about recall than precision.




Macro-average: Compute precision and recall per-class, and average:
• Rare classes have the same impact as frequent ones. So, you must look at the scores per
class as well!
o Macro-average precision with two different sets = (P1+P2)/2
o Macro-average recall with two different sets = (R1+R2)/2

Micro-average: Only useful if we treat one class as a default class
• Treat each correct prediction as TP
• Treat each missing classification as FN
• Treat each incorrect prediction as FP

Micro-average of precision with two different sets = (TP1+TP2)/(TP1+TP2+FP1+FP2)
Micro-average of recall with two different sets = (TP1+TP2)/(TP1+TP2+FN1+FN2)

Using examples: Always use disjoint sets of examples (training set, development set, and test set)
• Always use the same evaluation metrics for development set and test set
• Important for evaluation to be close to true (real world) objective


LECTURE 2: DECISION TREES AND FORESTS

Learning Rules: nested if-then-else rules




Classification: The problem of identifying to which of a set of categories a new observation belongs.
Example: when classifying fruit, looking at the shape and color to figure out what type of fruit it is.

Decision tree: Model that predicts the value of a target variable based on several input variables.
Decision trees classify instances by sorting them down the tree from the root to some leaf node,
which provides the classification of the instance. Example:




3

,Machine Learning Tilburg University 2019-2020


• The very top of the three is
called the Root Node
• Internal Nodes have arrows
pointing to them and away
from them
• Leaf Nodes have arrows
pointing to them, but NOT
away from them (The last
node in a path)

Using a decision tree:
If NO  Prediction apply algo with
left branch

If YES  Prediction apply algo with
right branch

Building a decision tree: A decision tree is drawn upside down with its root at the top. The number
of possible trees grows exponentially with number of features.
• You can’t check them all and see which one works best
• So, we need to build a tree incrementally. Thus, selecting one particular node and consider it
as fixed if it is useful, and then continue with another node.

Which question to ask first?
• It is best to ask important questions first
• This is the one which helps us classify the best → If we had to classify data based on only
ONE question, which question would do best?

Step by step decision tree building:
1. Choose one question you could ask
2. Check how many correct answers you get by classifying based on this particular question only
3. Do this for each possible question
4. The first question in the model (decision tree) is the question that produced the most correct
answers
5. Go to next level of the tree and repeat this procedure (excl. the question you already asked)

Building a decision tree:
• If all examples have same label → Create leaf node with label

Otherwise:
• Choose most important question
• Split data into two parts (NO and YES) according to question
• Remove question from question set
• Left branch Apply algo to ← NO examples
• Right branch Apply algo to ← YES examples
• Create node with (question, left branch, right branch)


4

, Machine Learning Tilburg University 2019-2020


Digression: Recursion
• We build and use decision trees with recursive functions
• Recursive function: Function that calls itself during its execution. This enables the function to
repeat itself several times, outputting the result and the end of each iteration
• A recursive function is a function that calls itself until a “base condition” is true, and
execution stops

Trees: Recursively defined data structures
• Base case = Leaf node (Returns a value without making any following recursive calls)
• Recursive case = If the base case is false, enter the recursive condition (Case when the
function calls itself)  Branch node




Decision tree efficiency: Speed depends on number of questions needed to get to a leaf node. This
depends on the depth of the tree. The depth of a decision tree is the length of the longest path from
a root to a leaf ( count the number of branches!)
• Unbalanced trees: The left and right subtree have different number of nodes and depth
• Balanced trees: The left and right subtree have equal number of nodes and depth




Depth of balanced tree: In a balanced binary tree, each time you ask a question, you halve the
number of remaining questions → Example (N = 8, depth = 3) → (= Log (8) / Log (2) == 3)




5

Voordelen van het kopen van samenvattingen bij Stuvia op een rij:

Verzekerd van kwaliteit door reviews

Verzekerd van kwaliteit door reviews

Stuvia-klanten hebben meer dan 700.000 samenvattingen beoordeeld. Zo weet je zeker dat je de beste documenten koopt!

Snel en makkelijk kopen

Snel en makkelijk kopen

Je betaalt supersnel en eenmalig met iDeal, creditcard of Stuvia-tegoed voor de samenvatting. Zonder lidmaatschap.

Focus op de essentie

Focus op de essentie

Samenvattingen worden geschreven voor en door anderen. Daarom zijn de samenvattingen altijd betrouwbaar en actueel. Zo kom je snel tot de kern!

Veelgestelde vragen

Wat krijg ik als ik dit document koop?

Je krijgt een PDF, die direct beschikbaar is na je aankoop. Het gekochte document is altijd, overal en oneindig toegankelijk via je profiel.

Tevredenheidsgarantie: hoe werkt dat?

Onze tevredenheidsgarantie zorgt ervoor dat je altijd een studiedocument vindt dat goed bij je past. Je vult een formulier in en onze klantenservice regelt de rest.

Van wie koop ik deze samenvatting?

Stuvia is een marktplaats, je koop dit document dus niet van ons, maar van verkoper dc070498. Stuvia faciliteert de betaling aan de verkoper.

Zit ik meteen vast aan een abonnement?

Nee, je koopt alleen deze samenvatting voor €4,49. Je zit daarna nergens aan vast.

Is Stuvia te vertrouwen?

4,6 sterren op Google & Trustpilot (+1000 reviews)

Afgelopen 30 dagen zijn er 75323 samenvattingen verkocht

Opgericht in 2010, al 14 jaar dé plek om samenvattingen te kopen

Start met verkopen
€4,49  26x  verkocht
  • (5)
  Kopen