Machine Learning
Week 1 – Part I: Practical Matters
Introduction
 Lectures will be live (and recorded) or prerecorded.
 Lecture videos will be shared weekly, as well as the accompanying slides.
 Slides are not meant to be self-contained, so take notes!
 The practical sessions will be online; interaction is possible during these sessions.
Group Assignment: ML Challenge
 Work in groups of 3 people to solve a challenge problem
 30% course grade
 No resit
 Collaborative work: you will need to describe work division and contribution of each student
Final Exam
 Worth 70% course grade
 Multiple choice and/or open-ended questions
 Programming exercises

Part II: Introduction to Machine Learning
How can we automate problem solving?
Example: flagging spam in your e-mail.
- Classification task
- Requires a standard machine learning method.
Some email headers: (example shown on the slides)
Rules: if (A or B or C) and not D, then SPAM.
- Specify them by hand, so the system recognizes spam.

Machine Learning
The study of computer algorithms that improve automatically through experience [1]: a system learns if it becomes better at a task T, based on some experience E, with respect to some performance measure P.

Learning process
 Find examples of SPAM and non-SPAM (training set)
 Come up with a learning algorithm
 A learning algorithm infers rules from examples
 These rules can then be applied to new data (emails)
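A minimal sketch of this workflow with scikit-learn (the library used later in the course); the tiny dataset and its two features are invented here purely for illustration:

# Minimal sketch of the learn-from-examples workflow in scikit-learn.
# The toy dataset and its two features are invented for illustration only.
from sklearn.tree import DecisionTreeClassifier
# Each email is described by two made-up features:
# [number of suspicious words, sender is in address book (1 = yes, 0 = no)]
X_train = [[5, 0], [0, 1], [7, 0], [1, 1]]
y_train = ["spam", "not spam", "spam", "not spam"]
model = DecisionTreeClassifier()   # a learning algorithm
model.fit(X_train, y_train)        # infer "rules" from the examples
print(model.predict([[6, 0]]))     # apply them to a new email, e.g. ['spam']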

Learning algorithms
 See several different learning algorithms
 Implement 2-3 simple ones from scratch in Python
 Learn about Python libraries for ML (scikit-learn)
 Learn how to apply them to real-world problems
Machine Learning examples: recognize handwritten numbers and letters, recognize faces in photos, determine whether text
expresses a positive, negative or no opinion, guess a person's age based on a sample of writing, flag suspicious credit-card
transactions, recommend books and movies to users based on their own and others' purchase history, recognize and label
mentions of person or organization names in text.

Types of learning problems: Regression
Response: a (real) number
 Predict person’s age
 Predict price of a stock
 Predict student’s score on exam

Binary classification
Response: yes/no answer
 Detect SPAM
 Predict polarity of product reviews: positive vs. negative

Multiclass classification
Response: one of a finite set of (more than two) options
 Classify newspaper article as: politics, sports, science, technology, health, finance
 Detect species based on a photo: Passer domesticus, Calidris alba, etc.

Multilabel classification 
Response: a finite set of Yes/No answers
 Assign songs to one or more genres: rock, pop, metal, hip-hop

Ranking
Relevant for search engines looking for a specific source.
Order objects according to relevance
 Rank web pages in response to user query
 Predict student’s preference for courses in a program
Sequence Labeling

Relevant in speech recognition.
Input: a sequence of elements (e.g., words)
Response: a corresponding sequence of labels
 Label words in a sentence with their syntactic category (e.g., Determiner, Noun, Adverb, Verb, Preposition)
 Label frames in speech signal with corresponding phonemes.

Sequence-to-sequence modeling
Input: a sequence of elements
Response: another sequence of elements
 Possibly different length
 Possibly elements from different sets
Examples: translate between languages (My name is Penelope → Me llamo Penélope), summarize text

Autonomous behavior
Self-driving car
Input: measurements from sensors – camera, microphone, radar, accelerometer.
Response: instructions for actuators – steering, accelerator, brake, …

How well is the algorithm learning?
Evaluation
You need some standard, a performance metric!
- Predicting age
- Predicting gender
- Flagging spam
- …

Predicting age – Regression
Mean absolute error – the average absolute difference between the true value and the predicted value:
MAE = (1/N) × Σ |true_i − predicted_i|

Mean squared error – the average squared difference between the true value and the predicted value (more sensitive to outliers):
MSE = (1/N) × Σ (true_i − predicted_i)²
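A small sketch computing both metrics directly from their definitions; the toy age values are made up, and scikit-learn's mean_absolute_error and mean_squared_error give the same results:

# Mean absolute error and mean squared error, computed from their definitions.
# The true/predicted ages are invented for illustration.
y_true = [23, 31, 19, 45]
y_pred = [25, 30, 22, 40]
n = len(y_true)
mae = sum(abs(t - p) for t, p in zip(y_true, y_pred)) / n
mse = sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n
print(mae, mse)  # 2.75 9.75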




Predicting spam
We can use the error rate for that:
error rate = (number of incorrect predictions) / (total number of predictions)
Kinds of mistakes
 False positive: flagged as SPAM, but is not SPAM
 False negative: not flagged, but is SPAM
 False positives are a bigger problem!

Precision and Recall
Metrics which focus on one kind of mistake.
Precision: what fraction of flagged emails were real SPAM?
P = |TP| / |F|
Recall: what fraction of real SPAMs were flagged?
R = |TP| / |S|
F = flagged emails = true positives + false positives
S = real SPAMs = true positives + false negatives
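A sketch computing the error rate, precision and recall from their definitions on invented labels; scikit-learn's precision_score and recall_score implement the same definitions:

# Error rate, precision and recall from their definitions (toy labels).
y_true = ["spam", "spam", "not spam", "not spam", "spam"]
y_pred = ["spam", "not spam", "spam", "not spam", "spam"]
tp = sum(t == "spam" and p == "spam" for t, p in zip(y_true, y_pred))
fp = sum(t == "not spam" and p == "spam" for t, p in zip(y_true, y_pred))
fn = sum(t == "spam" and p == "not spam" for t, p in zip(y_true, y_pred))
error_rate = sum(t != p for t, p in zip(y_true, y_pred)) / len(y_true)
precision = tp / (tp + fp)  # |TP| / |F|
recall = tp / (tp + fn)     # |TP| / |S|
print(error_rate, precision, recall)  # 0.4 0.666... 0.666...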

F-score
Harmonic mean of precision and recall, a kind of average (aka the F-measure):
F1 = 2 × (P × R) / (P + R)

Parameter β quantifies how much more we care about recall than precision.
Fβ = (1 + β²) × (P × R) / (β² × P + R)
For example, F0.5 is the metric to use if we care half as much about recall as about precision.
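A small sketch of F1 and Fβ computed from a given precision and recall; the P and R values are arbitrary, and scikit-learn's fbeta_score computes the same quantity directly from labels:

# F1 and F-beta from precision and recall; the P and R values are arbitrary.
def f_beta(p, r, beta=1.0):
    return (1 + beta ** 2) * p * r / (beta ** 2 * p + r)
p, r = 0.8, 0.5
print(f_beta(p, r))        # F1   ~ 0.615 (harmonic mean of P and R)
print(f_beta(p, r, 0.5))   # F0.5 ~ 0.714 (cares more about precision)
print(f_beta(p, r, 2.0))   # F2   ~ 0.541 (cares more about recall)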

Are precision, recall and F-score applicable to multiclass classification?



Macro-average
Compute precision and recall per-class, and average.
Rare classes have the same impact as frequent classes.
Micro-average
Treat each correct prediction as TP
Treat each missing classification as FN
Treat each incorrect prediction as FP
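A sketch comparing the two averaging schemes with scikit-learn; the three-class toy labels are invented for illustration:

# Macro vs. micro averaging for multiclass precision and recall (toy labels).
from sklearn.metrics import precision_score, recall_score
y_true = ["politics", "sports", "sports", "science", "sports"]
y_pred = ["politics", "sports", "science", "science", "sports"]
for avg in ("macro", "micro"):
    p = precision_score(y_true, y_pred, average=avg)
    r = recall_score(y_true, y_pred, average=avg)
    print(avg, round(p, 3), round(r, 3))
# Macro: every class counts equally, so rare classes weigh as much as frequent ones.
# Micro: every prediction counts equally, so frequent classes dominate.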

Properties:
- In single-label classification, if we (micro-)average over all classes, including the null/default class:
Precision = Recall = F-score = Accuracy
Multilabel classification
Each example may be labeled with any number of classes. How do micro P and R behave in this case?
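As one illustration of that question, a toy multilabel example (the genre indicator matrices are invented) where micro precision and recall are no longer equal:

# Micro-averaged precision and recall in a multilabel setting (toy data).
import numpy as np
from sklearn.metrics import precision_score, recall_score
# Rows = songs, columns = genres [rock, pop, metal]; 1 means the label applies.
y_true = np.array([[1, 0, 1], [0, 1, 0], [1, 1, 0]])
y_pred = np.array([[1, 0, 0], [0, 1, 1], [1, 0, 0]])
print(precision_score(y_true, y_pred, average="micro"))  # 3/4 = 0.75
print(recall_score(y_true, y_pred, average="micro"))     # 3/5 = 0.6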
Using examples: imagine you’re studying for a very competitive exam – how do you use learning material?

Disjoint sets of examples
Training set: observe patterns, infer rules
Development set: monitor performance, choose best learning options
Test set: REAL EXAM, not accessible in advance
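One way to create such disjoint splits, sketched with scikit-learn's train_test_split; the 80/10/10 proportions and the stand-in data are illustrative assumptions:

# Split a dataset into disjoint training, development and test portions.
from sklearn.model_selection import train_test_split
X = [[i] for i in range(100)]    # stand-in feature vectors
y = [i % 2 for i in range(100)]  # stand-in labels
X_train, X_rest, y_train, y_rest = train_test_split(X, y, test_size=0.2, random_state=0)
X_dev, X_test, y_dev, y_test = train_test_split(X_rest, y_rest, test_size=0.5, random_state=0)
print(len(X_train), len(X_dev), len(X_test))  # 80 10 10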

Important considerations
Use the same evaluation metric on:
 the development set
 the test set
It is important for the evaluation to be close to the true (real-world) objective.

Summary
 Machine learning studies algorithms which can learn to solve problems from examples; there are several canonical problem types.
 First step: decide on evaluation metric
 Separate training, development and test examples

Week 2 – Decision Trees
Supervised machine learning
Supervised: the training data is labeled (known). Such a learning algorithm reads the training
data and computes a function (f). The function can then label future examples.

In DT learning, the learned function is captured by a tree. In practice, this can be
more complex, for example when hyperparameter tuning is applied.
A hyperparameter is a parameter whose value is set before the learning process begins,
so it is not derived during learning; the other parameters of the model are
learned. The value of the hyperparameter is used to control the learning process, and tuning is
done to find the best possible model.
The depth of a decision tree is an example of a hyperparameter.
When hyperparameter tuning is involved, the data is split into 3 portions: training, validation
and test sets. Using the training and validation data, a value for the maximum depth that
trades off between overfitting and underfitting can be found (see the sketch below). The resulting decision tree model
is then run on the test data to get an estimate of how well the model is likely to do on future, unseen data.
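A sketch of that procedure: tune max_depth on the validation set, then evaluate once on the test set. The digits dataset, the split sizes, and the candidate depths are illustrative assumptions, not the course's setup.

# Tune the max_depth hyperparameter on a validation set, then estimate
# future performance once on the held-out test set (illustrative setup).
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
X, y = load_digits(return_X_y=True)
X_train, X_rest, y_train, y_rest = train_test_split(X, y, test_size=0.4, random_state=0)
X_val, X_test, y_val, y_test = train_test_split(X_rest, y_rest, test_size=0.5, random_state=0)
best_depth, best_score = None, 0.0
for depth in (2, 4, 8, 16, None):  # None lets the tree grow fully
    tree = DecisionTreeClassifier(max_depth=depth, random_state=0)
    tree.fit(X_train, y_train)
    score = tree.score(X_val, y_val)  # accuracy on the validation set
    if score > best_score:
        best_depth, best_score = depth, score
final_tree = DecisionTreeClassifier(max_depth=best_depth, random_state=0).fit(X_train, y_train)
print("chosen max_depth:", best_depth)
print("estimated test accuracy:", final_tree.score(X_test, y_test))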

Weakness of DTs: they are prone to overfitting. Overfitting means doing well on the training set, but not generalizing well to the test set.
On the bright side, they are very understandable.
Decision trees can be seen as a hierarchical list of tests and can be used to classify objects.
Decision tree learning is about constructing the tree.

Some real-life examples using decision trees:

Medical Diagnosis
A DT for predicting hepatitis. The tree is generated to support the diagnosis based on
the presence or absence of certain markers.

Customer Segmentation
A DT for the market segmentation of car consumers. Income is the main
identifier in people’s choices of cars. Depending on that, several other identifiers
such as profession, marital status and age are important too.

Decision trees in data mining can be used in classification tasks, where the
predicted outcome is the class. This course covers mostly classification trees (not
regression trees).

A decision tree consists of:
 Nodes: check the value of a feature.
