Machine Learning
Week 1 – Part I: Practical Matters
Introduction
Lectures will be either live (and recorded) or prerecorded.
Lecture videos will be shared weekly as well as the accompanying slides.
Slides are not meant to be self-contained: take notes!
The practical sessions will be online; interaction is possible during these sessions.
Group Assignment: ML Challenge
Work in groups of 3 people to solve a challenge problem
30% course grade
No resit
Collaborative work: you will need to describe work division and contribution of each student
Final Exam
Worth 70% course grade
Multiple choice and/or open-ended questions
Programming exercises
Part II: Introduction to Machine Learning
How can we automate problem solving?
Example: flagging spam in your e-mail.
- Classification task
- Requires standard machine learning method.
Some email headers:
Rules: if (A or B or C) and not D, then SPAM.
- Specify them, so the system recognizes them
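A hand-written rule of the form "(A or B or C) and not D" could look like the sketch below. The four conditions are hypothetical placeholders chosen for illustration; the notes do not say what A, B, C and D are.

```python
def is_spam_by_rules(subject, sender, contacts):
    """Hand-written rule: (A or B or C) and not D => SPAM."""
    subject = subject.lower()
    a = "free" in subject          # hypothetical condition A
    b = "winner" in subject        # hypothetical condition B
    c = sender.endswith(".biz")    # hypothetical condition C
    d = sender in contacts         # hypothetical condition D: sender is known
    return (a or b or c) and not d


print(is_spam_by_rules("You are a WINNER", "promo@lottery.biz", {"alice@uni.edu"}))
print(is_spam_by_rules("Free lunch on Friday", "alice@uni.edu", {"alice@uni.edu"}))
```

Every condition has to be written and maintained by hand, which is what a learning algorithm is meant to avoid.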
Machine Learning
Machine learning is the study of computer algorithms that improve automatically through experience [1]. It involves
becoming better at a task T, based on some experience E, with respect to some performance measure P.
Learning process
Find examples of SPAM and non-SPAM (training set)
Come up with a learning algorithm
A learning algorithm infers rules from examples
These rules can then be applied to new data (emails)
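The learning process above can be sketched with a deliberately naive learner. It is only an illustration, not one of the algorithms covered later: the inferred "rule" is the set of words that occur in SPAM examples but never in non-SPAM examples. The function names and the toy training set are made up for this sketch.

```python
from collections import Counter


def learn_spam_words(examples):
    """Infer a rule from labeled examples: words seen in SPAM but never in non-SPAM."""
    spam_counts, ham_counts = Counter(), Counter()
    for text, is_spam in examples:
        target = spam_counts if is_spam else ham_counts
        target.update(text.lower().split())
    return {word for word in spam_counts if word not in ham_counts}


def classify(text, spam_words):
    """Apply the inferred rule to a new email: flag it if it contains a spam word."""
    return any(word in spam_words for word in text.lower().split())


# Toy training set of (text, is_spam) pairs.
training_set = [
    ("win free money now", True),
    ("free prize waiting", True),
    ("meeting agenda for monday", False),
    ("lunch now or later", False),
]
spam_words = learn_spam_words(training_set)

print(classify("claim your free prize", spam_words))
print(classify("agenda for lunch", spam_words))
```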
Learning algorithms
See several different learning algorithms
Implement 2-3 simple ones from scratch in Python
Learn about Python libraries for ML (scikit-learn)
How to apply them to real-world problems
Machine Learning examples: recognize handwritten numbers and letters, recognize faces in photos, determine whether text
expresses a positive, negative or no opinion, guess a person’s age based on a sample of writing, flag suspicious credit-card
transactions, recommend books and movies to users based on their own and others’ purchase histories, recognize and label
mentions of people’s or organizations’ names in text.
Types of learning problems: Regression
Response: a (real) number
Predict a person’s age
Predict the price of a stock
Predict a student’s score on an exam
Binary classification
Response: yes/no answer
Detect SPAM
Predict polarity of product reviews: positive vs negative
Multiclass classification
More than two classes to choose from
Response: one of a finite set of options
Classify newspaper article as: politics, sports, science, technology, health, finance
Detect species based on photo: Passer domesticus, Calidris alba, etc.
Multilabel classification
Response: a finite set of Yes/No answers
Assign songs to one or more genres: rock, pop, metal, hip-hop
Ranking
Example: a search engine looking for a specific source.
Order objects according to relevance
Rank web pages in response to user query
Predict student’s preference for courses in a program
Sequence Labeling
Relevant in speech recognition.
Input: a sequence of elements (e.g., words)
Response: a corresponding sequence of labels
Label words in a sentence with their syntactic category: Determiner, Noun, Adverb, Verb, Prep, Noun
Label frames in speech signal with corresponding phonemes.
Sequence-to-sequence modeling
Input: a sequence of elements
Response: another sequence of elements
Possibly different length
Possibly elements from different sets
Examples: translate between languages (My name is Penelope → Me llamo Penélope), summarize text
Autonomous behavior
Self-driving car
Input: measurements from sensors – camera, microphone, radar, accelerometer.
Response: instructions for actuators – steering, accelerator, brake, …
How well is the algorithm learning?
Evaluation
You need some standard, a performance metric!
- Predicting age
- Predicting gender
- Flagging spam
- …
Predicting age – Regression
Mean absolute error – the average (absolute) difference between true value and predicted value.
Mean squared error – the average square of the difference between the true value and predicted value (more sensitive to
outliers).
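Both regression metrics are easy to compute by hand. The sketch below uses made-up ages; the last prediction is an outlier, and it dominates the mean squared error far more than the mean absolute error.

```python
def mean_absolute_error(y_true, y_pred):
    """Average absolute difference between true and predicted values."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def mean_squared_error(y_true, y_pred):
    """Average squared difference between true and predicted values."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


# Toy data (invented for illustration): the last prediction is off by 20 years.
true_ages = [20, 30, 40, 50]
predicted_ages = [22, 28, 40, 70]

print(mean_absolute_error(true_ages, predicted_ages))  # 6.0
print(mean_squared_error(true_ages, predicted_ages))   # 102.0
```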
Predicting spam
We can use the error rate for that: the fraction of emails that are classified incorrectly.
Kinds of mistakes
False positive: flagged as SPAM, but is non-SPAM
False negative: not flagged, but is SPAM
False positives are a bigger problem!
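A minimal sketch of counting the two kinds of mistakes and the overall error rate. Labels are booleans where True means "is SPAM" (true label) or "flagged" (prediction); the data is invented for illustration.

```python
def confusion_counts(y_true, y_pred):
    """Return (TP, FP, FN, TN) for a binary task; True means SPAM / flagged."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t and p)
    fp = sum(1 for t, p in pairs if not t and p)      # flagged, but non-SPAM
    fn = sum(1 for t, p in pairs if t and not p)      # SPAM, but not flagged
    tn = sum(1 for t, p in pairs if not t and not p)
    return tp, fp, fn, tn


def error_rate(y_true, y_pred):
    """Fraction of examples that were classified incorrectly."""
    tp, fp, fn, tn = confusion_counts(y_true, y_pred)
    return (fp + fn) / (tp + fp + fn + tn)


is_spam = [True, True, True, False, False, False, False, False]
flagged = [True, True, False, True, False, False, False, False]

print(confusion_counts(is_spam, flagged))  # (2, 1, 1, 4)
print(error_rate(is_spam, flagged))        # 0.25
```

The error rate treats both kinds of mistakes as equally bad, which is why metrics that separate them are useful.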
Precision and Recall
Metrics which focus on one kind of mistake.
Precision: what fraction of flagged emails were real SPAMs?
$P = \frac{|TP|}{|F|}$
Recall: what fraction of real SPAMs were flagged?
$R = \frac{|TP|}{|S|}$
F = true positives + false positives
S = true positives + false negatives
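In terms of counts, precision divides the true positives by everything that was flagged, and recall divides them by everything that really is SPAM. A small sketch with made-up counts:

```python
def precision(tp, fp):
    """Fraction of flagged emails that are real SPAM: TP / (TP + FP)."""
    flagged = tp + fp
    return tp / flagged if flagged else 0.0


def recall(tp, fn):
    """Fraction of real SPAM that was flagged: TP / (TP + FN)."""
    spam = tp + fn
    return tp / spam if spam else 0.0


# Invented counts: 10 emails flagged (8 correctly), 16 real SPAMs in total.
tp, fp, fn = 8, 2, 8
print(precision(tp, fp))  # 0.8
print(recall(tp, fn))     # 0.5
```

Returning 0.0 when nothing was flagged (or when there is no SPAM at all) is a convention chosen for this sketch; other conventions exist.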
F-score
Harmonic mean between precision and recall, a kind of average (aka the F-measure):
$F_1 = 2 \times \frac{P \times R}{P + R}$
Fβ
Parameter β quantifies how much more we care about recall than precision.
$F_\beta = (1 + \beta^2) \times \frac{P \times R}{\beta^2 \times P + R}$
For example F0.5 is the metric to use if we care half as much about recall as about precision.
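A sketch of the F-score using the standard definition F_β = (1 + β²) × P × R / (β² × P + R), which reduces to F1 for β = 1. The precision and recall values are invented for illustration.

```python
def f_beta(p, r, beta=1.0):
    """Weighted harmonic mean of precision p and recall r.

    beta > 1 weights recall more heavily, beta < 1 weights precision more heavily.
    """
    denominator = beta ** 2 * p + r
    if denominator == 0:
        return 0.0  # convention for this sketch when both p and r are zero
    return (1 + beta ** 2) * p * r / denominator


p, r = 0.8, 0.5
print(f_beta(p, r))            # F1
print(f_beta(p, r, beta=0.5))  # favours precision, so higher here
print(f_beta(p, r, beta=2.0))  # favours recall, so lower here
```

Because precision is higher than recall in this example, the score goes up as β moves below 1 and down as it moves above 1.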
Are precision, recall and F-score applicable to multiclass classification?
Macro-average
Compute precision and recall per-class, and average.
Rare classes have the same impact as frequent classes.
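A sketch of macro-averaging on a toy three-class problem (the labels and predictions are invented). Each class contributes equally to the average, regardless of how many examples it has.

```python
def per_class_precision_recall(y_true, y_pred, cls):
    """Precision and recall when treating `cls` as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == cls and p == cls)
    fp = sum(1 for t, p in pairs if t != cls and p == cls)
    fn = sum(1 for t, p in pairs if t == cls and p != cls)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


def macro_average(y_true, y_pred):
    """Unweighted mean of the per-class precision and recall."""
    classes = sorted(set(y_true) | set(y_pred))
    scores = [per_class_precision_recall(y_true, y_pred, c) for c in classes]
    macro_p = sum(p for p, _ in scores) / len(classes)
    macro_r = sum(r for _, r in scores) / len(classes)
    return macro_p, macro_r


y_true = ["sports"] * 4 + ["politics"] * 4 + ["science"] * 2
y_pred = ["sports", "sports", "sports", "politics",
          "politics", "politics", "politics", "politics",
          "sports", "science"]

macro_p, macro_r = macro_average(y_true, y_pred)
print(macro_p, macro_r)  # approximately 0.85 and 0.75
```

The rare class "science" has only two examples, yet its recall of 0.5 pulls the macro recall down as much as a frequent class would.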
Micro-average
Treat each correct prediction as TP
Treat each missing classification as FN
Treat each incorrect prediction as FP
Properties:
- In single-label classification, if we average over all classes (including the null/default class):
Precision = Recall = F-score = Accuracy
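The equality can be checked on a toy example (invented data). In single-label classification every wrong prediction is at once a false positive for the predicted class and a false negative for the true class, so the pooled FP and FN counts are equal.

```python
def micro_precision_recall(y_true, y_pred):
    """Pool TP, FP and FN over all classes, then compute precision and recall."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == p)
    fp = sum(1 for t, p in pairs if t != p)  # wrong class was predicted
    fn = sum(1 for t, p in pairs if t != p)  # true class was missed
    return tp / (tp + fp), tp / (tp + fn)


def accuracy(y_true, y_pred):
    """Fraction of examples whose predicted class is correct."""
    return sum(1 for t, p in zip(y_true, y_pred) if t == p) / len(y_true)


y_true = ["sports"] * 4 + ["politics"] * 4 + ["science"] * 2
y_pred = ["sports", "sports", "sports", "politics",
          "politics", "politics", "politics", "politics",
          "sports", "science"]

micro_p, micro_r = micro_precision_recall(y_true, y_pred)
print(micro_p, micro_r, accuracy(y_true, y_pred))  # all 0.8
```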
Multilabel classification
Each example may be labeled with any number of classes. How do micro P and R behave in this case?
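One way to explore this question is to represent each example's labels as a set and pool the counts over all examples. In the sketch below (invented songs and genres) a predicted label that is not in the true set is a false positive, and a true label that was not predicted is a false negative. These two counts no longer have to match, so micro precision and micro recall can differ.

```python
def multilabel_micro(true_sets, pred_sets):
    """Micro-averaged precision and recall for multilabel predictions."""
    tp = sum(len(t & p) for t, p in zip(true_sets, pred_sets))
    fp = sum(len(p - t) for t, p in zip(true_sets, pred_sets))
    fn = sum(len(t - p) for t, p in zip(true_sets, pred_sets))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


true_genres = [{"rock", "pop"}, {"metal"}, {"hip-hop", "pop"}]
pred_genres = [{"rock"}, {"metal", "rock"}, {"pop"}]

ml_precision, ml_recall = multilabel_micro(true_genres, pred_genres)
print(ml_precision, ml_recall)  # 0.75 and 0.6
```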
Using examples: imagine you’re studying for a very competitive exam – how do you use learning material?
Disjoint sets of examples
Training set: observe patterns, infer rules
Development set: monitor performance, choose best learning options
Test set: REAL EXAM, not accessible in advance
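A minimal sketch of splitting a collection of examples into three disjoint sets. The 80/10/10 proportions and the fixed random seed are assumptions made for illustration, not values prescribed in the lecture.

```python
import random


def split_examples(examples, dev_frac=0.1, test_frac=0.1, seed=0):
    """Shuffle the examples and cut them into disjoint train/dev/test lists."""
    items = list(examples)
    random.Random(seed).shuffle(items)  # fixed seed makes the split reproducible
    n_test = int(len(items) * test_frac)
    n_dev = int(len(items) * dev_frac)
    test = items[:n_test]
    dev = items[n_test:n_test + n_dev]
    train = items[n_test + n_dev:]
    return train, dev, test


examples = list(range(100))  # stand-in for 100 labeled examples
train, dev, test = split_examples(examples)
print(len(train), len(dev), len(test))  # 80 10 10
```

The test portion should be set aside immediately and only used once, for the final evaluation.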
Important considerations
Use the same evaluation metric on:
Development set
Test set
It is important for the evaluation to be close to the true (real-world) objective.
Summary
Machine learning studies algorithms which can learn to solve problems from examples
There are several canonical problem types
First step: decide on evaluation metric
Separate training, development and test examples
Week 2 – Decision Trees
Supervised machine learning
Supervised: training data is labeled (known). A supervised learning algorithm reads the training
data and computes a learned function (f). The function can then label future examples.
In decision tree (DT) learning, the learned function is represented by a tree. In practice, the
procedure can be more complex, for example when hyperparameter tuning is applied.
A hyperparameter is a parameter whose value is set before the learning process begins,
so it is not derived during learning. The other parameters of the model are usually
learned from the data. The value of the hyperparameter is used to control the learning process.
Tuning means searching for the hyperparameter values that give the best model.
The depth of a decision tree is an example of a hyperparameter.
When hyperparameter tuning is involved, the data is split into 3 portions: training, validation
and test sets. Using the training and validation data, a good value for the maximum depth, one that
trades off between overfitting and underfitting, can be found. The resulting decision tree model
is then run on the test data to get an estimate of how well the model is likely to do in the future on unseen data.
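The tuning loop can be sketched with a toy tree learner. This is an illustration only: the learner picks each split by counting misclassifications, which is an assumption of this sketch and not necessarily the criterion used in the course, and the one-feature data set (with one noisy label at x = 3) is invented.

```python
from collections import Counter


def majority(labels):
    return Counter(labels).most_common(1)[0][0]


def build_tree(rows, labels, max_depth):
    """Return a leaf label, or a node (feature, threshold, left, right)."""
    if max_depth == 0 or len(set(labels)) == 1:
        return majority(labels)
    best = None
    for f in range(len(rows[0])):
        for t in sorted({r[f] for r in rows}):
            left = [y for r, y in zip(rows, labels) if r[f] <= t]
            right = [y for r, y in zip(rows, labels) if r[f] > t]
            if not left or not right:
                continue
            # Mistakes made if each side predicts its majority label.
            errors = (len(left) - Counter(left).most_common(1)[0][1]
                      + len(right) - Counter(right).most_common(1)[0][1])
            if best is None or errors < best[0]:
                best = (errors, f, t)
    if best is None:
        return majority(labels)
    _, f, t = best
    li = [i for i, r in enumerate(rows) if r[f] <= t]
    ri = [i for i, r in enumerate(rows) if r[f] > t]
    return (f, t,
            build_tree([rows[i] for i in li], [labels[i] for i in li], max_depth - 1),
            build_tree([rows[i] for i in ri], [labels[i] for i in ri], max_depth - 1))


def predict(tree, row):
    while isinstance(tree, tuple):  # internal nodes are tuples, leaves are labels
        f, t, left, right = tree
        tree = left if row[f] <= t else right
    return tree


def accuracy(tree, rows, labels):
    return sum(predict(tree, r) == y for r, y in zip(rows, labels)) / len(labels)


def tune_max_depth(train, val, depths):
    """Train one tree per candidate depth; keep the best on the validation set."""
    best_depth, best_acc = None, -1.0
    for d in depths:
        acc = accuracy(build_tree(train[0], train[1], d), val[0], val[1])
        if acc > best_acc:  # ties go to the smaller depth
            best_depth, best_acc = d, acc
    return best_depth, best_acc


train_rows = [(x,) for x in range(1, 11)]
train_labels = ["neg", "neg", "pos", "neg", "neg", "pos", "pos", "pos", "pos", "pos"]
val_rows = [(1.5,), (2.5,), (3.5,), (6.5,), (7.5,)]
val_labels = ["neg", "neg", "neg", "pos", "pos"]

best_depth, best_acc = tune_max_depth((train_rows, train_labels),
                                      (val_rows, val_labels), [1, 2, 3, 4, 5])
print(best_depth, best_acc)
```

On this data the deepest tree fits the noisy training label and does worse on the validation set, so a shallow tree is selected. The test set is not touched during tuning.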
Weakness of DT: prone to overfitting. Overfitting means doing well on the training set, but not on unseen data (the test
set). On the bright side, decision trees are easy to interpret.
Decision trees can be seen as a hierarchy of tests that can be used to
classify objects. Decision tree learning is about constructing
the tree.
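To make the "hierarchy of tests" concrete, here is a hand-built tree represented as nested dictionaries. The features and the tree itself are hypothetical; a learning algorithm would construct such a structure from data instead.

```python
# Each internal node tests one yes/no feature; each leaf is a class label.
spam_tree = {
    "feature": "contains_link",
    "yes": {
        "feature": "sender_known",
        "yes": "not spam",
        "no": "spam",
    },
    "no": "not spam",
}


def classify(node, example):
    """Follow the tests from the root down until a leaf label is reached."""
    while isinstance(node, dict):
        branch = "yes" if example[node["feature"]] else "no"
        node = node[branch]
    return node


print(classify(spam_tree, {"contains_link": True, "sender_known": False}))
print(classify(spam_tree, {"contains_link": False, "sender_known": False}))
```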
Some real-life examples using decision trees:
Medical Diagnosis
A DT for predicting hepatitis. The tree supports the diagnosis based on
the presence or absence of certain markers.
Customer Segmentation
A DT for the market segmentation of car consumers. Income is the main
identifier in people’s choices of cars. Depending on that, several other identifiers
such as profession, marital status and age are important too.
Decision trees in Data Mining can be used in classification tasks, where the
predicted outcome is the class. This course deals mostly with classification trees (not
regression trees).
A decision tree consists of:
Nodes: check the value of a feature.