Summary Machine Learning

Summary of 32 pages for the course Machine Learning at UVT (all necessary info)

Preview: 4 of 32 pages

  • 9 September 2022
  • 32 pages
  • 2021/2022
  • Summary
Seller: adata
Lecture 1. Introduction to Machine Learning
What is machine learning (ML) about?
ML is about automating problem solving.
It is the study of computer algorithms that improve automatically through experience:
a program becomes better at a task (T), based on some experience (E), with respect to
some performance measure (P).
Examples:
- Spam detection
- Movie recommendation
- Speech recognition
- Credit risk analysis
- Autonomous driving
- Medical diagnosis

What does it involve?
- ML involves a notion of generalization: we assume that current observations
generalize to future observations, so that the model also works on unseen data
(that is, we assume the data points are representative of real-world data).
- Critical components include annotated data, an objective, an optimization
algorithm, features/representations, and assumptions.

Different types of learning
A good starting point:
- Supervised learning: annotated/labelled dataset/ground truth
Classification: discrete variable
Regression: continuous variable
- Unsupervised learning: unlabeled dataset
Clustering

SPAM versus non-SPAM
Binary classification problem




Learning process
- Find examples of two classes: SPAM and non-SPAM
- Come up with a learning algorithm
- A learning algorithm infers rules from examples: If (A or B or C) and not D, then
SPAM (for example decision trees)
- These rules can then be applied to new data (emails)
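
The rule quoted above can be sketched in code. This is a minimal illustration, not the course's method: the meanings given to the features A, B, C and D, and the helper names `extract_features` and `is_spam`, are hypothetical. In practice the rule itself would be inferred from labelled examples (for instance by a decision tree) rather than written by hand.

```python
def extract_features(text: str, sender: str, trusted: set) -> dict:
    """Turn an email into the boolean features A-D (hypothetical choices)."""
    lower = text.lower()
    return {
        "A": "free money" in lower,   # suspicious phrase
        "B": "click here" in lower,   # suspicious phrase
        "C": text.isupper(),          # whole message in capitals
        "D": sender in trusted,       # sender is known and trusted
    }


def is_spam(f: dict) -> bool:
    """Apply the rule: if (A or B or C) and not D, then SPAM."""
    return (f["A"] or f["B"] or f["C"]) and not f["D"]


trusted_senders = {"boss@work.example"}
new_email = extract_features(
    "Click here for FREE MONEY", "x@unknown.example", trusted_senders
)
print(is_spam(new_email))
```

The same function can then be applied to any new email, which is the last step of the learning process listed above.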

Machine learning examples
- Recognize handwritten numbers and letters
- Recognize faces in photos
- Determine whether text expresses positive, negative or no opinion
- Guess person’s age based on a sample of writing
- Flag suspicious credit-card transactions (binary classification task)
- Recommend books and movies to users based on their own and others’ purchase
history
- Recognize and label mentions of people’s or organization names in text

Types of learning problems: Regression
- Response: a (real) number
- Predict person’s age
- Predict price of stock
- Predict student’s score on exam

Types of learning problems: Binary Classification
- Response: a YES/NO answer
- Detect SPAM
- Predict polarity of product review: positive vs negative expressions

Types of learning problems: Multiclass classification
More than two labels/classes. One way to solve this kind of classification is by extending
logistic regression. A related learning problem is multi-label classification, where an
outcome can be linked to several labels at once (not every label has to apply).
Response: one of a finite set of options
- Classify newspaper article as
o Politics, sports, science, technology, health, finance
- Detect species based on photo
o Passer domesticus, Calidris alba, Streptopelia decaocto, Corvus corax
- Assign songs to one or more genres:
o Group related classes together, e.g. pop and r&b

Types of learning problems: Autonomous behavior
- Input: measurements from sensors (camera, microphone, radar, accelerometer)
- Response: instructions for actuators (steering, accelerator, brake…). The decisions
must be right, because mistakes can endanger people on the road.

How well is the algorithm learning?
Evaluation: choose a baseline, choose a metric, compare your learning with baseline!
Different tasks, different metrics:
- Predicting age
- Flagging spam (imbalanced data)




Evaluation of Regression Problems (metrics)
- Mean Absolute Error (MAE): the average absolute difference between the true value
yn (ground truth) and the predicted value ŷn. It measures the average magnitude of
the errors in a set of predictions, without considering their direction.

MAE = (1/N) Σn |yn − ŷn|, where N is the number of data points



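- Mean Squared Error (MSE): the average squared difference between the true value and
the predicted value. Squaring weights large errors more heavily, so MSE is more
sensitive to outliers than Mean Absolute Error. It measures how close a fitted line is
to the data points.

MSE = (1/N) Σn (yn − ŷn)², where N is the number of data points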
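
As a rough sketch of the two regression metrics, the snippet below computes both on a small age-prediction example. The data is made up for illustration; the last prediction is deliberately far off to show how a single outlier affects each metric.

```python
def mean_absolute_error(y_true, y_pred):
    """Average of |y_n - yhat_n| over all data points."""
    return sum(abs(y - p) for y, p in zip(y_true, y_pred)) / len(y_true)


def mean_squared_error(y_true, y_pred):
    """Average of (y_n - yhat_n)^2 over all data points."""
    return sum((y - p) ** 2 for y, p in zip(y_true, y_pred)) / len(y_true)


# Hypothetical true and predicted ages; the last prediction is an outlier.
ages_true = [20, 30, 40, 50]
ages_pred = [22, 28, 41, 70]

mae = mean_absolute_error(ages_true, ages_pred)  # (2 + 2 + 1 + 20) / 4 = 6.25
mse = mean_squared_error(ages_true, ages_pred)   # (4 + 4 + 1 + 400) / 4 = 102.25
print(mae, mse)
```

The outlier contributes 20 of the 25 units of absolute error but 400 of the 409 units of squared error, which is why MSE reacts to it much more strongly.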




Evaluation for Classification: Predicting SPAM
- Accuracy: the fraction of all predictions that are correct.

Accuracy = (TP + TN) / (TP + FP + TN + FN)

- Error rate (misclassification rate): the number of incorrect classifications divided
by the total number of data points.

Error rate = (FP + FN) / (TP + FP + TN + FN) = 1 − Accuracy

Accuracy is not informative if the data is unbalanced.
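
A small sketch of why accuracy can mislead on unbalanced data. The counts are invented for illustration: 100 emails of which 5 are SPAM, and a baseline that never flags anything.

```python
def accuracy(tp, fp, tn, fn):
    """Fraction of all predictions that are correct."""
    return (tp + tn) / (tp + fp + tn + fn)


# Hypothetical baseline that labels every email as not-SPAM:
# it finds no SPAM (tp = 0), raises no false alarms (fp = 0),
# gets all 95 normal emails right (tn = 95) and misses all 5 SPAMs (fn = 5).
tp, fp, tn, fn = 0, 0, 95, 5

acc = accuracy(tp, fp, tn, fn)
error_rate = 1 - acc
print(acc, error_rate)
```

The baseline scores 95% accuracy without detecting a single SPAM email, so a learned model should be compared against such a baseline, and with a metric that reflects the rare class.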


Classification
- False Positive (FP): flagged as SPAM, but is not SPAM (the bigger issue for this
problem)
- False Negative (FN): not flagged, but is SPAM

What about medical diagnosis?
Correct classification
- True Positive (TP): SPAM classified as SPAM
- True Negative (TN): Not-SPAM classified as Not-SPAM

Precision and Recall
Metrics which focus on one kind of mistake:
- Precision: the fraction of positive class predictions that actually belong to the
positive class (what fraction of flagged emails were real SPAM?)

Precision = True Positive / (True Positive + False Positive)

- Recall: the fraction of all positive examples in the dataset that were predicted as
positive (what fraction of real SPAM was flagged?)

Recall = True Positive / (True Positive + False Negative)
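
The two definitions can be sketched directly from the counts. The numbers below are hypothetical, and returning 0.0 when the denominator is zero is a convention assumed here, not something stated in the notes.

```python
def precision(tp, fp):
    """Fraction of flagged emails that were real SPAM."""
    return tp / (tp + fp) if (tp + fp) else 0.0


def recall(tp, fn):
    """Fraction of real SPAM emails that were flagged."""
    return tp / (tp + fn) if (tp + fn) else 0.0


# Hypothetical filter: 10 emails flagged, 8 of them really SPAM (tp = 8, fp = 2),
# while 8 further SPAM emails slipped through (fn = 8).
p = precision(tp=8, fp=2)  # 8 / 10 = 0.8
r = recall(tp=8, fn=8)     # 8 / 16 = 0.5
print(p, r)
```

This filter is fairly trustworthy when it flags something (high precision) but misses half of the real SPAM (low recall).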

F-score/F-measure
The harmonic mean of precision and recall (a kind of average):

F1 = (2 ∗ Precision ∗ Recall) / (Precision + Recall)

In the more general F-measure, the parameter β quantifies how much more we care about
recall than precision: when β is greater than 1, recall is weighted more; when it is
smaller than 1, precision is weighted more.
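
The role of β can be sketched with the standard weighted form of the F-measure, Fβ = (1 + β²) ∗ Precision ∗ Recall / (β² ∗ Precision + Recall), which reduces to F1 for β = 1. The notes do not spell this formula out, so treat it as the usual textbook definition rather than the lecture's own notation. The precision and recall values are hypothetical.

```python
def f_beta(precision, recall, beta=1.0):
    """Weighted harmonic mean of precision and recall.

    beta > 1 weights recall more, beta < 1 weights precision more.
    Returns 0.0 when both inputs are zero (a convention assumed here).
    """
    b2 = beta ** 2
    denom = b2 * precision + recall
    if denom == 0:
        return 0.0
    return (1 + b2) * precision * recall / denom


# Hypothetical classifier with high precision and low recall.
p, r = 0.8, 0.5
f1 = f_beta(p, r)              # balanced
f2 = f_beta(p, r, beta=2.0)    # recall weighted more: pulled toward 0.5
f05 = f_beta(p, r, beta=0.5)   # precision weighted more: pulled toward 0.8
print(round(f1, 3), round(f2, 3), round(f05, 3))
```

Because recall is the weaker of the two values here, the score drops as β grows and rises as β shrinks.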



Example 2. Multiclass classification



Data point (2) is FN for SPAM, FP for OK
Data point (4) is FN for PHISH, FP for SPAM




Precision: true positives over labeled positives
Recall: true positives over actual positives
- Compute precision and recall per class, and average (macro-average):

PS = 1/2, PO = 1/2, PP = 1/1
(PS + PO + PP) / 3 = (1/2 + 1/2 + 1) / 3 = 2/3
- Rare classes have the same impact as frequent classes

Micro-average
- The micro-average pools the individual predictions of all classes before computing
the metric
- Weights each sample equally
- Aggregates the contributions of all classes to compute the average metric
- Micro-average precision is the sum of all true positives divided by the sum of all
true positives plus the sum of all false positives. So basically, you divide the number
of correctly identified predictions by the total number of predictions.
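
The difference between the two averages can be sketched on a tiny dataset. The five emails below are hypothetical: the original example table is not reproduced in these notes, so the labels were chosen only to be consistent with the figures quoted earlier (data point 2 is a SPAM predicted as OK, data point 4 is a PHISH predicted as SPAM, per-class precisions 1/2, 1/2 and 1).

```python
# Hypothetical data, consistent with the per-class precisions quoted in the notes.
y_true = ["SPAM", "SPAM", "OK", "PHISH", "PHISH"]
y_pred = ["SPAM", "OK", "OK", "SPAM", "PHISH"]
labels = ["SPAM", "OK", "PHISH"]


def tp_fp(label):
    """True and false positives for one class, treating it as 'positive'."""
    tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
    fp = sum(t != label and p == label for t, p in zip(y_true, y_pred))
    return tp, fp


counts = {label: tp_fp(label) for label in labels}

# Macro-average: compute precision per class, then average the classes equally.
# (Assumes every class is predicted at least once, so no division by zero.)
macro = sum(tp / (tp + fp) for tp, fp in counts.values()) / len(labels)

# Micro-average: pool the counts of all classes, then compute one precision.
total_tp = sum(tp for tp, _ in counts.values())
total_fp = sum(fp for _, fp in counts.values())
micro = total_tp / (total_tp + total_fp)

print(counts, macro, micro)
```

Here the macro-average is 2/3 while the micro-average is 3/5: the single correct PHISH prediction counts as a full class in the macro-average, but only as one sample out of five in the micro-average.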



