Summary Machine Learning


English Summary of Machine Learning course of Master Data Science and Society at Tilburg University. A summary of lecture materials, readings, and notes.


  • 1 February 2023
  • 61 pages
  • 2022/2023
  • Summary
liekebuuron
Machine learning
Lecture 1

Machine learning is about automating problem solving. It is the study of computer
algorithms that improve automatically through experience: a program becomes better at a
task T, based on some experience E, with respect to some performance measure P.
Examples:
- Spam detection
- Movie recommendation
- Speech recognition
- Credit risk analysis
- Autonomous driving
- Medical diagnosis.
It comes up with a learned algorithm. It is about learning from experience.

What does it involve?
- ML involves a notion of generalization. When the machine learns relationships
between the input and the output, we want these to hold on unseen data as well; this is
the concept of generalization. Is it safe to assume that current observations
generalize to future observations?
- Critical components are annotated data, an objective, an optimization algorithm,
features/representations, and assumptions.
- We assume the dataset represents the population. With more data, the output
usually becomes better.
- There is an optimization algorithm that incrementally works towards the best
outcome.

Different types of learning:
Starting points:
- Supervised learning: annotated/labelled dataset / ground truth
o Classification: discrete variable
o Regression: continuous variable
- Unsupervised learning: unlabeled dataset
o clustering

Examples:
Spam vs non-spam?




This is usually a text mining problem. The emails have to be pre-processed in such a way
that we can create features from the dataset. This is a binary classification problem. The
learning algorithm should come up with a function that maps the representation of an
email to a label (spam or non-spam).
- Find examples of spam and non-spam
- Come up with a learning algorithm
- A learning algorithm infers rules from examples: if (A or B or C) and not D, then spam
- These rules can then be applied to new data (emails)
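
As a rough illustration of the last two steps, the sketch below applies a rule of the shape "if (A or B or C) and not D, then spam" to new emails. The rule is written by hand here, not learned, and the four features, the phrases they look for, and the example addresses are all hypothetical choices made only for this example.

```python
# Hypothetical binary features created from an email during pre-processing.
def extract_features(text, sender, known_senders):
    lowered = text.lower()
    return {
        "A": "free money" in lowered,   # suspicious phrase (assumed)
        "B": "click here" in lowered,   # typical link bait (assumed)
        "C": text.isupper(),            # message written entirely in capitals
        "D": sender in known_senders,   # sender is in the address book
    }


def is_spam(features):
    # Rule of the form: if (A or B or C) and not D, then spam.
    return (features["A"] or features["B"] or features["C"]) and not features["D"]


known_senders = {"alice@example.com"}
emails = [
    ("Click here to claim your FREE MONEY", "promo@example.net"),
    ("Click here for the meeting notes", "alice@example.com"),
    ("See you at the lecture tomorrow", "bob@example.org"),
]

predictions = [
    is_spam(extract_features(text, sender, known_senders))
    for text, sender in emails
]
print(predictions)  # [True, False, False]
```

A learning algorithm would infer both the useful features and the rule from labelled examples instead of having them written down by hand.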

Learning algorithms:
- See several different learning algorithms
- Implement 2-3 simple ones from scratch in Python
- Learn about Python libraries for ML (scikit-learn)
- How to apply them to real-world problems

Machine learning examples:
- Recognize handwritten numbers and letters
- Recognize faces in photos
- Determine whether text expresses positive, negative or no opinion
- Guess person’s age based on a sample of writing
- Flag suspicious credit-card transactions
- Recommend books and movies to users based on their own and others’ purchase
history
- Recognize and label mentions of people’s or organization names in text

Types of learning problems:
Regression:
- Response: a (real) number
- Predict a person’s age
- Predict price of stock
- Predict student’s score on exam
Binary classification:
- Response: Yes/No answer
- Detect spam
- Predict polarity of product review: positive vs negative
Multiclass classification:
- Response: one of a finite set of options
- Classify newspaper article as:
o Politics, sports, science, technology, health, finance
- Detect species based on photo
o Passer domesticus, Calidris alba, Streptopelia decaocto, Corvus corax
Multilabel classification:
- The output does not have to consist of a single thing, but it could be multiple things
(this is the difference with multiclass classification)
- Assign songs to one or more genres (rock, pop, metal)
- The goal during training is not necessarily to get every label of every example
right, but to get as many labels right as possible.
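
The difference between a multiclass and a multilabel target can be made concrete by looking at how the response is stored. The snippet below is a minimal sketch; the helper name `to_indicator` and the example song are assumptions made for illustration.

```python
genres = ["rock", "pop", "metal"]

# Multiclass: exactly one label per example.
multiclass_target = "pop"


# Multilabel: a set of labels per example, stored as one 0/1 indicator per genre.
def to_indicator(labels, classes):
    return [1 if c in labels else 0 for c in classes]


song_labels = {"rock", "metal"}          # hypothetical song with two genres
indicator = to_indicator(song_labels, genres)
print(indicator)  # [1, 0, 1]
```

With this encoding, a multilabel problem can be viewed as one yes/no decision per genre.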
Autonomous behavior (example of a car)
- Input: measurements from sensors – camera, microphone, radar, accelerometer.
- Response: instructions for actuators – steering, accelerator, brake.
- Evaluation: choose a baseline, choose a metric, compare!
- Different tasks, different metrics:
o Predicting age
o Flagging spam

Two metrics that we often use in regression problems (y_n is the true value or ground
truth, ŷ_n the predicted value, N the number of examples):
- Mean absolute error (MAE): the average absolute difference between true value and
predicted value:
MAE = (1/N) Σ_n |y_n − ŷ_n|
- Mean squared error (MSE): the average squared difference between true value and
predicted value. It is more sensitive to outliers, but it is differentiable everywhere (MAE
is not differentiable where the error is zero):
MSE = (1/N) Σ_n (y_n − ŷ_n)²
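
The two regression metrics can be computed in a few lines. This is a minimal sketch using the usual definitions (average of absolute errors, average of squared errors); the ages below are made-up numbers, chosen so that one prediction is a clear outlier.

```python
def mean_absolute_error(y_true, y_pred):
    # Average absolute difference between true and predicted values.
    return sum(abs(y - p) for y, p in zip(y_true, y_pred)) / len(y_true)


def mean_squared_error(y_true, y_pred):
    # Average squared difference between true and predicted values.
    return sum((y - p) ** 2 for y, p in zip(y_true, y_pred)) / len(y_true)


y_true = [20, 30, 40, 50]   # made-up true ages
y_pred = [22, 28, 40, 70]   # the last prediction is off by 20 (an outlier)

mae = mean_absolute_error(y_true, y_pred)
mse = mean_squared_error(y_true, y_pred)
print(mae)  # 6.0
print(mse)  # 102.0
```

The single outlier contributes 20 of the 24 units of absolute error, but 400 of the 408 units of squared error, which is why MSE reacts more strongly to it.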



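For binary classification problems, the metrics often used are:
- Accuracy: the fraction of examples classified correctly
- Error rate: the fraction of examples classified incorrectly (1 − accuracy)
These are not very informative on their own, especially if the dataset is not balanced.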
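
To see why accuracy can mislead on unbalanced data, consider a baseline that never flags anything. The sketch below uses a made-up dataset of 100 emails, 5 of which are spam.

```python
def accuracy(y_true, y_pred):
    # Fraction of examples where the prediction equals the true label.
    correct = sum(1 for y, p in zip(y_true, y_pred) if y == p)
    return correct / len(y_true)


y_true = [1] * 5 + [0] * 95      # 1 = spam, 0 = non-spam (made-up data)
always_not_spam = [0] * 100      # baseline that never flags an email

acc = accuracy(y_true, always_not_spam)
error_rate = 1 - acc
print(acc)  # 0.95
```

The baseline reaches 95% accuracy without detecting a single spam email, so a high accuracy alone says little here.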



Classification:
- False positive – flagged as spam, but not spam
- False negative – not flagged, but is spam
- False positives are a bigger issue for this problem!
- True positive – spam classified as spam
- True negative – not-spam classified as not-spam
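
The four outcomes can be counted directly from the true labels and the predictions. A minimal sketch, assuming labels are encoded as 1 for spam and 0 for not-spam; the data is made up.

```python
def confusion_counts(y_true, y_pred, positive=1):
    tp = fp = fn = tn = 0
    for y, p in zip(y_true, y_pred):
        if p == positive and y == positive:
            tp += 1          # spam, flagged as spam
        elif p == positive:
            fp += 1          # not spam, but flagged
        elif y == positive:
            fn += 1          # spam, but not flagged
        else:
            tn += 1          # not spam, not flagged
    return tp, fp, fn, tn


y_true = [1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 1, 0, 0, 0, 0]

tp, fp, fn, tn = confusion_counts(y_true, y_pred)
print(tp, fp, fn, tn)  # 2 1 1 4
```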

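Precision and recall (TP = true positives, FP = false positives, FN = false negatives):
- Metrics which focus on one kind of mistake
- Precision: what fraction of flagged emails were real spam?
Precision = TP / (TP + FP)
- Recall: what fraction of real spam emails were flagged?
Recall = TP / (TP + FN)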
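
Both metrics follow from the counts of true positives (TP), false positives (FP) and false negatives (FN). The sketch below uses the standard definitions; returning 0.0 when nothing was flagged (or when there is no real spam) is an assumed convention, and the counts are made up.

```python
def precision(tp, fp):
    # Fraction of flagged emails that were real spam.
    # Assumed convention: 0.0 when nothing was flagged.
    return tp / (tp + fp) if tp + fp else 0.0


def recall(tp, fn):
    # Fraction of real spam emails that were flagged.
    # Assumed convention: 0.0 when there is no real spam.
    return tp / (tp + fn) if tp + fn else 0.0


tp, fp, fn = 8, 2, 4             # made-up counts
p = precision(tp, fp)            # 8 / 10
r = recall(tp, fn)               # 8 / 12
print(p)  # 0.8
```

Here 8 of the 10 flagged emails are spam (precision 0.8), but only 8 of the 12 spam emails were caught (recall of two thirds).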


Confusion matrix example:




F-score:
- Harmonic mean of precision and recall (a kind of average):
F1 = 2 · precision · recall / (precision + recall)
- Also known as the F-measure

Fβ:
- Parameter β quantifies how much more we care about recall than precision. When β is
greater than 1, recall is weighted more; when β is smaller than 1, precision is weighted
more; β = 1 gives the plain F-score.
Fβ = (1 + β²) · precision · recall / (β² · precision + recall)
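
The effect of β can be checked numerically. The sketch below assumes the usual definition Fβ = (1 + β²) · P · R / (β² · P + R), with β = 1 giving the harmonic mean of precision and recall; the precision and recall values are made up, and returning 0.0 when both are zero is an assumed convention.

```python
def f_beta(precision, recall, beta=1.0):
    # Assumed convention: the score is 0.0 when both inputs are zero.
    if precision == 0 and recall == 0:
        return 0.0
    b2 = beta ** 2
    return (1 + b2) * precision * recall / (b2 * precision + recall)


p, r = 0.5, 1.0                  # made-up values: low precision, perfect recall

f1 = f_beta(p, r)                # harmonic mean of p and r
f2 = f_beta(p, r, beta=2.0)      # recall weighted more
f_half = f_beta(p, r, beta=0.5)  # precision weighted more
print(f_half < f1 < f2)  # True
```

Because recall is the stronger of the two values in this example, weighting recall more (β = 2) raises the score and weighting precision more (β = 0.5) lowers it.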



Multiclass classification:
You can still make a confusion matrix with multiclass classification as well.
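
A multiclass confusion matrix simply has one row and one column per class. The following is a minimal sketch in which rows are true classes and columns are predicted classes (an assumed orientation); the articles and predictions are made up.

```python
from collections import Counter


def confusion_matrix(y_true, y_pred, classes):
    # counts[(t, p)] = number of examples with true class t predicted as p.
    counts = Counter(zip(y_true, y_pred))
    return [[counts[(t, p)] for p in classes] for t in classes]


classes = ["politics", "sports", "science"]
y_true = ["politics", "politics", "sports", "sports", "science", "science"]
y_pred = ["politics", "sports", "sports", "sports", "science", "politics"]

matrix = confusion_matrix(y_true, y_pred, classes)
for label, row in zip(classes, matrix):
    print(label, row)
```

The diagonal holds the correctly classified examples; every off-diagonal cell shows which class was confused with which.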




When there are more than two classes, you have to come up with alternatives when it
comes to rating the learning outcomes. You can use macro-average and micro-average.

Macro-average:
Precision is true positives over predicted positives; recall is true positives over actual positives.
- This is best used when the data is balanced.
- Compute precision and recall per class, and average over the K classes:
macro-precision = (1/K) Σ_k precision_k, macro-recall = (1/K) Σ_k recall_k
- Rare classes have the same impact as frequent classes

Micro-average:
- Gives every data point equal importance (this is the difference from the macro-average).
- Micro-averaging treats the entire set of data as one aggregate result, and calculates 1
metric from the pooled counts rather than k per-class metrics that get averaged together
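
The difference between the two averages shows up as soon as one class is rare. The sketch below is an illustration under assumptions: the two class names and the data are made up, and a metric with an empty denominator is reported as 0.0.

```python
def safe_div(numerator, denominator):
    # Assumed convention: a metric with an empty denominator is reported as 0.0.
    return numerator / denominator if denominator else 0.0


def per_class_counts(y_true, y_pred, classes):
    pairs = list(zip(y_true, y_pred))
    counts = {}
    for c in classes:
        tp = sum(1 for y, p in pairs if y == c and p == c)
        fp = sum(1 for y, p in pairs if y != c and p == c)
        fn = sum(1 for y, p in pairs if y == c and p != c)
        counts[c] = (tp, fp, fn)
    return counts


def macro_average(counts):
    # Compute the metric per class, then average: every class counts equally.
    precisions = [safe_div(tp, tp + fp) for tp, fp, fn in counts.values()]
    recalls = [safe_div(tp, tp + fn) for tp, fp, fn in counts.values()]
    return sum(precisions) / len(precisions), sum(recalls) / len(recalls)


def micro_average(counts):
    # Pool the counts over all classes first: every data point counts equally.
    tp = sum(c[0] for c in counts.values())
    fp = sum(c[1] for c in counts.values())
    fn = sum(c[2] for c in counts.values())
    return safe_div(tp, tp + fp), safe_div(tp, tp + fn)


classes = ["common", "rare"]
y_true = ["common"] * 8 + ["rare"] * 2
y_pred = ["common"] * 8 + ["common", "rare"]   # one rare example is missed

counts = per_class_counts(y_true, y_pred, classes)
macro_p, macro_r = macro_average(counts)
micro_p, micro_r = micro_average(counts)
print(macro_r, micro_r)  # 0.75 0.9
```

Missing one of the two rare examples halves the recall of the rare class, which pulls macro-averaged recall down to 0.75, while micro-averaged recall stays at 0.9 because only 1 of 10 data points is wrong.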
