Summary machine learning

Course
Machine Learning

Institution
Tilburg University (UVT)

This summary contains all the material for the Machine Learning course at UVT. All notebooks, including examples, have been added. All lectures were attended, summarized and worked out.


Uploaded on 15 October 2021
Number of pages 149
Written in 2021/2022
Type Summary

Author: robinvanheesch1



Week 1: introduction to machine learning.

When you open your mail, spam is automatically put into a separate spam folder. What the system does is, for example: if (A or B or C) and not D, then spam.
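The rule above is written by hand rather than learned. A minimal sketch in Python, assuming A, B and C are hypothetical "suspicious phrase" features and D means the sender is a known contact (these meanings are invented for illustration, not taken from the course):

```python
# Hand-written rule from the notes: "if (A or B or C) and not D, then spam".
# The meanings of A-D below are hypothetical, chosen only for illustration.

KNOWN_SENDERS = {"alice@example.com"}  # hypothetical address book


def features(sender: str, body: str) -> tuple[bool, bool, bool, bool]:
    """Compute the four hypothetical boolean features A, B, C and D."""
    text = body.lower()
    a = "winner" in text           # A: mentions 'winner'
    b = "free money" in text       # B: mentions 'free money'
    c = "click here" in text       # C: mentions 'click here'
    d = sender in KNOWN_SENDERS    # D: sender is a known contact
    return a, b, c, d


def is_spam(sender: str, body: str) -> bool:
    """Apply the fixed rule; nothing is learned from data here."""
    a, b, c, d = features(sender, body)
    return (a or b or c) and not d


print(is_spam("promo@shop.example", "You are a WINNER, click here!"))     # True
print(is_spam("alice@example.com", "Click here for the lecture slides"))  # False
```

The weakness is that someone has to come up with and maintain these rules by hand; machine learning instead infers such rules from examples.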

Machine learning is the study of computer algorithms that improve automatically through experience → it involves becoming better at a task T, based on some experience E, with respect to some performance measure P. So learning in ML means improving according to a performance measure.

How does the ML process work?

1. Find examples of spam and non-spam (training set).
2. Come up with a learning algorithm.
3. This learning algorithm infers rules from examples.
4. The rules that you infer from this training set can be applied to new and unseen data (emails) → this shows how well your model generalizes.
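A minimal sketch of steps 1 to 4, assuming a toy learning algorithm of our own (not one from the course): it infers a set of "spam words", i.e. words that occur in training spam but never in training non-spam, and flags a new email as spam if it contains any of them. All emails are made up.

```python
# Toy "learning algorithm" (our own illustration, not from the course):
# words that occur in training spam but never in training non-spam become
# "spam words"; an email containing any spam word is flagged as spam.


def tokenize(text: str) -> set[str]:
    """Lowercase and split on whitespace (a deliberately crude tokenizer)."""
    return set(text.lower().split())


def learn_spam_words(train: list[tuple[str, bool]]) -> set[str]:
    """Step 3: infer a rule (a set of spam words) from labelled examples."""
    spam_words: set[str] = set()
    ham_words: set[str] = set()
    for text, label in train:
        (spam_words if label else ham_words).update(tokenize(text))
    return spam_words - ham_words


def predict(spam_words: set[str], text: str) -> bool:
    """Step 4: apply the learned rule to a new email."""
    return bool(tokenize(text) & spam_words)


# Step 1: a tiny, made-up training set (True = spam).
train = [
    ("win free money now", True),
    ("cheap pills free shipping", True),
    ("meeting moved to monday", False),
    ("lecture notes for monday now online", False),
]
# Unseen, made-up test emails: only used to check how the rule generalizes.
test = [
    ("free money inside", True),
    ("notes for the meeting", False),
    ("your invoice is attached", False),
    ("claim your prize today", True),
]

rule = learn_spam_words(train)
accuracy = sum(predict(rule, text) == label for text, label in test) / len(test)
print(sorted(rule))  # ['cheap', 'free', 'money', 'pills', 'shipping', 'win']
print(accuracy)      # 0.75: the last spam email uses words never seen in training
```

The test emails are never used for learning; the score on them is what tells you how well the inferred rule generalizes.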

The purpose of ML is to solve the problem when a new case comes in. The unseen data therefore needs to be as close as possible to the real world.

Machine learning examples:

• Recognize handwritten numbers and letters.
• Recognize faces in photos.
• Determine whether text expresses positive, negative, or no opinion.
• Guess a person’s age based on a sample of their writing.
• Flag suspicious credit card transactions.
• Recommend books and movies to users based on their own and others’ purchase history.
• Recognize and label mentions of people’s or organizations’ names in text.

Different types of learning problems:

1. Regression:
• The response is a real number.
o Predict a person’s age.
o Predict price of a stock.
o Predict student’s score on exam.
2. Binary classification:
• The response is a yes/no answer → whether a condition is present or not.
o Detect spam.
o Predict polarity of product review: positive VS negative (sentiment analysis).
3. Multiclass classification:
• Response is one of a finite set of options.
o Classify newspaper article as: politics, sports, science, technology, health, finance,
etc.
o Detect species based on a photo.
o Detect a movie genre: romance, action, thriller, etc.

4. Multilabel classification:
• The response is a finite set of yes/no answers (one per label):
o Assign songs to one or more genres: (rock, pop, metal), (hip-hop, rap), (jazz,
blues), (rock, punk).
5. Ranking:
• Most relevant when you are searching in, for example, Google → the most interesting pages come out on top. So ranking is about ordering objects according to relevance.
o Rank web pages in response to user query.
o Predict student’s preference of courses in a program.
6. Sequence labeling (relevant in speech recognition).
• The input is a sequence of elements (e.g. words). The response is a corresponding sequence of labels.
o Label the words in a sentence with their syntactic category, e.g. determiner, noun, adverb, verb, preposition, noun.
o Label frames in a speech signal with their corresponding phonemes.
7. Autonomous behavior (self-driving cars).
• The input is measurements from sensors: camera, microphone, radar, etc. The response is instructions for actuators: steering, accelerator, brake, etc.
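The problem types differ mainly in the shape of the response. A sketch of made-up targets in Python: the genre list, relevance scores, page names and example sentence are all invented for illustration, and encoding the multilabel answer as one yes/no value per genre is one common choice, not the only one.

```python
# Made-up examples of what the response looks like for several problem types.

GENRES = ["rock", "pop", "metal", "hip-hop", "rap", "jazz", "blues", "punk"]


def to_yes_no(song_genres: set[str]) -> list[bool]:
    """Multilabel: turn a set of genres into one yes/no answer per possible genre."""
    return [genre in song_genres for genre in GENRES]


def rank(relevance: dict[str, float]) -> list[str]:
    """Ranking: order objects by a relevance score, most relevant first."""
    return sorted(relevance, key=relevance.get, reverse=True)


age = 23.5                          # regression: a real number
is_spam = True                      # binary classification: yes/no
movie_genre = "thriller"            # multiclass: exactly one option from a finite set
song = to_yes_no({"rock", "punk"})  # multilabel: a yes/no answer per genre

# Sequence labeling: one label per element of the input sequence.
words = ["the", "dog", "eagerly", "ran", "in", "parks"]
tags = ["Det", "Noun", "Adv", "Verb", "Prep", "Noun"]

print(song)
print(rank({"page_a": 0.2, "page_b": 0.9, "page_c": 0.5}))
print(list(zip(words, tags)))
```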

Evaluation.

One of the most important problems for ML is the generalization problem. To see how a method
generalizes, you need some metric, or some standard → how are you going to understand how good your
model is working?

For regression problems you can use MAE or MSE. MSE is more sensitive to outliers, because squaring makes large errors count much more heavily.

• MAE (mean absolute error) = the average absolute difference between the true and predicted value.

• MSE (mean squared error) = the average of the squared differences between the true and predicted value.
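Both metrics in plain Python, as a minimal sketch; the numbers are made up. In the second prediction list a single error grows from 1 to 9: MAE goes from 1.5 to 3.5, while MSE jumps from 2.5 to 22.5, which shows why MSE is more sensitive to outliers.

```python
def mae(y_true: list[float], y_pred: list[float]) -> float:
    """Mean absolute error: the average of |true - predicted|."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def mse(y_true: list[float], y_pred: list[float]) -> float:
    """Mean squared error: the average of (true - predicted) ** 2."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


y_true = [20, 30, 40, 50]    # made-up true ages
good = [22, 28, 41, 49]      # errors: 2, 2, 1, 1
outlier = [22, 28, 41, 59]   # errors: 2, 2, 1, 9 (one large error)

print(mae(y_true, good), mse(y_true, good))        # 1.5 2.5
print(mae(y_true, outlier), mse(y_true, outlier))  # 3.5 22.5
```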

Error rate is the fraction of all items that are misclassified ((FP + FN) divided by the total), so it treats every mistake the same: it does not distinguish FP from FN. Example: if you label people as male/female under a binary view of gender, there is no natural "positive" class, whereas binary conditions are normally about whether something is there or not. If you just want to know how often your model's labels disagree with the true labels, you can use the error rate. But in general it is not an ideal metric, because it does not distinguish between the two kinds of mistakes. That is why we use different approaches, which require splitting the predictions into FP/FN/TP/TN. In the email classification example, we get the following:

• False positive: flagged as spam, but is not spam.
• False negative: not flagged as spam, but is spam.
• False positives are the bigger problem here! You don’t want a normal email flagged as spam → minimize them. So different problems call for different metrics.
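A small sketch of why the error rate alone can be misleading, using made-up labels (True = spam, the positive class). Models A and B make the same number of mistakes, so they get the same error rate, but A misses a spam email (FN) while B flags a normal email (FP), which is the worse mistake here.

```python
def confusion_counts(y_true: list[bool], y_pred: list[bool]) -> dict[str, int]:
    """Count TP/FP/FN/TN, with True meaning 'spam' (the positive class)."""
    counts = {"TP": 0, "FP": 0, "FN": 0, "TN": 0}
    for actual, flagged in zip(y_true, y_pred):
        if flagged:
            counts["TP" if actual else "FP"] += 1
        else:
            counts["FN" if actual else "TN"] += 1
    return counts


def error_rate(counts: dict[str, int]) -> float:
    """Fraction of all emails that were misclassified: (FP + FN) / total."""
    return (counts["FP"] + counts["FN"]) / sum(counts.values())


actual = [True, True, False, False, False, False]
model_a = [True, False, False, False, False, False]  # misses one spam email (FN)
model_b = [True, True, True, False, False, False]    # flags one normal email (FP)

for name, pred in [("A", model_a), ("B", model_b)]:
    c = confusion_counts(actual, pred)
    print(name, c, round(error_rate(c), 3))
```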

Positive = the condition being present (e.g. spam), regardless of whether its presence is good or bad.

Different metrics focus on one kind of mistake:

• Precision: what fraction of flagged emails were real spam?
• Recall: what fraction of real spams were flagged?

The flagged set (predicted spam) consists of true positives and false positives, and the real spam consists of true positives and false negatives. So precision = TP / (TP + FP) and recall = TP / (TP + FN).
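A minimal sketch of precision and recall computed from the counts, with made-up numbers. Returning 0.0 when nothing is flagged (or there is no spam at all) is our own convention to avoid dividing by zero.

```python
def precision(tp: int, fp: int) -> float:
    """What fraction of flagged emails were real spam? TP / (TP + FP)."""
    return tp / (tp + fp) if tp + fp else 0.0


def recall(tp: int, fn: int) -> float:
    """What fraction of real spam was flagged? TP / (TP + FN)."""
    return tp / (tp + fn) if tp + fn else 0.0


# Made-up counts: 8 spam emails correctly flagged, 2 normal emails wrongly
# flagged (FP) and 4 spam emails missed (FN).
tp, fp, fn = 8, 2, 4
print(precision(tp, fp))         # 0.8
print(round(recall(tp, fn), 3))  # 0.667
```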

[Figure: example of a confusion matrix]

The F-score is another measure which combines precision and recall. It is the harmonic mean of precision and recall, a kind of average: F1 = 2 × precision × recall / (precision + recall).

There is also a more general version of the F-score → the F-beta score. The parameter beta quantifies how much more we care about recall than about precision.

For example, F0.5 is the metric to use if we care half as much about recall as about precision.
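A sketch of the F-beta score using the standard formula F_beta = (1 + beta^2) × P × R / (beta^2 × P + R); beta = 1 gives the ordinary F1 score. The precision and recall values are made up. With P = 0.8 and R = 0.5, F0.5 lands closer to precision and F2 closer to recall.

```python
def f_beta(precision: float, recall: float, beta: float = 1.0) -> float:
    """F-beta score: (1 + b^2) * P * R / (b^2 * P + R).

    beta > 1 weights recall more, beta < 1 weights precision more;
    beta = 1 gives F1, the harmonic mean of precision and recall.
    """
    b2 = beta ** 2
    denom = b2 * precision + recall
    return (1 + b2) * precision * recall / denom if denom else 0.0


p, r = 0.8, 0.5  # made-up precision and recall
print(round(f_beta(p, r), 3))            # 0.615 (F1)
print(round(f_beta(p, r, beta=0.5), 3))  # 0.714 (F0.5, leans towards precision)
print(round(f_beta(p, r, beta=2.0), 3))  # 0.541 (F2, leans towards recall)
```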

What to do when it comes to multiclass classification?

There are three classes: Spam, OK and Phish. You can now make a similar matrix, with the predicted values in the columns and the actual values in the rows. Each datapoint falls into exactly one cell of this matrix, which makes it easy to see which category each datapoint falls into. In the example, the cell with value 2 (actual spam, predicted OK) is a false negative for spam and a false positive for OK, and the cell with value 4 (actual phish, predicted spam) is a false negative for phish and a false positive for spam. Everything on the diagonal gives us the true positives.
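A minimal sketch of building such a matrix from labels (rows = actual class, columns = predicted class). The labels are made up and do not reproduce the lecture example; here the cell (actual Spam, predicted OK) is a FN for spam and a FP for OK, and the cell (actual Phish, predicted Spam) is a FN for phish and a FP for spam.

```python
CLASSES = ["Spam", "OK", "Phish"]


def confusion_matrix(y_true: list[str], y_pred: list[str],
                     classes: list[str] = CLASSES) -> list[list[int]]:
    """Rows = actual class, columns = predicted class; every item lands in one cell."""
    index = {c: i for i, c in enumerate(classes)}
    matrix = [[0] * len(classes) for _ in classes]
    for actual, predicted in zip(y_true, y_pred):
        matrix[index[actual]][index[predicted]] += 1
    return matrix


# Made-up labels for seven emails.
y_true = ["Spam", "Spam", "OK", "OK", "Phish", "Phish", "Spam"]
y_pred = ["Spam", "OK", "OK", "OK", "Spam", "Phish", "Spam"]

m = confusion_matrix(y_true, y_pred)
for cls, row in zip(CLASSES, m):
    print(f"{cls:>5}", row)  # the diagonal holds the true positives per class
```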

There are two approaches you can take when it comes to calculating precision and recall:

• Macro-average: you look at every class by itself with a one-vs-all approach (is the class there or not, with all other classes lumped together), calculate precision and recall for each class separately, and then take the average of those values (so with 3 classes, you divide by 3). The problem is that it gives equal weight to each class, regardless of its size in the dataset.
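A sketch of macro-averaging, starting from a made-up Spam/OK/Phish confusion matrix (rows = actual, columns = predicted). For each class, TP is the diagonal cell, FP is the rest of its column and FN is the rest of its row; the per-class scores are then averaged with equal weight. Using 0.0 when a denominator is zero is our own convention.

```python
def per_class_precision_recall(matrix: list[list[int]]) -> list[tuple[float, float]]:
    """One-vs-all precision and recall for each class.

    matrix[i][j] = number of items of actual class i predicted as class j.
    """
    n = len(matrix)
    scores = []
    for k in range(n):
        tp = matrix[k][k]
        fp = sum(matrix[i][k] for i in range(n)) - tp  # predicted k, actually another class
        fn = sum(matrix[k]) - tp                        # actually k, predicted another class
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        scores.append((precision, recall))
    return scores


def macro_average(matrix: list[list[int]]) -> tuple[float, float]:
    """Unweighted mean over classes: each class counts equally, whatever its size."""
    scores = per_class_precision_recall(matrix)
    n = len(scores)
    return sum(p for p, _ in scores) / n, sum(r for _, r in scores) / n


# Made-up confusion matrix for Spam / OK / Phish (rows = actual, columns = predicted).
matrix = [
    [2, 1, 0],
    [0, 2, 0],
    [1, 0, 1],
]
macro_p, macro_r = macro_average(matrix)
print(round(macro_p, 3), round(macro_r, 3))  # 0.778 0.722
```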
