Machine Learning
Lecture 1 – Introduction
You can have a collection of rules that tells the program what to do. You can write these rules by hand, apply them, and test them. Then you notice whether they work or not, and change them accordingly. You are automating the task, but the rule-writing itself is done by hand. With Machine Learning you take automation a step further: we want the machine itself to learn the rules. How would that go? You need to collect some information about the distribution of words or sequences and learn from examples. This is the basis of supervised learning (sketched in code after the steps below):
Find examples of SPAM and non-SPAM
Come up with a learning algorithm
A learning algorithm infers rules from examples
These rules can then be applied to new data (emails)
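A minimal sketch of this recipe, assuming scikit-learn and a few made-up toy emails (the data, the bag-of-words features, and the logistic-regression learner are illustrative choices, not part of the lecture):

```python
# Sketch of the SPAM recipe: collect labeled examples, run a learning
# algorithm, apply the inferred rules to new emails.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Toy training data (invented): emails paired with SPAM / non-SPAM labels.
emails = [
    "win a free prize now", "cheap pills limited offer",
    "meeting rescheduled to friday", "lecture notes attached",
]
labels = ["spam", "spam", "ham", "ham"]

# The learning algorithm infers rules (here: word weights) from the examples.
model = make_pipeline(CountVectorizer(), LogisticRegression())
model.fit(emails, labels)

# The inferred rules can then be applied to new, unseen emails.
print(model.predict(["free prize offer"]))  # likely ['spam']
```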
Types of learning problems
Machine Learning has an input space and an output space. The nature of the output determines which kind of machine learning problem we are talking about.
Regression
Regression involves estimating or predicting a response. The response (the output variable) takes continuous values, i.e. a real number (sketched after the examples below).
Predict person’s age
Predict price of a stock
Predict student’s score on exam
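A hedged sketch of regression, assuming invented study-hours/exam-score pairs and scikit-learn's linear regression (both are illustrative assumptions):

```python
# Regression: the output is a real number, not a class label.
import numpy as np
from sklearn.linear_model import LinearRegression

hours = np.array([[2.0], [4.0], [6.0], [8.0]])  # input: hours studied (toy data)
scores = np.array([55.0, 65.0, 75.0, 85.0])     # output: continuous exam scores

model = LinearRegression().fit(hours, scores)
print(model.predict([[5.0]]))  # a real-valued prediction, here ~70.0
```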
Binary classification
The output variable takes class labels, and the input is classified into one of two groups: a yes/no answer, e.g. true/false or 1/0.
Detect SPAM
Predict polarity of product review: positive or negative
Predict gender: male or female
Multiclass classification
The output is one of a finite set of options; there can be many (even thousands of) labels / classes / categories. Each training point belongs to one of n different classes. The goal is to construct a function which, given a new data point, correctly predicts the class to which the new point belongs (see the sketch after the examples below).
Classify subject newspaper articles: politics, sports, science, technology, health, etc.
Detect species based on photo: passer domesticus, calidris alba, etc.
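A toy sketch of picking one class out of a finite set; the two-dimensional features, the species labels, and the nearest-neighbour learner are all assumptions made for illustration:

```python
# Multiclass classification: the model predicts exactly one of n classes.
from sklearn.neighbors import KNeighborsClassifier

X = [[1.0, 0.2], [0.9, 0.3], [0.2, 1.1], [0.1, 0.9], [0.5, 0.5], [0.6, 0.4]]
y = ["passer domesticus", "passer domesticus",
     "calidris alba", "calidris alba",
     "other", "other"]

clf = KNeighborsClassifier(n_neighbors=1).fit(X, y)
print(clf.predict([[0.95, 0.25]]))  # one label out of the finite set
```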
Multilabel classification
Multilabel classification is a classification problem where multiple target labels can be assigned to each observation instead of only one. A multilabel classifier has to produce a vector of output values, each a yes/no answer. You can think of it as a set of binary classifications, one per label (see the sketch after the genre list below).
Assign songs to one or more genres:
o {rock, pop, metal}
o {hip-hop, rap}
o {jazz, blues}
o {rock, punk}
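A small sketch of the output representation, assuming scikit-learn's MultiLabelBinarizer; it turns the genre sets above into the yes/no vectors a multilabel classifier has to produce:

```python
# Multilabel output: one binary (yes/no) decision per label, per observation.
from sklearn.preprocessing import MultiLabelBinarizer

song_genres = [
    {"rock", "pop", "metal"},
    {"hip-hop", "rap"},
    {"jazz", "blues"},
    {"rock", "punk"},
]

mlb = MultiLabelBinarizer()
Y = mlb.fit_transform(song_genres)
print(mlb.classes_)  # the label columns (alphabetical)
print(Y)             # one 0/1 vector per song
```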
Ranking
Order objects according to relevance, e.g. ranking models for information retrieval systems. The training data consists of lists of items with some partial order specified between the items in each list.
Rank web pages in response to user query
Predict student’s preference for courses in a program
Sequence labelling
A type of pattern recognition task that involves the algorithmic assignment of a categorical label to each member of a sequence of observed values (e.g. part-of-speech tagging). The input is a sequence of elements (words) and the response is a corresponding sequence of labels (a toy sketch follows the examples below).
Label words in a sentence with their syntactic category
Label frames in a speech signal with the corresponding phonemes (w, ð, ɛ, ɚ)
o Sequence labelling: N inputs, N outputs
o Sequence-to-sequence: N inputs, M outputs (N not necessarily equal to M)
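A toy sketch of the N-inputs/N-outputs shape of sequence labelling; the dictionary tagger is a deliberate simplification (real taggers learn the mapping from data):

```python
# Sequence labelling: one categorical label per element of the input sequence.
TAGS = {"the": "DET", "cat": "NOUN", "sat": "VERB", "on": "ADP", "mat": "NOUN"}

def tag(words):
    # N input words produce N output tags (unknown words get a fallback tag).
    return [TAGS.get(w, "X") for w in words]

sentence = ["the", "cat", "sat", "on", "the", "mat"]
print(list(zip(sentence, tag(sentence))))
```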
Autonomous behaviour
The inputs are measurements from sensors – camera, microphone, radar, accelerometer, etc. – and the responses are instructions for actuators – steering, accelerator, brake, etc.
Supervised learning is very often improved with reinforcement learning: learning from a sequence of actions using positive and negative feedback. Supervised learning is not the end of the story; sometimes it is not really applicable. Unsupervised learning has also become a very important approach.
In what situation do you use F1 score instead of accuracy?
When the class distribution is uneven, or when false positives and false negatives have different costs; accuracy can then be misleading (see the F-score section below).
Evaluation
How well is the algorithm learning? You can evaluate the performance by using different evaluation metrics.
Mean Absolute Error
The average absolute difference between the true value and the predicted value.
Mean Squared Error
The average square of the difference between true value and predicted value.
The aforementioned metrics can be used for predicting age (regression, numerical output), often with a preference for MSE. The MSE exaggerates outliers (squaring magnifies big errors), while the MAE does not.
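A small sketch of both metrics on invented true/predicted ages, to show how squaring exaggerates the one large error:

```python
# MAE vs MSE on toy age predictions.
import numpy as np

y_true = np.array([23.0, 41.0, 35.0, 60.0])
y_pred = np.array([25.0, 39.0, 30.0, 75.0])  # last prediction is 15 off

mae = np.mean(np.abs(y_true - y_pred))   # (2 + 2 + 5 + 15) / 4 = 6.0
mse = np.mean((y_true - y_pred) ** 2)    # (4 + 4 + 25 + 225) / 4 = 64.5

print(mae, mse)  # the 15-point outlier dominates the MSE, not the MAE
```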
Accuracy
Accuracy is calculated as the number of all correct predictions divided by the total size of the dataset. The best accuracy is 1.0, whereas the worst is 0.0. It can also be calculated as 1 − error rate.
(TP + TN) / (P + N)
Error rate
It is the proportion of mistakes: the error rate is calculated as the number of all incorrect predictions divided by the total size of the dataset. The best error rate is 0.0, whereas the worst is 1.0.
(FP + FN) / (P + N)
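Both formulas in code, on invented confusion-matrix counts:

```python
# Accuracy and error rate from toy TP/TN/FP/FN counts.
TP, TN, FP, FN = 50, 40, 5, 5
total = TP + TN + FP + FN        # P + N, the size of the dataset

accuracy = (TP + TN) / total     # 0.9
error_rate = (FP + FN) / total   # 0.1, which is also 1 - accuracy

print(accuracy, error_rate)
```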
Predicting gender could use accuracy or the error rate as its evaluation metric. However, for flagging SPAM the error rate is preferred: if accuracy is 99 percent, you would probably report the 1 percent error rate instead. Is there any disadvantage? The error rate does not take into account whether a false negative is worse than a false positive.
Precision and recall
These metrics are a useful measure of prediction success when the classes are very imbalanced. In information retrieval, precision is a measure of result relevancy, while recall is a measure of how many truly relevant results are returned. Each metric focuses on one kind of mistake, and both are defined in terms of the sizes of certain sets (TP, FP, FN; a code sketch follows the definitions below).
Precision
The ratio of correctly predicted positive observations to the total predicted positive observations: TP / (TP + FP). (Of all passengers labeled as survived, how many actually survived? What fraction of flagged emails were real SPAM?)
Recall
The ratio of correctly predicted positive observations to all observations in the actual positive class: TP / (TP + FN). (Of all the passengers that truly survived, how many did we label as survived? What fraction of real SPAM was flagged as SPAM?)
True Positives (TP) = the correctly predicted positive values
True Negatives (TN) = the correctly predicted negative values
False Positives (FP) = when actual class is no and predicted class is yes
False Negatives (FN) = when actual class is yes but predicted class is no
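A sketch with scikit-learn on invented SPAM labels (1 = SPAM, 0 = non-SPAM); the labels are chosen so that the two metrics differ:

```python
# Precision and recall on toy labels.
from sklearn.metrics import precision_score, recall_score

y_true = [1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 1, 1, 0, 0, 0]  # TP=2, FN=1 (missed SPAM), FP=2 (false flags)

print(precision_score(y_true, y_pred))  # TP/(TP+FP) = 2/4 = 0.5
print(recall_score(y_true, y_pred))     # TP/(TP+FN) = 2/3 ~= 0.67
```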
F-score
The harmonic mean of precision and recall, a kind of average also known as the F-measure: F1 = 2 * precision * recall / (precision + recall). This score takes both false positives and false negatives into account.
Fbeta
The parameter β quantifies how much more we care about recall than precision, giving the two different importance: Fβ = (1 + β²) * precision * recall / (β² * precision + recall). F0.5 would mean that we care half as much about recall as about precision. The β parameter determines the weight of recall in the combined score: β < 1 lends more weight to precision, while β > 1 favors recall (see the sketch below).
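A sketch of how β shifts the balance, reusing the toy labels from the precision/recall example (precision 0.5, recall 0.67):

```python
# F1 vs Fbeta on the same toy labels as above.
from sklearn.metrics import f1_score, fbeta_score

y_true = [1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 1, 1, 0, 0, 0]

print(f1_score(y_true, y_pred))               # ~0.57, harmonic mean of P and R
print(fbeta_score(y_true, y_pred, beta=0.5))  # ~0.53, pulled toward precision (0.5)
print(fbeta_score(y_true, y_pred, beta=2.0))  # ~0.63, pulled toward recall (0.67)
```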
What is the difference between precision/recall, F-score and Fbeta?
F1 is usually more useful than accuracy, especially if you have an uneven class distribution. Accuracy works best
if false positives and false negatives have similar cost.
Macro-average (multi-class classification)
It computes the F-score per class and then averages: calculate the metric for each class independently and take the unweighted mean. This does not take label imbalance into account; rare classes have the same impact as frequent classes. This can be a good thing or a bad thing, depending on what you want.
Micro-average (multi-class classification)
This calculates metrics globally by counting the total number of times each class was correctly and incorrectly predicted. You do it on a case-by-case basis (see the sketch after this list):
Treat each correct prediction as a TP
Treat each missed classification as an FN
Treat each incorrect prediction as an FP
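A sketch contrasting the two averages on an invented three-class newspaper example; the rare class is deliberately predicted perfectly so the difference shows:

```python
# Macro vs micro averaging on toy multiclass predictions.
from sklearn.metrics import f1_score

y_true = ["politics", "politics", "politics", "sports", "sports", "science"]
y_pred = ["politics", "politics", "sports", "sports", "sports", "science"]

# Macro: per-class F-scores (politics 0.8, sports 0.8, science 1.0), then the
# unweighted mean -- the rare but perfect 'science' class pulls it up to ~0.87.
print(f1_score(y_true, y_pred, average="macro"))

# Micro: pool all predictions first (5 of 6 correct), then compute once: ~0.83.
print(f1_score(y_true, y_pred, average="micro"))
```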