Machine Learning
Introduction
1. What is machine learning?
↳ give the computer examples and let it figure out the code (i.e., the way of solving) itself
what makes a suitable ML problem?
- we can't solve the problem explicitly
- approximate solutions are fine
- limited reliability, predictability, interpretability is fine
- plenty of examples to learn from
where do we use ML?
- inside other software
- in analytics, data mining, data science
- in science / statistics
- machine learning gives systems the ability to automatically learn and improve from experience without being explicitly programmed.
* reinforcement learning: taking actions in a world based on delayed feedback
* online learning: predicting & learning at the same time
* offline learning: separate learning, predicting & acting
1. take a fixed dataset of examples (= instances)
2. train a model to learn from examples
3. test the model to see if it works by checking its predictions
4. if the model works, put it into production (i.e., use its predictions to take actions)
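The four offline steps above can be sketched in a few lines of Python. This is only an illustration under assumptions: the dataset values, the helper names (`split_dataset`, `train_majority_model`, `accuracy`) and the trivial "always predict the most common label" model are made up for the example and do not come from the notes.

```python
import random

def split_dataset(examples, test_fraction=0.25, seed=0):
    """Shuffle a fixed dataset and split it into a training and a test part."""
    rng = random.Random(seed)
    shuffled = examples[:]
    rng.shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]

def train_majority_model(train):
    """Deliberately trivial model: always predict the most common training label."""
    labels = [label for _, label in train]
    majority = max(set(labels), key=labels.count)
    return lambda features: majority

def accuracy(model, test):
    """Fraction of test examples whose predicted label equals the true label."""
    correct = sum(1 for features, label in test if model(features) == label)
    return correct / len(test)

# 1. a fixed dataset of (features, label) examples (made-up values)
dataset = [((i,), "spam" if i % 4 == 0 else "ham") for i in range(20)]
train, test = split_dataset(dataset)
# 2. train a model on the training examples
model = train_majority_model(train)
# 3. test the model by checking its predictions on held-out examples
test_accuracy = accuracy(model, test)
# 4. only if test_accuracy is good enough would the model go into production
```

Keeping the test examples out of training is what makes step 3 a meaningful check on unseen data.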
we do not want to find a solution for each problem individually (in isolation), we want generic solutions!
problem → abstract task → algorithm
abstract tasks
- supervised: explicit examples of input & output
↳ learn to predict the outcome for an unseen input
* classification: assign a class to each example
* regression: assign a number to each example
- unsupervised: only inputs provided
↳ find any pattern that explains something about the data
2. Classification
1. start with data: a table with one row per instance, one column per feature, and a label column (e.g., spam / ham)
the features are the things we measure about the instances
2. the dataset is fed through a learning algorithm (learner)
3. the learning algorithm outputs a model (classifier)
the model is constructed so that when it sees a new instance (with the same features as fed to the learner), it can produce a class for us (i.e., classify the data)
how to build a classifier?
- linear classifier: cut the feature space in two
1D: dot, 2D: line, 3D: plane, 4D+: hyperplane
every point in the model space is a line in the feature space
↳ using the definition of a line ($ax + by + c = 0$)
$\mathrm{loss}_{\mathrm{data}}(\mathrm{model})$ = performance of model on the data
→ the lower, the better
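A minimal sketch of a 2D linear classifier and a loss, assuming the model is the triple (a, b, c) of the line ax + by + c = 0. The notes do not say which loss is meant, so the error rate used here is an assumption, as are the function names and the data points.

```python
def linear_classify(point, model):
    """Classify a 2D point by the side of the line a*x + b*y + c = 0 it lies on."""
    a, b, c = model
    x, y = point
    return "spam" if a * x + b * y + c > 0 else "ham"

def loss(model, data):
    """Assumed loss: fraction of examples the model misclassifies (lower is better)."""
    errors = sum(1 for point, label in data if linear_classify(point, model) != label)
    return errors / len(data)

# made-up dataset of (features, label) pairs
data = [((1.0, 1.0), "ham"), ((2.0, 1.5), "ham"),
        ((4.0, 4.0), "spam"), ((5.0, 3.5), "spam")]

good_model = (1.0, 1.0, -6.0)   # the line x + y - 6 = 0 separates the two classes
bad_model = (1.0, 0.0, -10.0)   # the line x - 10 = 0 puts every point on the "ham" side

good_loss = loss(good_model, data)
bad_loss = loss(bad_model, data)
```

Each triple (a, b, c) is one point in model space; searching for a good model means looking for a triple with low loss.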
- decision tree classifier: start at the top, and at every node in the tree we look at one feature & depending on whether it is higher or lower, we move to the left or right. The leaves are labelled with classes.
the decision boundary is the shape in feature space that separates the classes
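The walk from the top of the tree to a leaf can be sketched as follows. The tuple representation, the thresholds and the "lower goes left, higher goes right" convention are assumptions chosen for the example; the tree is written by hand, not learned.

```python
# A tree is either a class label (a leaf) or a tuple
# (feature_index, threshold, left_subtree, right_subtree).
tree = (0, 3.0,
        "ham",                      # feature 0 lower than 3.0: leaf
        (1, 2.0, "ham", "spam"))    # otherwise look at feature 1

def tree_classify(instance, node):
    """Start at the top and move left or right until a leaf is reached."""
    while not isinstance(node, str):
        feature, threshold, left, right = node
        # lower than the threshold: go left; otherwise: go right
        node = left if instance[feature] < threshold else right
    return node

label_a = tree_classify((1.0, 5.0), tree)
label_b = tree_classify((4.0, 1.0), tree)
label_c = tree_classify((4.0, 2.5), tree)
```

Because every node tests a single feature against a threshold, the resulting decision boundary is made of axis-aligned pieces.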
- K Nearest Neighbours: doesn't do any learning, but remembers the whole dataset. When it gets a new point, it looks at the K nearest points that it knows and assigns to the new point the class that is most frequent in this set of neighbours.
↳ K is the hyperparameter of the algorithm
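A minimal sketch of K nearest neighbours, assuming Euclidean distance (the notes do not name a distance measure) and a made-up dataset. The function name `knn_classify` is hypothetical.

```python
import math
from collections import Counter

def knn_classify(new_point, dataset, k=3):
    """Assign the class that is most frequent among the k nearest known points."""
    # "remembering the whole dataset": all stored examples are sorted by distance
    neighbours = sorted(dataset, key=lambda ex: math.dist(ex[0], new_point))[:k]
    votes = Counter(label for _, label in neighbours)
    return votes.most_common(1)[0][0]

# made-up (features, label) examples
dataset = [((1.0, 1.0), "ham"), ((1.0, 2.0), "ham"), ((2.0, 1.0), "ham"),
           ((5.0, 5.0), "spam"), ((6.0, 5.0), "spam"), ((5.0, 6.0), "spam")]

near_ham = knn_classify((1.5, 1.5), dataset, k=3)
near_spam = knn_classify((5.5, 5.5), dataset, k=3)
```

An odd k avoids tied votes in binary classification; how ties are broken otherwise is a design choice the notes leave open.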
variations
- features: usually numerical or categorical
- binary classification: two classes
- multi-class classification: more than two classes
- multi-label classification: none, some or all classes may be true
- class probabilities / scores: the classifier reports a probability or score for each class
offline machine learning steps:
1. abstract the problem to a standard task (e.g., classification)
2. choose instances and their features
↳ for supervised learning, choose the target
3. choose a model class (e.g., linear model, decision tree, KNN)
4. search for a good model
$x_i$: features of instance i
$y_i$: true label for $x_i$
3. Other abstract tasks
regression
whereas in classification the target is a class, in regression the target is a number. We have to predict the number, given the features.
data → learner → model
the loss function for regression:
$\mathrm{loss}(f) = \frac{1}{n} \sum_{i} (f(x_i) - y_i)^2$, the mean squared error (MSE) loss
the regression tree and KNN regression can also be used
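The mean squared error can be computed directly from its definition. The sketch below assumes one-dimensional inputs and a hypothetical model f(x) = 2x; the data values are made up.

```python
def mse_loss(f, xs, ys):
    """Mean squared error of model f on inputs xs with true targets ys."""
    n = len(xs)
    return sum((f(x) - y) ** 2 for x, y in zip(xs, ys)) / n

def f(x):
    """Hypothetical regression model: predict twice the input."""
    return 2.0 * x

xs = [1.0, 2.0, 3.0]
ys = [2.0, 4.0, 7.0]

# residuals are 0, 0 and -1, so the loss is (0 + 0 + 1) / 3
loss_value = mse_loss(f, xs, ys)
```

Squaring the residuals makes large errors count more heavily than small ones and keeps positive and negative errors from cancelling out.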
clustering
we are given features, but no target values. The learner has to decide, based on the patterns it finds, how to separate the dataset into clusters.
data → learner → model
- K-means: picks three random values (the means) and colours all points in the data according to which of the 3 means is closest. Recompute the location of each mean by averaging the locations of all the points assigned to it. Then recolour the points. Iterate these two steps (recompute + recolour). In the end, the data / the feature space is separated into three natural clustering regions.
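The recolour / recompute loop can be sketched as below. Two details are assumptions not stated in the notes: the initial means are sampled from the data points themselves, and the loop runs for a fixed number of iterations rather than until convergence. The data points are made up.

```python
import math
import random

def closest_mean(point, means):
    """Index (colour) of the mean that is closest to the point."""
    return min(range(len(means)), key=lambda j: math.dist(point, means[j]))

def k_means(points, k=3, iterations=10, seed=0):
    rng = random.Random(seed)
    means = rng.sample(points, k)            # assumed initialisation: k random data points
    for _ in range(iterations):
        # recolour: assign every point to its closest mean
        colours = [closest_mean(p, means) for p in points]
        # recompute: move each mean to the average of its points
        for j in range(k):
            members = [p for p, c in zip(points, colours) if c == j]
            if members:                      # keep the old mean if no point chose it
                means[j] = tuple(sum(coord) / len(members) for coord in zip(*members))
    colours = [closest_mean(p, means) for p in points]
    return means, colours

# three made-up groups of 2D points
points = [(0.0, 0.0), (0.5, 0.2), (0.1, 0.6),
          (5.0, 5.0), (5.3, 4.8), (4.9, 5.4),
          (10.0, 0.0), (10.2, 0.5), (9.7, 0.1)]
means, colours = k_means(points, k=3)
```

The result depends on the random starting means, so different seeds can give different clusterings.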
density estimation
Density estimation is a lot like clustering, but the task of the learner is to produce a model that outputs a number that indicates whether that instance is likely according to the distribution of the data.
the output is a probability (categorical features) or a probability density (numerical features)
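For a single categorical feature, one simple way to produce such a model is to use relative frequencies. This estimator is an assumption picked for illustration (the notes do not name a method), and the colour data is made up.

```python
from collections import Counter

def fit_categorical_density(instances):
    """Return a model that maps an instance to its relative frequency in the data."""
    counts = Counter(instances)
    total = len(instances)
    # Counter returns 0 for values that were never seen
    return lambda instance: counts[instance] / total

data = ["red", "red", "blue", "green"]
probability = fit_categorical_density(data)

p_red = probability("red")
p_purple = probability("purple")
```

Values never seen in the data get probability 0 here.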
generative modeling
↳ building a model from which you can sample new examples (sampling)
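A minimal sketch of sampling from a fitted model, assuming a single categorical feature and a model that simply reproduces the observed frequencies. The function name and data are hypothetical.

```python
import random
from collections import Counter

def fit_generative_model(instances, seed=0):
    """Return a sampler that draws new examples with the observed frequencies."""
    counts = Counter(instances)
    values = list(counts)
    weights = [counts[v] for v in values]
    rng = random.Random(seed)

    def sample(n):
        # draw n new examples, each value weighted by how often it was seen
        return rng.choices(values, weights=weights, k=n)

    return sample

data = ["red", "red", "blue", "green"]
sample = fit_generative_model(data)
new_examples = sample(100)
```

Realistic generative models describe far richer distributions, but the interface is the same: fit on data, then sample.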
semi-supervised learning
- $X_L$: small set of labelled data
- $X_U$: large set of unlabelled data
- self-training: train classifier C on $X_L$
loop:
label $X_U$ with C
retrain C on $X_U + X_L$
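The self-training loop can be sketched with any classifier in the role of C. The example below assumes a one-dimensional nearest-mean classifier and a fixed number of rounds; both choices, the function names and the data are assumptions made for illustration.

```python
def train_centroid_classifier(labelled):
    """Train C: store the mean feature value per class, predict the class with the nearest mean."""
    sums, counts = {}, {}
    for x, label in labelled:
        sums[label] = sums.get(label, 0.0) + x
        counts[label] = counts.get(label, 0) + 1
    means = {label: sums[label] / counts[label] for label in sums}
    return lambda x: min(means, key=lambda label: abs(x - means[label]))

def self_train(x_l, x_u, rounds=3):
    c = train_centroid_classifier(x_l)            # train C on the labelled data
    for _ in range(rounds):
        pseudo = [(x, c(x)) for x in x_u]         # label the unlabelled data with C
        c = train_centroid_classifier(pseudo + x_l)   # retrain C on both sets
    return c

x_l = [(1.0, "ham"), (9.0, "spam")]               # small labelled set (made up)
x_u = [0.5, 1.5, 2.0, 8.0, 8.5, 9.5]              # larger unlabelled set (made up)
c = self_train(x_l, x_u)

low = c(1.2)
high = c(8.8)
```

If the early pseudo-labels are wrong, retraining can reinforce the mistake.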