Machine Learning
Introduction
1. What is machine learning?
↳ give the computer examples and let it figure out the code (i.e., the way of solving) itself
what makes a suitable ML problem?
- we can't solve the problem explicitly
- approximate solutions are fine
- limited reliability, predictability, interpretability is fine
- plenty of examples to learn from
where do we use ML?
- inside other software
- in analytics, data mining, data science
- in science / statistics
machine learning provides systems the ability to automatically learn and improve from experience without being explicitly programmed.
* reinforcement learning: taking actions in a world based on delayed feedback
* online learning: predicting & learning at the same time
* offline learning: separate learning, predicting & acting
1. take a fixed dataset of examples (= instances)
2. train a model to learn from examples
3. test the model to see if it works by checking its predictions
4. if the model works, put it into production (i.e., use its predictions to take actions)
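The four offline-learning steps can be sketched in plain Python. This is only a minimal sketch: the toy dataset, the `train_threshold` learner, the every-4th-instance split and the 0.8 accuracy bar are assumptions made for illustration, not part of the notes.

```python
# Step 1: a fixed dataset of instances, here (feature, label) pairs (toy data).
dataset = [(x, 1 if x >= 10 else 0) for x in range(20)]

# Hold out every 4th instance for testing; train on the rest.
test_set = [inst for i, inst in enumerate(dataset) if i % 4 == 0]
train_set = [inst for i, inst in enumerate(dataset) if i % 4 != 0]


def train_threshold(examples):
    """Step 2: learn a threshold halfway between the two class means."""
    xs0 = [x for x, y in examples if y == 0]
    xs1 = [x for x, y in examples if y == 1]
    return (sum(xs0) / len(xs0) + sum(xs1) / len(xs1)) / 2


def predict(threshold, x):
    """Predict class 1 for features at or above the learned threshold."""
    return 1 if x >= threshold else 0


threshold = train_threshold(train_set)

# Step 3: test the model by checking its predictions on held-out instances.
correct = sum(predict(threshold, x) == y for x, y in test_set)
accuracy = correct / len(test_set)

# Step 4: only put the model into production if it works well enough.
ready_for_production = accuracy >= 0.8
```

Testing on instances the learner never saw is what makes the check in step 3 meaningful.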
we do not want to find a solution for each problem individually (in isolation), we want generic solutions!
problem → abstract task → algorithm
abstract tasks
- supervised: explicit examples of input & output
  ↳ learn to predict the outcome for an unseen input
  * classification: assign a class to each example
  * regression: assign a number to each example
- unsupervised: only inputs provided
  ↳ find any pattern that explains something about the data
2. Classification
1. start with data:
   [table: each row is an instance; the columns are features plus a label (spam / ham)]
   the features are the things we measure about the instances
2. the dataset is fed through a learning algorithm (learner)
3. the learning algorithm outputs a model (classifier)
the model is constructed so that when it sees a new instance (with the same features as fed to the learner), it can produce a class for us (i.e., classify the data)
how to build a classifier?
- linear classifier: cut the feature space in two
  1D: dot, 2D: line, 3D: plane, 4D+: hyperplane
  every point in the model space is a line in the feature space
  ↳ using the definition of a line (ax + by + c = 0)
  loss_data(model) = performance of the model on the data
  → the lower, the better
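A 2D linear classifier and its loss can be sketched as follows. The notes only say that the loss measures performance on the data and that lower is better, so using the fraction of misclassified instances as the loss is an assumption here, as are the toy data and the two example models.

```python
def classify(model, point):
    """Return +1 or -1 depending on which side of the line the point lies."""
    a, b, c = model  # one point in model space = one line in feature space
    x, y = point
    return 1 if a * x + b * y + c > 0 else -1


def loss(model, data):
    """Fraction of instances the model misclassifies (lower is better)."""
    errors = sum(classify(model, point) != label for point, label in data)
    return errors / len(data)


# Toy 2D dataset of (point, label) pairs, made up for illustration.
data = [((1, 1), -1), ((2, 1), -1), ((4, 5), 1), ((5, 4), 1)]

good_model = (1, 1, -6)    # the line x + y - 6 = 0
bad_model = (1, 0, -1.5)   # the line x - 1.5 = 0

good_loss = loss(good_model, data)
bad_loss = loss(bad_model, data)
```

Searching for a good classifier then means searching model space for the triple (a, b, c) with the lowest loss.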
- decision tree classifier: start at the top, and at every node in the tree we look at one feature &, depending on whether its value is higher or lower than a threshold, we move to the left or right. The leaves are labelled with classes.
  the decision boundary is the shape in feature space that separates the classes
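Classifying with a decision tree amounts to walking from the root to a leaf. In the sketch below the tree is written by hand rather than learned, and the feature names, thresholds and the "higher goes right" convention are all assumptions for illustration.

```python
# A hand-built tree; feature names, thresholds and classes are hypothetical.
tree = {
    "feature": "num_links", "threshold": 3,
    "left": {"label": "ham"},                  # num_links <= 3
    "right": {                                 # num_links > 3
        "feature": "num_capitals", "threshold": 10,
        "left": {"label": "ham"},
        "right": {"label": "spam"},
    },
}


def classify(node, instance):
    """Walk from the root to a leaf, looking at one feature per node."""
    while "label" not in node:
        go_right = instance[node["feature"]] > node["threshold"]
        node = node["right"] if go_right else node["left"]
    return node["label"]


prediction = classify(tree, {"num_links": 5, "num_capitals": 20})
```

Because every node tests a single feature against a threshold, the resulting decision boundary consists of axis-aligned segments in feature space.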
- K Nearest Neighbours: doesn't do any learning, but remembers the whole dataset. When it gets a new point, it looks at the K nearest points that it knows and assigns to the new point the class that is most frequent in this set of neighbours.
  ↳ K is the hyperparameter of the algorithm
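K Nearest Neighbours is short enough to sketch directly. The Euclidean distance, the function name `knn_classify` and the toy dataset are assumptions; the notes do not fix a distance measure.

```python
import math
from collections import Counter


def knn_classify(dataset, new_point, k):
    """Return the most frequent label among the k stored points nearest to new_point.

    dataset is a list of (features, label) pairs; nothing is learned up front.
    """
    by_distance = sorted(dataset, key=lambda inst: math.dist(inst[0], new_point))
    nearest_labels = [label for _, label in by_distance[:k]]
    return Counter(nearest_labels).most_common(1)[0][0]


# Toy dataset, made up for illustration.
dataset = [((1, 1), "ham"), ((1, 2), "ham"), ((2, 1), "ham"),
           ((6, 6), "spam"), ((7, 6), "spam"), ((6, 7), "spam")]

prediction = knn_classify(dataset, (2, 2), k=3)
```

With two classes, an odd K avoids tied votes.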
variations
- features: usually numerical or categorical
- binary classification: two classes
- multiclass classification: more than two classes
- multilabel classification: none, some or all classes may be true
- class probabilities / scores: the classifier reports a probability or score for each class
offline machine learning steps:
1. abstract the problem to a standard task (e.g., classification)
2. choose instances and their features
   ↳ for supervised learning, choose a target
3. choose a model class (e.g., linear model, decision tree, KNN)
4. search for a good model
x_i: features of instance i
y_i: true label for x_i
f(x_i): model output for x_i
3. Other abstract tasks
regression
where in classification the target is a class, in regression the target is a number. We have to predict the number, given the features.
data → learner → model
the loss function for regression:
$\mathrm{loss}(f) = \frac{1}{n} \sum_i (f(x_i) - y_i)^2$   the mean squared error (MSE) loss
the regression tree and KNN regression can also be used
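The MSE loss is simple to compute by hand. In this sketch the function name `mse_loss`, the toy data and the two candidate models `f` and `g` are assumptions for illustration.

```python
def mse_loss(f, xs, ys):
    """Mean squared error of model f on features xs with true targets ys."""
    n = len(xs)
    return sum((f(x) - y) ** 2 for x, y in zip(xs, ys)) / n


# Toy regression data, made up for illustration.
xs = [1.0, 2.0, 3.0]
ys = [2.0, 4.0, 7.0]


def f(x):
    return 2 * x


def g(x):
    return 2 * x + 1


loss_f = mse_loss(f, xs, ys)  # squared errors 0, 0, 1
loss_g = mse_loss(g, xs, ys)  # squared errors 1, 1, 0
```

Here `f` has the lower loss, so it is the better model on this data.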
clustering
we are given features, but no target values. The learner has to decide, based on patterns found, how to separate the dataset into clusters.
data → learner → model
- K-means: picks three random points as the initial means and colours all points in the data according to which of the 3 means is closest. Recompute the location of each mean by averaging the locations of all the points with its colour. Then recolour the points. Iterate these two steps (recompute + recolour). In the end, the data / the feature space is separated into three natural cluster regions.
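The recolour / recompute loop of K-means can be sketched as below. Several choices are assumptions rather than part of the notes: the initial means are drawn from the data points, the loop runs a fixed number of iterations, an empty cluster keeps its old mean, and the toy points are made up.

```python
import math
import random


def recolour(points, means):
    """Colour each point with the index of its closest mean."""
    return [min(range(len(means)), key=lambda j: math.dist(p, means[j]))
            for p in points]


def recompute(points, colours, means):
    """Move each mean to the average location of the points with its colour."""
    new_means = []
    for j, old_mean in enumerate(means):
        members = [p for p, c in zip(points, colours) if c == j]
        if members:
            new_means.append(tuple(sum(axis) / len(members) for axis in zip(*members)))
        else:
            new_means.append(old_mean)  # empty cluster: keep the old mean
    return new_means


def k_means(points, k=3, iterations=10, seed=0):
    rng = random.Random(seed)
    means = rng.sample(points, k)  # assumption: start from k random data points
    for _ in range(iterations):
        colours = recolour(points, means)
        means = recompute(points, colours, means)
    return means, recolour(points, means)


points = [(0, 0), (0, 1), (1, 0), (5, 5), (5, 6), (6, 5), (10, 0), (10, 1), (11, 0)]
means, colours = k_means(points)
```

The result depends on the random starting means, so in practice the algorithm is often run several times with different seeds.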
density estimation
Density estimation is a lot like clustering, but the task of the learner is to produce a model that outputs a number that indicates whether that instance is likely according to the distribution of the data.
the output is a probability (categorical features) or a probability density (numerical features)
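For a single categorical feature, one simple density estimator is the relative frequency of each value. This is only one possible estimator, chosen here for illustration; the function name and toy data are assumptions.

```python
from collections import Counter


def fit_categorical_density(values):
    """Estimate p(value) as the relative frequency of each value in the data."""
    counts = Counter(values)
    total = len(values)
    return {value: count / total for value, count in counts.items()}


# Toy data for one categorical feature, made up for illustration.
data = ["red", "red", "blue", "red", "green", "blue", "red", "red"]
density = fit_categorical_density(data)

p_red = density["red"]                # 5 of the 8 instances
p_purple = density.get("purple", 0.0)  # never seen, so estimated as 0
```

A value the model has never seen gets probability 0 under this estimator, which is one reason smoothed estimates are often preferred in practice.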
generative modeling
↳ building a model from which you can sample new examples (sampling)
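As a minimal sketch of sampling from a model, the snippet below fits value counts on a toy categorical dataset and then draws new examples in proportion to those counts. The data, the function name `sample_new_examples` and the fixed seed are assumptions for illustration.

```python
import random
from collections import Counter

# Toy training examples, made up for illustration.
examples = ["spam", "ham", "ham", "ham", "spam", "ham"]
counts = Counter(examples)  # the "model": how often each value was seen


def sample_new_examples(n, seed=0):
    """Draw n new examples, each value chosen in proportion to its count."""
    rng = random.Random(seed)
    values = list(counts)
    weights = [counts[v] for v in values]
    return rng.choices(values, weights=weights, k=n)


new_examples = sample_new_examples(5)
```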
semi-supervised learning
- X_L: small set of labelled data
  X_U: large set of unlabelled data
- self training:
  train classifier C on X_L
  loop:
    label X_U with C
    retrain C on X_U + X_L
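The self-training loop can be sketched with any classifier. Here C is a nearest-class-mean classifier on 1D data, and the loop runs a fixed number of rounds; both are assumptions for illustration, as are the toy data.

```python
def train(labelled):
    """Nearest-class-mean classifier on 1D data: returns {label: mean}."""
    sums, counts = {}, {}
    for x, y in labelled:
        sums[y] = sums.get(y, 0.0) + x
        counts[y] = counts.get(y, 0) + 1
    return {y: sums[y] / counts[y] for y in sums}


def predict(model, x):
    """Return the label whose class mean is closest to x."""
    return min(model, key=lambda y: abs(x - model[y]))


def self_train(x_l, x_u, rounds=5):
    model = train(x_l)                                  # train C on X_L
    for _ in range(rounds):                             # loop:
        pseudo = [(x, predict(model, x)) for x in x_u]  #   label X_U with C
        model = train(pseudo + x_l)                     #   retrain C on X_U + X_L
    return model


x_l = [(1.0, "a"), (9.0, "b")]          # small labelled set
x_u = [0.5, 1.5, 2.0, 8.0, 8.5, 9.5]    # larger unlabelled set
model = self_train(x_l, x_u)
```

The labels assigned to X_U are the classifier's own guesses, so early mistakes can reinforce themselves over the rounds.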