Machine Learning for the Quantified Self
Terminology
A measurement is one value for an attribute recorded at a specific time point, e.g., a heart
rate or a velocity.
A time series is a series of measurements in temporal order.
Supervised learning is the machine learning task of inferring a function from a set of labeled
training data.
In unsupervised learning, there is no target measure (or label), and the goal is to describe
the associations and patterns among the attributes.
Reinforcement learning tries to find optimal actions in a given situation so as to maximize a
numerical reward that does not come immediately with the action but only later in time.
An example of an instance is $x_1$ = [0,45, low, 0]. A target for this instance is $g_1$ = [inactive].
Outlier detection
An outlier is an observation point that is distant from other observations. There can be two
causes of an outlier:
- Measurement error (Arnold with a heart rate of 400)
- Variability (Arnold trying to push his limits with a heart rate of 190)
Outliers can be detected and removed using two types of outlier detection:
- Distribution based (we assume a certain distribution of the data)
- Distance based (we only look at the distance between data points)
Distribution-based outlier detection
Chauvenet’s criterion assumes a normal distribution of a single attribute. The mean and
variance of the dataset are used as parameters of the normal distribution. A measurement is
rejected if the probability of observing it is less than $\frac{1}{c \cdot N}$, where $c$ is a parameter
indicating the certainty of the outlier, and $N$ is the size of the dataset.
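The criterion can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: it assumes that "the probability of observing it" means the two-sided tail probability of a value at least this far from the mean, and the function name `chauvenet_outliers`, the default `c=2.0`, and the sample heart rates are made up for the example.

```python
import math
from statistics import fmean, pstdev

def chauvenet_outliers(values, c=2.0):
    """Return one boolean per value: True where the value is rejected."""
    n = len(values)
    mean = fmean(values)
    std = pstdev(values)            # population standard deviation of the dataset
    if std == 0:
        return [False] * n          # all values identical: nothing to reject
    threshold = 1.0 / (c * n)       # rejection threshold 1 / (c * N)
    flags = []
    for v in values:
        z = abs(v - mean) / std
        # Assumption: two-sided tail probability P(|Z| >= z) of a standard normal.
        tail = math.erfc(z / math.sqrt(2))
        flags.append(tail < threshold)
    return flags

# Hypothetical heart-rate measurements with one measurement error.
heart_rates = [62, 65, 70, 68, 72, 66, 64, 71, 69, 400]
flags = chauvenet_outliers(heart_rates)
```

Note that the outlier itself inflates the estimated mean and standard deviation, so with very small datasets an extreme value can mask itself; a larger `c` makes the criterion stricter about what it rejects.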
Mixture models assume that the data can be described by $K$ normal distributions
$\{\mathcal{N}(\mu_1, \sigma_1), \ldots, \mathcal{N}(\mu_K, \sigma_K)\}$. All $2K$ parameters can be estimated by maximizing the
likelihood of observing the data. Points with the lowest probabilities under the fitted model
are candidates for removal.
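As a rough sketch of the last step, the snippet below ranks points by their density under a mixture whose parameters are assumed to be already estimated. The component weights, the concrete parameter values, and the function names are assumptions made for this illustration; the parameter estimation itself (typically done with expectation-maximization) is not shown.

```python
import math

def normal_pdf(x, mu, sigma):
    """Density of a normal distribution with mean mu and standard deviation sigma."""
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

def mixture_density(x, components):
    """components: list of (weight, mu, sigma) tuples; weights are assumed to sum to 1."""
    return sum(w * normal_pdf(x, mu, sigma) for w, mu, sigma in components)

def lowest_density_points(values, components, n_candidates=1):
    """Return the n_candidates values with the lowest mixture density."""
    ranked = sorted(values, key=lambda v: mixture_density(v, components))
    return ranked[:n_candidates]

# Hypothetical fitted mixture: resting and exercising heart rates.
components = [(0.7, 65.0, 5.0), (0.3, 150.0, 15.0)]
heart_rates = [60, 66, 71, 140, 155, 165, 400]
candidates = lowest_density_points(heart_rates, components)
```

Unlike a single normal distribution, the mixture keeps the exercising heart rates (140 to 165) plausible, so only the value far from both components ends up as a removal candidate.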
Distance-based outlier detection
The simple distance-based approach calls two points close if they are within a distance $d_{min}$ of
each other. A point is an outlier when more than a fraction $f_{min}$ of the points lies farther
than $d_{min}$ away from it.
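A minimal sketch of this rule for a single numeric attribute follows. It assumes the absolute difference as distance and computes the fraction over the other points (excluding the point itself); both choices, as well as the function name and sample values, are assumptions for illustration.

```python
def simple_distance_outliers(values, d_min, f_min):
    """Flag a value when more than f_min of the other values lie farther than d_min away."""
    n = len(values)
    if n < 2:
        return [False] * n
    flags = []
    for i, x in enumerate(values):
        # Count the other points that are not close to x.
        far = sum(1 for j, y in enumerate(values) if j != i and abs(x - y) > d_min)
        flags.append(far / (n - 1) > f_min)
    return flags

# Hypothetical heart rates: four resting values and one peak.
heart_rates = [62, 65, 70, 68, 190]
flags = simple_distance_outliers(heart_rates, d_min=15, f_min=0.5)
```

Both thresholds are global, which is why this approach struggles when the data contain clusters of different density.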
The local outlier factor also takes the density of the surrounding points into account, to
prevent all points of a less dense cluster from being flagged as outliers. The first step is to
define the $k$-distance $k_{dist}(x_i)$ of a point $x_i$: the largest distance among the distances to
its $k$ closest points. In other words, there should be at most $k-1$ points at a distance less
than $k_{dist}(x_i)$ and at least one point that is exactly $k_{dist}(x_i)$ away. Together, these points
form the neighborhood set $k_{distnh}(x_i)$.
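The k-distance and its neighborhood set can be sketched as follows for one-dimensional points. The absolute difference as distance measure, the function names, and the sample points are assumptions for this example.

```python
def k_distance(points, i, k):
    """Largest distance among the distances from points[i] to its k closest points."""
    dists = sorted(abs(points[i] - p) for j, p in enumerate(points) if j != i)
    return dists[k - 1]

def k_neighborhood(points, i, k):
    """Indices of all points within the k-distance of points[i] (ties included)."""
    kd = k_distance(points, i, k)
    return [j for j, p in enumerate(points) if j != i and abs(points[i] - p) <= kd]

points = [1.0, 2.0, 3.0, 10.0]
kd_first = k_distance(points, 0, k=2)       # distances from 1.0 are 1, 2 and 9
nh_first = k_neighborhood(points, 0, k=2)
kd_last = k_distance(points, 3, k=2)        # distances from 10.0 are 7, 8 and 9
```

Because points at exactly the k-distance are included, the neighborhood can contain more than k points when several points are equally far away.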
The reachability distance of a point $x_i$ to another point $x$ is:
$$k_{reachdist}(x_i, x) = \max\left(k_{dist}(x),\ d(x, x_i)\right)$$
This expresses that the reachability distance is the real distance if the point $x_i$ is not among
the $k$ nearest points of $x$ (in that case the value of $d(x, x_i)$ is larger than $k_{dist}(x)$);
otherwise it is the $k_{dist}$ of that point. In effect, the distance value of every point within
$k_{dist}(x)$ is set equal to $k_{dist}(x)$.
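A short sketch of the reachability distance, assuming it is the maximum of the k-distance of $x$ and the real distance $d(x, x_i)$, as the description above implies. The one-dimensional points, the absolute difference as distance, and the function names are assumptions for illustration.

```python
def k_distance(points, i, k):
    """Largest distance among the distances from points[i] to its k closest points."""
    dists = sorted(abs(points[i] - p) for j, p in enumerate(points) if j != i)
    return dists[k - 1]

def reach_distance(points, i, j, k):
    """Reachability distance of points[i] to points[j]."""
    real_distance = abs(points[i] - points[j])
    # Inside the k-distance of points[j] the value is raised to that k-distance.
    return max(k_distance(points, j, k), real_distance)

points = [1.0, 2.0, 3.0, 10.0]
inside = reach_distance(points, 1, 0, k=2)    # 2.0 is within the k-distance of 1.0
outside = reach_distance(points, 3, 2, k=2)   # 10.0 is far outside the k-distance of 3.0
```

In the first call the real distance is 1 but the k-distance of the target point is 2, so the reachability distance is 2; in the second call the real distance of 7 is kept.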
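Next, the local reachability density around our point $x_i$ is:
$$k_{lrd}(x_i) = \left( \frac{\sum_{x \in k_{distnh}(x_i)} k_{reachdist}(x_i, x)}{|k_{distnh}(x_i)|} \right)^{-1}$$
where $k_{reachdist}(x_i, x)$ is the reachability distance of $x_i$ to $x$ and $k_{distnh}(x_i)$ is the
$k_{dist}$ neighborhood set of $x_i$. Intuitively, this says something about how close $x_i$ is to its
neighbors. If a point is part of $x_i$’s nearest $k$ neighbors, but this relationship does not hold
the other way, $x_i$ might be an outlier. The lower the average reachability distance to the
neighbors, the higher the local reachability density becomes.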
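The pieces can be combined into a small sketch of the local reachability density, assuming it is the inverse of the average reachability distance from $x_i$ to the points in its neighborhood set, which matches the last sentence above. The one-dimensional points, the absolute difference as distance, and all function names are assumptions for this example.

```python
import math

def distance(a, b):
    return abs(a - b)

def k_distance(points, i, k):
    """Largest distance among the distances from points[i] to its k closest points."""
    dists = sorted(distance(points[i], p) for j, p in enumerate(points) if j != i)
    return dists[k - 1]

def k_neighborhood(points, i, k):
    """Indices of all points within the k-distance of points[i]."""
    kd = k_distance(points, i, k)
    return [j for j, p in enumerate(points) if j != i and distance(points[i], p) <= kd]

def reach_distance(points, i, j, k):
    """Reachability distance of points[i] to points[j]."""
    return max(k_distance(points, j, k), distance(points[i], points[j]))

def local_reachability_density(points, i, k):
    """Inverse of the average reachability distance from points[i] to its neighbors."""
    neighbors = k_neighborhood(points, i, k)
    mean_reach = sum(reach_distance(points, i, j, k) for j in neighbors) / len(neighbors)
    # Duplicate points can give an average of zero; treat that as infinite density.
    return math.inf if mean_reach == 0 else 1.0 / mean_reach

points = [1.0, 2.0, 3.0, 10.0]
lrd_dense = local_reachability_density(points, 0, k=2)   # point inside the cluster
lrd_far = local_reachability_density(points, 3, k=2)     # isolated point
```

The isolated point at 10.0 has the points 2.0 and 3.0 as its nearest neighbors, but it is not among their nearest neighbors, so its average reachability distance is large and its density is much lower than that of the clustered points.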