Machine Learning for the Quantified Self (XM_40012)
sandervanwerkhooven
Terminology
A measurement is one value for an attribute recorded at a specific time point, for example a heart rate or a velocity.
A time series is a series of measurements in temporal order.
Supervised learning is the machine learning task of inferring a function from a set of labeled
training data.
In unsupervised learning, there is no target measure (or label), and the goal is to describe
the associations and patterns among the attributes.
Reinforcement learning tries to find optimal actions in a given situation so as to maximize a numerical reward that does not come immediately with the action but only later in time.
An example of an instance is $x_1 = [0,45, \text{low}, 0]$. A target for the instance is $g_1 = [\text{inactive}]$.
Outlier detection
An outlier is an observation point that is distant from other observations. An outlier can have two causes:
- Measurement error (Arnold with a heart rate of 400 bpm)
- Variability (Arnold trying to push his limits with a heart rate of 190 bpm)
Outliers can be detected (and then removed) with two types of approaches:
- Distribution based (we assume a certain distribution of the data)
- Distance based (we only look at the distance between data points)
Distribution-based outlier detection
Chauvenet’s criterion assumes a normal distribution of a single attribute. The mean and variance of the dataset are used as parameters of the normal distribution. A measurement is rejected if the probability of observing it is less than $\frac{1}{c \cdot N}$, where $c$ is a parameter indicating the certainty of the outlier, and $N$ is the size of the dataset.
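Mixture models assume that the data can be described by $K$ normal distributions $\{\mathcal{N}(\mu_1, \sigma_1), \ldots, \mathcal{N}(\mu_K, \sigma_K)\}$, each with a mixing weight $\pi_k$ (the weights sum to 1):
$$p(x) = \sum_{k=1}^{K} \pi_k \, \mathcal{N}(x \mid \mu_k, \sigma_k)$$
The $2K$ means and standard deviations, together with the weights, can be estimated by maximizing the likelihood of observing the data, typically with the expectation-maximization (EM) algorithm. Points with the lowest probabilities are candidates for removal.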
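A minimal sketch of mixture-model outlier detection for one attribute, fitted with EM in plain Python. The function names, the initialization (means spread evenly between the minimum and maximum), the fixed iteration count and the `min_sigma` floor are all assumptions of this sketch rather than part of the method description; library implementations use more careful initialization and convergence checks.

```python
import math

def normal_pdf(x, mu, sigma):
    """Density of the univariate normal distribution N(mu, sigma) at x."""
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

def fit_gmm_1d(values, k=2, iterations=200, min_sigma=1e-3):
    """Fit a 1-D mixture of k normal distributions with the EM algorithm.

    Returns (weights, means, sigmas). The means start evenly spread between
    the minimum and maximum value (an assumption of this sketch).
    """
    n = len(values)
    lo, hi = min(values), max(values)
    means = [lo + (j + 0.5) * (hi - lo) / k for j in range(k)]
    mean_all = sum(values) / n
    sigma_all = math.sqrt(sum((v - mean_all) ** 2 for v in values) / n)
    sigmas = [max(sigma_all, min_sigma)] * k
    weights = [1.0 / k] * k
    for _ in range(iterations):
        # E-step: responsibility of each component for each point.
        resp = []
        for v in values:
            dens = [w * normal_pdf(v, m, s) for w, m, s in zip(weights, means, sigmas)]
            total = sum(dens)
            resp.append([d / total if total > 0 else 1.0 / k for d in dens])
        # M-step: re-estimate the weight, mean and standard deviation per component.
        for j in range(k):
            nj = sum(r[j] for r in resp)
            weights[j] = nj / n
            if nj == 0:
                continue  # component received no responsibility; keep its mean/sigma
            means[j] = sum(r[j] * v for r, v in zip(resp, values)) / nj
            var = sum(r[j] * (v - means[j]) ** 2 for r, v in zip(resp, values)) / nj
            sigmas[j] = max(math.sqrt(var), min_sigma)
    return weights, means, sigmas

def mixture_density(x, weights, means, sigmas):
    """p(x) = sum_k pi_k * N(x | mu_k, sigma_k)."""
    return sum(w * normal_pdf(x, m, s) for w, m, s in zip(weights, means, sigmas))

data = [58, 59, 60, 61, 62, 80, 98, 99, 100, 101, 102]
weights, means, sigmas = fit_gmm_1d(data, k=2)
scores = [mixture_density(x, weights, means, sigmas) for x in data]
print(data[scores.index(min(scores))])  # → 80
```

The point with the lowest mixture density is the first candidate for removal; in practice one would remove all points below a chosen density threshold.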
Distance-based outlier detection
The simple distance-based approach calls two points close if they are within distance $d_{min}$ of each other. A point is an outlier when more than a fraction $f_{min}$ of the points lie outside distance $d_{min}$ from it.
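A minimal sketch of the simple distance-based approach, assuming Euclidean distance (`math.dist`, Python 3.8+), that the fraction is taken over the other $N - 1$ points, and that "outside $d_{min}$" means strictly further away. The function name `simple_distance_outliers` is hypothetical.

```python
import math

def simple_distance_outliers(points, d_min, f_min):
    """Flag points for which more than a fraction f_min of the other points
    lie further away than d_min (Euclidean distance)."""
    n = len(points)
    if n < 2:
        return [False] * n  # no other points to compare against
    flags = []
    for i, p in enumerate(points):
        far = sum(1 for j, q in enumerate(points) if j != i and math.dist(p, q) > d_min)
        flags.append(far / (n - 1) > f_min)
    return flags

points = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10)]
print(simple_distance_outliers(points, d_min=2.0, f_min=0.5))
# → [False, False, False, False, True]
```

This compares every pair of points, so the cost grows quadratically with the size of the dataset.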
The local outlier factor also takes the density of the surrounding points into account, to prevent the points of a less dense cluster from all being flagged as outliers. The first step is to define the $k$-distance $k_{dist}(x_i)$ of a point $x_i$: the largest distance among the distances to its $k$ closest points. In other words, at most $k-1$ points have a distance less than $k_{dist}(x_i)$, and at least one point lies exactly $k_{dist}(x_i)$ away. All points within distance $k_{dist}(x_i)$ together form the $k$-distance neighborhood $k_{dist\_nh}(x_i)$.
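A minimal sketch of the $k$-distance and the $k$-distance neighborhood, assuming Euclidean distance via `math.dist` (Python 3.8+); the function name `k_distance_neighborhood` is hypothetical. The example uses a tie to show that the neighborhood can contain more than $k$ points.

```python
import math

def k_distance_neighborhood(points, i, k):
    """Return (k_dist, neighborhood) of points[i].

    k_dist is the distance to the k-th closest other point; the neighborhood
    contains the indices of all other points within that distance, so ties
    can make it hold more than k points.
    """
    # Distances from points[i] to every other point, sorted ascending.
    dists = sorted((math.dist(points[i], p), j) for j, p in enumerate(points) if j != i)
    k_dist = dists[k - 1][0]
    neighborhood = [j for d, j in dists if d <= k_dist]
    return k_dist, neighborhood

points = [(0,), (1,), (-1,), (3,)]
print(k_distance_neighborhood(points, 0, k=1))  # → (1.0, [1, 2])
```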
The reachability distance of a point $x_i$ to another point $x$ is:
$$k_{reach\_dist}(x_i, x) = \max\bigl(k_{dist}(x),\, d(x, x_i)\bigr)$$
This expresses that the reachability distance is the real distance if $x_i$ is not among the $k$ nearest points of $x$ (in that case $d(x, x_i)$ is larger than $k_{dist}(x)$), and otherwise it is $k_{dist}(x)$. In other words, the distance values of all points within $k_{dist}(x)$ are set equal to $k_{dist}(x)$.
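A small worked illustration with hypothetical one-dimensional points and $k = 2$: $x_a = 0$, $x_b = 1$, $x_c = 2$, $x_d = 10$, taking the reachability distance of $x_i$ to $x$ as $\max(k_{dist}(x), d(x, x_i))$. The two nearest points of $x_c$ are $x_b$ (distance 1) and $x_a$ (distance 2), so $k_{dist}(x_c) = 2$.

```latex
\begin{align*}
k_{reach\_dist}(x_b, x_c) &= \max\bigl(k_{dist}(x_c),\, d(x_c, x_b)\bigr) = \max(2, 1) = 2,\\
k_{reach\_dist}(x_d, x_c) &= \max\bigl(k_{dist}(x_c),\, d(x_c, x_d)\bigr) = \max(2, 8) = 8.
\end{align*}
```

$x_b$ lies inside the $k$-distance neighborhood of $x_c$, so its distance is raised to $k_{dist}(x_c)$; $x_d$ lies outside it, so its real distance is kept.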
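Next, the local reachability density around our point $x_i$ is the inverse of the average reachability distance from $x_i$ to the points in its $k$-distance neighborhood:
$$lrd_k(x_i) = \left( \frac{\sum_{x \in k_{dist\_nh}(x_i)} k_{reach\_dist}(x_i, x)}{|k_{dist\_nh}(x_i)|} \right)^{-1}$$
Intuitively, this says something about how close $x_i$ is to its neighbors. If a point $x$ is among $x_i$’s $k$ nearest neighbors but $x_i$ is not among the $k$ nearest neighbors of $x$, the real (larger) distance $d(x, x_i)$ is used, which lowers the density of $x_i$ and suggests that $x_i$ might be an outlier. The lower the average reachability distance to the neighbors, the higher the local reachability density becomes.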
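A minimal sketch of the local reachability density in plain Python, assuming Euclidean distance and $lrd_k(x_i)$ as the inverse of the mean reachability distance $\max(k_{dist}(x), d(x, x_i))$ over the $k$-distance neighborhood of $x_i$. All function names are hypothetical, and distances are recomputed naively for clarity rather than speed.

```python
import math

def k_distance_neighborhood(points, i, k):
    """Return (k_dist, indices of the k-distance neighborhood) of points[i]."""
    dists = sorted((math.dist(points[i], p), j) for j, p in enumerate(points) if j != i)
    k_dist = dists[k - 1][0]
    return k_dist, [j for d, j in dists if d <= k_dist]

def reach_dist(points, i, j, k):
    """Reachability distance of points[i] to points[j]: max(k_dist(x_j), d(x_j, x_i))."""
    k_dist_j, _ = k_distance_neighborhood(points, j, k)
    return max(k_dist_j, math.dist(points[i], points[j]))

def local_reachability_density(points, i, k):
    """Inverse of the mean reachability distance from points[i] to its neighborhood."""
    _, neighborhood = k_distance_neighborhood(points, i, k)
    mean_reach = sum(reach_dist(points, i, j, k) for j in neighborhood) / len(neighborhood)
    return 1.0 / mean_reach if mean_reach > 0 else math.inf

points = [(0,), (1,), (2,), (10,)]
lrds = [local_reachability_density(points, i, k=2) for i in range(len(points))]
print([round(v, 3) for v in lrds])  # → [0.667, 0.5, 0.667, 0.118]
```

The isolated point at 10 gets by far the lowest density, matching the intuition that a low local reachability density hints at an outlier.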