Machine Learning for the Quantified Self
Book + Lectures Summary

Full summary of the book Machine Learning for the Quantified Self by Hoogendoorn & Funk, with additional notes from the lectures of the course.


1 INTRODUCTION
Quantified self: The quantified self is any individual engaged in the self-tracking of any kind of biological,
physical, behavioral, or environmental information. The self-tracking is driven by a certain goal of the
individual with a desire to act upon the collected information.

Why? Three categories:
1. Improve health (treatment, achieve a goal, manage a condition)
2. Improve other aspects of life (maximize work performance, be mindful)
3. Find new life experiences (have fun)

Five-factor framework of Self-Tracking Motivations
1. Self-healing (become healthy)
2. Self-discipline (rewarding aspects of it)
3. Self-design (control and optimize yourself)
4. Self-association (association with the self-tracking community)
5. Self-entertainment (entertainment value)

Private vs. pushed self-tracking: pushed means that the incentive does not come from the individual doing the tracking but from another party.

So what are the unique characteristics of machine learning in the quantified-self context? We identify five of them: (1) sensory data is noisy, (2) many measurements are missing, (3) the data has a highly temporal nature, (4) algorithms should enable the support of and interaction with users without a long learning period, and (5) we collect multiple datasets (one per user) and can learn across them.

Goal: showing how machine learning (ML: automatically identifying patterns from data) can be applied to quantified-self data.

Definitions:
Measurement: one value for an attribute recorded at a specific time point
Time series: a series of measurements in temporal order
Supervised learning: the ML task of inferring a function from labeled training data
Unsupervised learning: the ML task of describing the associations and patterns among the attributes
when there is no label (e.g. target/outcome)
Semi-supervised learning: a technique to learn patterns in the form of a function based on labeled and
unlabeled training examples
Reinforcement learning (RL): the ML task of finding optimal actions in a given situation so as to maximize a numerical reward that does not come immediately with the action but only later in time


2 BASICS OF SENSORY DATA
Most often used sensors:
▪ Accelerometer - measures the forces acting upon the phone along the x, y, and z axes
▪ Gyroscope - measures the orientation of the phone relative to the earth’s surface
▪ Magnetometer - measures the x-y-z orientation relative to the earth’s magnetic field

2.1 CONVERTING RAW DATA TO AN AGGREGATED FORMAT
▪ Determining step size t
o Task, noise level, available memory and storage cost, available computational resources,
information loss, desired detail
▪ Finer granularity of t
o More detailed data, more spread in the data, potentially more noise
o Higher standard deviation and more extreme values
▪ Coarser granularity of t
o Less detailed data, more averaged data, less noise, potential information loss (see the aggregation sketch below)
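
As a rough illustration of the effect of the step size, here is a minimal sketch (not taken from the book's code base) that aggregates raw measurements into fixed steps of t = 1 s with pandas; the timestamps and column name are made up.

import pandas as pd

# Raw accelerometer measurements arriving at irregular timestamps.
raw = pd.DataFrame(
    {"acc_x": [0.1, 0.3, 0.2, 0.9, 0.8]},
    index=pd.to_datetime(
        ["2018-06-29 10:00:00.10", "2018-06-29 10:00:00.35",
         "2018-06-29 10:00:00.60", "2018-06-29 10:00:01.20",
         "2018-06-29 10:00:01.70"]
    ),
)

# Coarser granularity: average all measurements within each step of 1 s;
# a finer step keeps more detail but also more noise.
aggregated = raw.resample("1s").mean()
print(aggregated)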

ML tasks: Focusing on supervised learning, we define two tasks: (1) a classification problem, namely predicting the activity label, and (2) a regression problem, namely predicting the heart rate.


3 HANDLING NOISE AND MISSING VALUES IN SENSORY DATA
Three types of approaches to deal with noise:
1) We can use approaches that detect and remove outliers from the data.
2) We can impute missing values in our data (that could also have been outliers that were removed).
3) We can transform our data in order to identify the most important parts of it.


3.1 DETECTING OUTLIERS
Definition 3.1: An outlier is an observation point that is distant from other observations.

We can have two types of outliers: those caused by a measurement error and those simply caused by
variability of the phenomenon that we observe or measure.
- We are interested in getting rid of the measurement errors while keeping the variability of the phenomenon

A problem we might encounter is that domain knowledge is not always available, or it is largely unknown how to define an outlier for a domain. Domain knowledge could, for example, tell us that a measured heart rate above 220 is an outlier. Without domain knowledge, however, this becomes a lot more difficult. Below, we treat various approaches that can help us remove outliers without up-front knowledge of what an outlier is; hence, we consider it an unsupervised problem. Be aware that this process is dangerous, as there is a high risk of removing points that are not measurement errors but actually the most interesting points. Two approaches without domain knowledge are:
a) Distribution-based (assuming a certain distribution of the data)
b) Distance-based (only looking at the distance between the data points)

3.1.1 Distribution-based models
Here, the data should follow a certain known distribution, and we remove the points that fall outside certain bounds of that distribution. These approaches are mainly targeted at single attributes X_j.

3.1.1.1 Chauvenet’s criterion
Assume a normal distribution and remove data points whose observation probability is lower than 1/(c·N), where N is the number of data points (typically c = 2). Measurements that fall in the low-probability tails of the distribution are assumed to be outliers even though they might not be. This is dangerous because we do not know the distribution with certainty.
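
A minimal sketch of how Chauvenet's criterion could be implemented for a single attribute, assuming a normal distribution; using the two-sided tail probability is one common reading of the observation probability, and the heart-rate values are made up.

import numpy as np
from scipy import stats

def chauvenet(values, c=2):
    """Return a boolean mask marking the values deemed outliers."""
    n = len(values)
    mean, std = np.mean(values), np.std(values)
    # Two-sided tail probability of seeing a value at least this far
    # from the mean under the fitted normal distribution.
    prob = 2 * stats.norm.sf(np.abs(values - mean) / std)
    # Flag points whose observation probability is below 1/(c*N).
    return prob < 1.0 / (c * n)

heart_rate = np.array([72, 75, 70, 74, 73, 71, 250])
print(chauvenet(heart_rate))  # only the 250 measurement is flagged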

3.1.1.2 Mixture models
Instead of assuming that the data of an attribute stems from a single (normal) distribution, we can assume that the data can be described by a mixture of K normal distributions. We then need to find the parameters that maximize the probability of observing the data we have measured, specified by means of the likelihood. In other words, we maximize the product of the probabilities of observing our attribute values; the higher the probabilities of the individual attribute values, the higher the product. One way to do so is the Expectation-Maximization algorithm. Once we have found the best parameters, we can identify outliers by considering the probability of each observation: points with the lowest probabilities are candidates for removal.
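
The sketch below uses scikit-learn's GaussianMixture, which fits the mixture parameters with the Expectation-Maximization algorithm; K = 2 and the sample data are illustrative choices, not values prescribed by the book.

import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
values = np.concatenate([
    rng.normal(60, 5, 500),    # e.g. resting heart rate
    rng.normal(120, 10, 500),  # e.g. heart rate during exercise
    [240.0],                   # a suspicious measurement
]).reshape(-1, 1)

# Fit K = 2 normal distributions with EM, then score each observation.
gmm = GaussianMixture(n_components=2, random_state=0).fit(values)
log_prob = gmm.score_samples(values)  # log-likelihood per observation

# The points with the lowest probabilities are candidates for removal.
print(values[np.argsort(log_prob)[:3]].ravel())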

3.1.2 Distance-based models
A second type of algorithm to detect outliers considers the distance between points in the dataset. Assume that we have a metric to compute the distance between two instances x_i and x_j, called d(x_i, x_j). This is different from the distribution-based approach, which only focused on individual attributes.

3.1.2.1 Simple distance-based approach
The first approach takes a global view of the data: we consider the distance of a point to all other points. We define a certain minimum distance d_min within which we consider a point to be close to another point. We say that a point is an outlier when more than a fraction f_min of the points in the dataset lies at a distance of more than d_min from that point.
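
A minimal sketch of this simple distance-based approach; the values of d_min and f_min are illustrative, and scipy's cdist supplies the distance metric d.

import numpy as np
from scipy.spatial.distance import cdist

def distance_outliers(points, d_min=2.0, f_min=0.9):
    """Mark a point as an outlier when more than a fraction f_min of
    the other points lies at a distance of more than d_min from it."""
    dist = cdist(points, points)            # pairwise distances
    far = (dist > d_min).sum(axis=1)        # points further than d_min
    return far / (len(points) - 1) > f_min  # exclude the point itself

points = np.array([[0.0, 0.0], [0.5, 0.2], [0.3, 0.4], [8.0, 9.0]])
print(distance_outliers(points))  # only [8, 9] is flagged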

3.1.2.2 Local outlier factor (LOF)
The previous approach does not take local density into account; LOF does. In addition, the approach specifies a degree of outlierness (i.e. the likelihood of an instance being an outlier) for every data point. This outlierness is computed through the following steps:
✓ Define the k-distance of a point: the largest distance among the distances to its k closest points
✓ Define the reachability distance of x_i from x (the real distance d(x_i, x) if x_i is not among the k nearest points of x, and the k-distance of x otherwise)
✓ Define the local reachability density of x_i and compare it to that of its neighbors
o The closer x_i is to its neighbors, the lower its final LOF score
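
Rather than implementing these steps by hand, the sketch below uses scikit-learn's LocalOutlierFactor; k = 5 and the generated clusters are illustrative.

import numpy as np
from sklearn.neighbors import LocalOutlierFactor

rng = np.random.default_rng(0)
dense = rng.normal(0, 0.5, size=(100, 2))   # dense cluster
sparse = rng.normal(5, 2.0, size=(20, 2))   # sparser cluster
points = np.vstack([dense, sparse])

lof = LocalOutlierFactor(n_neighbors=5)
lof.fit(points)
# negative_outlier_factor_ holds -LOF: the lower it is, the more
# outlying the point, so its negation is the degree of outlierness.
outlierness = -lof.negative_outlier_factor_
print(outlierness.max(), outlierness.min())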


3.2 IMPUTATION OF MISSING VALUES
Obviously, our dataset could contain a lot of missing values. These could be caused by outliers which were removed, or by sensors not providing information at certain points in time. There are different ways to replace these missing values; we refer to this process as imputation:
▪ The mean (common but sensitive to extreme values)
▪ The median (a more robust alternative, as it is less sensitive to extreme values)
▪ The mode (for categorical values)
A more sophisticated approach is to predict the missing value for attribute j of instance i (x_i^j) using statistical models such as linear regression. In general, there are two ways this can be done:
▪ Use the other attribute values in the same instance
▪ Use values of the same attribute from other instances (for example, the average of the previous and next value of the attribute; this is linear interpolation and only works under the assumption that the attribute is temporal and follows a linear trend), as in the sketch below
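
A minimal sketch of mean imputation and linear interpolation with pandas; the series name and values are hypothetical.

import numpy as np
import pandas as pd

series = pd.Series([70.0, np.nan, 74.0, np.nan, 78.0], name="heart_rate")

# Mean imputation: simple, but sensitive to extreme values.
mean_imputed = series.fillna(series.mean())

# Linear interpolation: each gap becomes the average of its previous
# and next value, assuming a temporal attribute with a linear trend.
interpolated = series.interpolate(method="linear")
print(mean_imputed.tolist(), interpolated.tolist())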


3.3 A COMBINED APPROACH: THE KALMAN FILTER
An approach that identifies outliers and replaces them with new values is the Kalman filter. It provides a model for the expected values based on historical data and estimates how noisy a new measurement is by comparing the observed with the predicted values. We distinguish between a latent state s_t and the measurements that can be performed based on the state, in our case x_t.
The next value of the state is defined as s_t = F_t s_{t-1} + B_t u_t + w_t, where:
• s_{t-1} is the previous state
• u_t is a control input (e.g. sending a message)
• w_t is white noise
• F_t and B_t are matrices
The value of the measurement associated with the state is x_t = H_t s_t + v_t, where:
• H_t is another matrix
• v_t is the measurement white noise
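
Below is a minimal one-dimensional sketch of these equations, taking F_t = H_t = 1 (scalars instead of matrices), no control input, and illustrative noise variances; this is a simplification for intuition, not the book's implementation.

def kalman_1d(measurements, q=0.01, r=1.0):
    """q is the variance of the process noise w_t, r the variance of
    the measurement noise v_t; F = H = 1 and B*u_t = 0."""
    s_est, p = measurements[0], 1.0  # initial state and its variance
    filtered = []
    for x in measurements:
        # Predict: s_t = F * s_{t-1} (+ B*u_t, omitted here).
        s_pred, p_pred = s_est, p + q
        # Update: weigh the prediction against the new measurement.
        k = p_pred / (p_pred + r)          # Kalman gain
        s_est = s_pred + k * (x - s_pred)  # correct with the residual
        p = (1 - k) * p_pred
        filtered.append(s_est)
    return filtered

noisy = [1.0, 1.1, 0.9, 5.0, 1.05, 0.95]  # the 5.0 looks like noise
print(kalman_1d(noisy))  # the outlier is pulled towards the prediction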
