ML4QS 2021
Course notes
Name: Tim de Boer VUnetID: wbr950
1 Lecture 1 (chapter 2): Introduction, basics of sensory data
Quantified Self: What and why? (terms: QS definition, QS reasons)
• Definition of QS (earlier, Swan 2013): Any individual engaged in self-tracking of any kind of biological,
physical, behavioral, or environmental information, with a proactive stance toward obtaining information
and acting on it.
• Definition from Mark: Any individual engaged in self-tracking of any kind of biological, physical, behavioral, or environmental information. The self-tracking is driven by a certain goal of the individual, with a desire to act upon the collected information.
• Various variables could be measured: physical activities (miles, calories burned, METS), diet (calories
consumed, fat), physiological state (mood, happiness), mental and cognitive state (IQ, alertness, focus),
environmental variables (location, weather), sensational variables (context, day of week), and social vari-
ables (influence, trust)
• Some goals could be: improve health (self-healing), building self-discipline, other aspects (self-design,
maximize work performance), self-association (be part of the QS-movement), find new life experiences
(self-entertainments, fun)
ML for the Quantified Self (terms: ML challenges and topics)
• Different target groups; could be athlete, or someone with diabetes.
• We can advise training schedules to the athlete to optimize progress or prevent over-training.
• We could also measure blood glucose to predict future levels; determine when and how to intervene when
mood is going down to avoid depression.
• The nature of the data for Quantified Self makes it a unique discipline; data is temporal, with noise and missing measurements, there is an interaction with a user, and we can learn over multiple datasets (multiple persons, or groups of persons).
Figure 1: Overview of course
Terminology and Notation (terms: mathematical notation, time series, dataset, instance)
• We use G for a categorical (classification) target, with a hat on top for its prediction. For a numerical target, we use Y, with a hat on top for the prediction.
• A superscript k denotes a specific variable / attribute; a subscript j denotes a specific observation in the dataset.
• A capital is the variable / attribute. A bold capital is the entire dataset, with a subscript i to denote a specific dataset (of a specific person) and a superscript τ to state that this is a temporal dataset.
Crowdsignals and Sensory Data (terms: sensors, creating dataset, selecting a step size)
• CrowdSignals collects data from users and their sensors, such as accelerometer and gyroscope (both from the mobile phone), heart rate, manually annotated data, and more.
• CrowdSignals gives data separated by metric (a separate table for heart rate etc.) per specific time point or time interval. The data is measured per step size ∆t, also called the granularity. We can take a bigger step size by averaging over x steps or summing them up (for example with quantity measurements), but also by taking the median or variance.
• When we want to combine various tables, we also have to decide on the time interval we will use. Big time interval: smoother, averaged data, less spread, less accurate. Small time interval: lots of data points, not so smooth, more spread, but more representative of the actual time point.
Questions chapter 2 Tim
• When we measure data using sensory devices across multiple users we often see substantial differences
between the sensory values we obtain. Identify at least three potential causes for these differences.
– The step size of measurements could differ between users; phones or wearables of users are different, so some measurements could differ or be non-existent across users, which can also require a different way of preprocessing (maybe we have different data formats?).
– Users have different activity types which of course alter the sensory values
– The environment could be different (e.g. different air pressure due to difference in height of living).
• We have seen that we can make trade-offs in terms of the granularity at which we consider the measurements in our basic dataset. We have shown the difference between a granularity of ∆t = 0.25 s and ∆t = 60 s. We arrived at a choice of ∆t = 0.25 s for our case, but let us think a bit more generally: think of four criteria that play a role in deciding on the granularity for the measurements of a dataset.
– What is the pace of change in activity? Quick changes in activity need a higher frequency; otherwise data from different activities might overlap.
– What is the total time of the experiment? This goes hand in hand with the last point about storage.
– What do we want to predict / do with the downstream ML tasks? Previously, we had aggregated too much and lost the fine details in our dataset that might be of great value. If you want to determine the step frequency of a person, your ∆t should be significantly smaller than the corresponding step period. On the other hand, if you want to learn about the motion state of a person, e.g. walking or sitting, ∆t = 1 minute might not only be sufficient but also optimal with respect to the predictive capabilities of a model based on the aggregated data.
– How much storage do we have? If data must be stored on the phone itself, for example to make real-time predictions, we might need a somewhat lower frequency in order not to exceed the available storage.
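The step-frequency criterion above can be illustrated with a small, hypothetical simulation: a 1.5 Hz step-like oscillation is clearly visible at ∆t = 0.25 s, but averaging over 60 s windows removes it entirely (the cadence frequency and durations are made up for the sketch).

```python
import numpy as np

# Hypothetical 1.5 Hz "step" oscillation sampled at dt = 0.25 s for 2 minutes
dt = 0.25
t = np.arange(0, 120, dt)           # 480 samples
signal = np.sin(2 * np.pi * 1.5 * t)

# At dt = 0.25 s the oscillation is resolved: the samples spread widely
fine_spread = signal.std()

# Aggregating to dt = 60 s (mean over 240 samples) wipes out the cadence:
# each window averages whole cycles, so the aggregated values sit near zero
coarse = signal.reshape(-1, 240).mean(axis=1)
coarse_spread = coarse.std()

print(fine_spread, coarse_spread)
```

A variance aggregation per window would still reveal that *some* activity happened, but the step period itself is unrecoverable once ∆t exceeds it.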
• We have identified two tasks we are going to tackle for the crowdsignals data. Think of at least two other
machine learning tasks that could be performed on the crowdsignals dataset and argue why they could be
relevant to support a user.
– They state predicting a label and predicting heart rate.
– Other goals could be giving appropriate daily advice (via Reinforcement Learning) for diabetics but
also for top athletes about training, nutrition, sleep, potential burn-out etc.
– Analyzing and predicting sleep patterns
– Predicting GPS location
– Identifying potential health risks for the public, e.g. for governments to run a campaign against sitting too long, or for companies to improve the health of workers based on activity but also humidity and temperature.
2 Lecture 2, chapter 3: Handling sensory noise
Introduction and basic definitions (terms: outlier, distribution-based, distance-based)
• Be cautious when removing outliers; some outliers are plausible values while others are not.
• We have distribution based and distance based metrics.
Distribution-based outlier detection algorithms (terms: Chauvenet’s criterion, mixture models)
• Chauvenet: Take the mean and std for an attribute in our dataset. Calculate the probability P that a random point from the fitted distribution would be lower than the observed point. For an outlier x to the right of the distribution with points X, we have to satisfy 1 − P(X < x) < 1/(c · N) (since the surface of P would be very big, as a point X from our distribution would often be lower than our point x). For an outlier to the left of the distribution, we require P(X < x) < 1/(c · N) (since the surface of P is small). c is often chosen to be 2; N is the number of data points.
• However, this method is quite simple; we only use one distribution but data might be explained better by
more distributions.
• Mixture models use K normal distributions, with a weight between 0 and 1 for each distribution. These parameters can be chosen by optimizing the product of the probabilities of all data points; this should be as high as possible for the model to be a good fit for the data.
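Chauvenet's criterion as described above fits in a few lines; this is a minimal sketch on synthetic data, with c = 2 as in the notes and a single normal distribution fitted per attribute.

```python
import numpy as np
from scipy.stats import norm

def chauvenet_outliers(values, c=2):
    """Flag points whose tail probability under a normal distribution
    fitted to the data falls below 1 / (c * N)."""
    values = np.asarray(values, dtype=float)
    n = len(values)
    mean, std = values.mean(), values.std()
    # P(X < x) under the fitted normal, for each observation
    p_below = norm.cdf(values, loc=mean, scale=std)
    threshold = 1.0 / (c * n)
    # Right-tail outliers: 1 - P(X < x) < 1/(c*N); left-tail: P(X < x) < 1/(c*N)
    return (1 - p_below < threshold) | (p_below < threshold)

# Synthetic attribute: 100 normal samples plus one injected extreme value
data = np.concatenate([np.random.default_rng(0).normal(0, 1, 100), [8.0]])
mask = chauvenet_outliers(data)
print(data[mask])
```

A mixture model would replace the single `norm.cdf` with a weighted sum over K fitted normals, which handles multi-modal attributes better.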
Distance-based outlier detection algorithms (terms: simple distance based, local outlier factor)
• Simple distance-based approach: We call points close if they are within distance dmin of each other. A point is an outlier when more than a fraction fmin of the points lies outside of dmin; in other words, not enough points are within the range dmin.
• A more advanced method also takes the local density into account: in Figure 2, the density on the left is higher, indicating that the point away from the cluster could indeed be an outlier, whereas on the right the point is less likely to be an outlier since the distribution itself has a higher variance.
Figure 2: Example of local density for determining outliers
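The simple distance-based criterion can be sketched as follows; the 2-D points, dmin, and fmin values are hypothetical, chosen so that the lone point far from the cluster is flagged.

```python
import numpy as np

def simple_distance_outliers(points, d_min, f_min):
    """A point is an outlier when more than a fraction f_min of the
    other points lies farther away than d_min."""
    points = np.asarray(points, dtype=float)
    n = len(points)
    # Pairwise Euclidean distances (n x n matrix)
    diffs = points[:, None, :] - points[None, :, :]
    dist = np.sqrt((diffs ** 2).sum(axis=-1))
    # Fraction of *other* points outside d_min (self-distance 0 never counts)
    far = (dist > d_min).sum(axis=1) / (n - 1)
    return far > f_min

# Tight synthetic cluster plus one point far away from it
cluster = np.random.default_rng(1).normal(0, 0.05, size=(20, 2))
points = np.vstack([cluster, [[3.0, 3.0]]])
print(simple_distance_outliers(points, d_min=1.0, f_min=0.5))
```

Note the criterion is global: it uses one dmin for the whole dataset, which is exactly the limitation the local-density methods below address.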
• Local outlier factor: define the distance kdist of a point as the distance such that at least k points are at exactly this distance or closer, and at most k − 1 points are strictly closer (so at most k − 1 points strictly inside the circle, and one or more points precisely on the circle).
• The points within kdist form the neighbourhood kdist_nh. We define the reachability distance of a point xi to another point x as:
kreachdist(xi, x) = max(kdist(x), d(x, xi))
So if the distance between x and xi is lower than the kdist of x, we use kdist(x). But if point x has a lot of close neighbours, its kdist is very low, so the chance that the distance between x and xi is bigger is high, and then we use that distance instead.
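The two quantities above (k-distance and reachability distance) can be sketched directly from their definitions; the point coordinates and k = 2 are made-up values for illustration, and x itself is excluded from its own neighbour set.

```python
import numpy as np

def k_distance(x, neighbours, k):
    """Distance to the k-th nearest point in `neighbours`, so that at
    least k points are at this distance or closer to x."""
    d = np.sort(np.linalg.norm(neighbours - x, axis=1))
    return d[k - 1]

def reach_dist(k, x_i, x, neighbours):
    """Reachability distance of x_i w.r.t. x: max(kdist(x), d(x, x_i))."""
    return max(k_distance(x, neighbours, k), np.linalg.norm(x - x_i))

pts = np.array([[0.0, 0.0], [0.1, 0.0], [0.0, 0.1], [5.0, 5.0]])
x = pts[0]
others = pts[1:]  # neighbours of x, excluding x itself

# d(x, x_i) = 0.05 is smaller than kdist(x) = 0.1, so kdist(x) is used
r = reach_dist(2, np.array([0.05, 0.0]), x, others)
print(r)  # → 0.1
```

The full LOF score then compares the average reachability distance of a point's neighbourhood against those of its neighbours, which captures the local-density intuition of Figure 2.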