Complete Lecture Material +
Notes Behavioural Data Science
Lecture 1 – A Dialogue on Theories, Phenomena, and Data
Lecture 2 – Complexity and Network Models
Lecture 3 – The New World of Behavioural Data
Lecture 4 – Binary Classification
Lecture 5 – Bayesian Inference
Lecture 6 – The Ultimate Debate
Lecture 1 – A Dialogue on Theories, Phenomena, and Data
Behavioural Data Science
Overview of this lecture
Overview of the course
Behavioural Data Science: task and scope
Interplay between data and theory
o Data
o Phenomena
o Theory
The role of mathematical modelling
Future directions
Overview of the Course
Lecturers: Denny Borsboom & EJ Wagenmakers, plus guests
Support: Romy Leferink
Classes:
1. Introduction and theory (DB)
2. Network models for clinical psychology (DB)
3. Big Behavioural Data (DB)
4. Binary Classification and Machine Learning (EJ)
5. Bayesian Inference (EJ)
6. Bayes versus Frequentism Debate (DB & EJ)
The lectures are the exam material! Papers support the lectures but are not themselves
exam material.
Questions about course setup to Romy
Questions about material to Denny and EJ
https://canvas.uva.nl/courses/20281/modules
Exam: 25 MC questions + 1 Essay
There will not be weekly assignments
The propaedeutic thesis replaces the weekly assignments (WA)
What is Behavioural Data Science?
Revolves around the use of behavioural data and sources to further psychology and behavioural
science.
Behavioural Data Science is a multidisciplinary scientific field that aims to facilitate understanding,
prediction, and change of human behaviour through the analysis of behaviourally defined variables
as they arise in large datasets ("Big Data"), typically gathered using modern digital technology (e.g.,
online or through mobile devices) and analysed with techniques for detecting patterns from high-
dimensional data (e.g., machine learning).
A merger of statistical analysis, informatics, simulation, mathematical theorising, and new data
registration techniques.
Understanding, prediction, and change
Understanding: construction of psychological theories to explain behaviour
Prediction: application of statistical models to predict behaviour
Change: development of interventions to change behaviour
Control: predictable change through intervention; this could be troublesome in psychology
The complexities of human behaviour
Human behaviour is at the root of many of the most central problems of our time: COVID-19
spread and climate change, but also war and famine have important behavioural
components
Human behaviour is "possibly the most difficult subject ever submitted to scientific analysis"
(Skinner, 1987)
Yet standard methods to study it are remarkably simple: questionnaires, tests, and small-
scale experiments
However, recently, new sources of data are being mined and these offer new ways of
approaching old questions...
The golden age of social science
The new availability of data science is giving us real-time access to human behaviour
We are entering a golden age of social science
Example of a polarised and segregated network on Twitter. The network visualises retweets of
political hashtags from the 2010 US midterm elections. The nodes represent Twitter users, and there
is a directed edge from node i to node j if user j retweeted user i. Colours represent political preference:
red for conservatives and blue for progressives.
Information is sent mainly from Democrats to Democrats and from Republicans to Republicans: the
echo-chamber effect. Little information is shared between the two groups.
If we cannot understand why this happens, we cannot hope to influence this behaviour in the
future.
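The segregation described above can be quantified in a simple way. The following is a minimal sketch, not the analysis behind the figure: the users, their affiliations, and the retweets are all invented for illustration, and the helper name within_group_share is hypothetical. It follows the same edge convention as the figure (an edge from i to j means that user j retweeted user i) and computes the share of retweets that stay within one political group.

```python
# Toy retweet network. Users, affiliations, and retweets are invented
# for illustration; they are not the 2010 midterm data.
affiliation = {
    "a": "red", "b": "red", "c": "red",
    "x": "blue", "y": "blue", "z": "blue",
}

# Each pair (i, j) is a directed edge from node i to node j:
# user j retweeted user i, so information flowed from i to j.
edges = [
    ("a", "b"), ("a", "c"), ("b", "c"), ("c", "a"),
    ("x", "y"), ("x", "z"), ("y", "z"), ("z", "x"),
    ("a", "x"),  # the only retweet that crosses group lines
]


def within_group_share(edges, affiliation):
    """Return the fraction of edges whose endpoints share an affiliation."""
    same = sum(1 for i, j in edges if affiliation[i] == affiliation[j])
    return same / len(edges)


share = within_group_share(edges, affiliation)
print(f"Within-group share of retweets: {share:.2f}")
```

A share close to 1 indicates that information rarely crosses between the groups; a share near the value expected under random mixing would indicate no echo chamber.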
The architecture of the data world
Data
Data are representations of observations
Observation example: "Pete correctly solves IQ test item 36"
Representation: the row that represents Pete has a 1 in the column that represents the IQ
item
Typically, data are structured in rows and columns, i.e., in a spreadsheet
Rows represent cases, while columns represent features/properties/attributes
The values in a column represent a variable
Variables are always constructed
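The step from observation to representation can be sketched in a few lines. This is an illustrative sketch only: apart from "Pete correctly solves IQ test item 36", the observations are invented, and the variable names are hypothetical. It shows that the 0/1 coding is a choice made when the data matrix is built, which is one sense in which variables are constructed.

```python
# Observations as (person, item number, solved correctly?).
# Only the first one comes from the notes; the rest are invented.
observations = [
    ("Pete", 36, True),
    ("Pete", 37, False),
    ("Anna", 36, False),
    ("Anna", 37, True),
]

persons = sorted({person for person, _, _ in observations})  # rows: cases
items = sorted({item for _, item, _ in observations})        # columns: variables

# Build the data matrix. Coding "correct" as 1 and "incorrect" as 0 is a
# decision of the researcher, not something given by the observation itself.
data = {person: {item: None for item in items} for person in persons}
for person, item, correct in observations:
    data[person][item] = 1 if correct else 0

# Print the matrix as a small spreadsheet.
print("person", *[f"item{item}" for item in items])
for person in persons:
    print(person, *[data[person][item] for item in items])
```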
Phenomena
Phenomena are robust features of the world
For instance, the positive manifold of intelligence, the robust correlation between insomnia
and depression, the effect of time pressure on accuracy
Important: Phenomena are not themselves data
Rather, phenomena are evidenced by patterns in the data
Because psychology is very complex, we often need advanced statistical models to "see" the
patterns
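To make the distinction between data and phenomena concrete, here is a minimal sketch of what "a pattern in the data" could look like for the positive manifold. The test names and scores are invented for illustration, and the function pearson is a hand-written helper, not part of any course material. Note that a single toy dataset like this would not establish a phenomenon; a phenomenon is a pattern that recurs robustly across many datasets.

```python
from itertools import combinations
from math import sqrt

# Invented scores of six people on three hypothetical ability tests.
scores = {
    "vocabulary": [12, 15, 9, 20, 14, 17],
    "arithmetic": [30, 34, 25, 41, 33, 36],
    "memory": [5, 7, 4, 9, 5, 8],
}


def pearson(x, y):
    """Pearson correlation between two equally long lists of numbers."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


# Correlation for every pair of tests.
correlations = {
    (t1, t2): pearson(scores[t1], scores[t2])
    for t1, t2 in combinations(scores, 2)
}

# The pattern of interest: every pairwise correlation is positive.
all_positive = all(r > 0 for r in correlations.values())

for (t1, t2), r in correlations.items():
    print(f"{t1} ~ {t2}: r = {r:.2f}")
print("All correlations positive:", all_positive)
```

The individual scores are the data; the fact that all correlations come out positive is the pattern that would count as evidence for the phenomenon.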
Theories
There are many kinds of theories, but we are often interested in explanatory theories
An explanatory theory is a set of principles that aims to explain phenomena.
You don't try to explain the data, only the phenomena. You only try to explain the
data when fraud is suspected
Typically you don't explain data, you explain features of the data.
Differences in gender
Positive correlations
Non-linear growth in cognitive development
Etc.
An explanatory theory describes a world in which the phenomena would follow "as a matter of course"
Coming up with a good theory is a creative act, but it can be systematised and practised.
Ideally, in behavioural data science we are after mathematically formulated models