Computational Analysis of Digital Communication
Lecture 1 | Introduction
Readings lecture 1
Kramer et al. 2014
Van Atteveldt & Peng (2018)
Increasing amount of data available online
What can we learn from this much data?
Timeline of natural language processing

Much of what we know about human behavior is based on what people tell us:
- In self-report measures in surveys
- In responses in experimental research
- In qualitative interviews
Note: although valuable, such measurements can be biased (Scharkow, 2013; Parry et al., 2021)

But a lot of (mass) communication looks like this: [image]
Or is based on user-generated content:
- TikTok
- Instagram
- Etc.

What is computational social science, and why should we care?
Field of social science that uses algorithmic tools and large/unstructured data to understand human and social behavior
Complements rather than replaces traditional methodologies: methods are not the goal, but contribute to data generation
Includes methods such as:
- Data mining (e.g. scraping and gathering of large data sets)
- Software development for social science experiments
- Automated text analysis (e.g. sentiment analysis, keyword extraction, dictionary approaches)
- Image classification (e.g. face recognition, visual topic modelling)
- Machine learning approaches (e.g. for classification, prediction, topic modelling)
- Actor-based modelling (e.g. simulation of social behavior, spreading of information)

Why is this important now?
Vast amount of digitally available data, ranging from social media messages and other digital traces to web archives and newly digitized newspapers and other historical archives
Large-scale records (big data) of persons or businesses are created constantly
Powerful and comparatively cheap processing power and easy-to-use computing infrastructure for processing these data
Improved tools to analyze these data, including network analysis methods and automatic text analysis methods such as supervised text classification, topic modelling, word embeddings, as well as large language models

10 characteristics of big data
1. Big
The scale or volume of some current data sets is often impressive. However, big data sets are not an end in themselves, but they can enable certain kinds of research, including the study of rare events, the estimation of heterogeneity, and the detection of small differences
2. Always-on
Many big data systems are constantly collecting data and thus make it possible to study unexpected events and allow for real-time measurement
3. Nonreactive
Participants are generally not aware that their data are being captured, or they have become so accustomed to this data collection that it no longer changes their behavior
4. Incomplete
Most big data sources are incomplete, in the sense that they don’t have the information that you will want for your research. This is a common feature of data that were created for purposes other than research
5. Inaccessible
Data held by companies and governments are difficult for researchers to access
6. Nonrepresentative
Despite their size, most big data are not representative of certain populations. Out-of-sample generalizations are hence difficult or impossible
7. Drifting
Many big data systems are changing constantly, thus making it difficult to study long-term trends
8. Algorithmically confounded
Behavior in big data systems is not natural; it is driven by the engineering goals of the systems
9. Dirty
Big data often include a lot of noise (e.g. junk, spam, spurious data points)
10. Sensitive
Some of the information that companies and governments have is sensitive

Pros and cons of computational methods
Pros
- We can study actual behavior instead of simply self-reports
- We can study human beings in their social context instead of in an artificial lab setting
- We can increase our N (higher power)
- Potential to uncover patterns and insights that we couldn’t investigate before
Cons
- Techniques are often (rather) complicated
- Data is often proprietary (not shared openly)
- Samples are often biased
- Data often have insufficient metadata

Definition
“Computational communication science (CCS) is the label applied to the emerging subfield that investigates the use of computational algorithms to gather and analyse big data and often semi- or unstructured data sets to develop and test communication science theories” (Van Atteveldt & Peng, 2018)

Typical research areas
Studies involve:
- Large and complex data
- Consisting of digital traces and other “naturally occurring” data
- Requiring algorithmic solutions to analyse
- Allowing the study of human communication by applying and testing communication theory
Political communication
- Democratization and polarization
- Hate speech
Social media use
- Tracking of actual social media use
- Spreading of behavior, information, or emotions
Health communication
- Prevalence of health information online
(Online) journalism
- News coverage across decades
- Gender equality

Example 1: Analysing news coverage
- Analysis of the coverage of nuclear technology from 1945 to 2014 in the New York Times
- 51,528 stories
- Used LDA topic modelling to extract latent topics and analysed their occurrence over time

Example 2: Facebook data to predict personality
- 58,000 volunteers who provided their FB likes, detailed demographic profiles and the results of several psychometric tests
- One can predict a variety of personal characteristics and personality traits from simple FB likes

Example 3: Dutch Telegramsphere
- Full messaging history of 174 Dutch-language public Telegram chats/channels
- Used state-of-the-art web mining, neural topic modelling, and social network analysis techniques
- Findings raise concerns with respect to Telegram’s polarization and radicalization capacity
- Telegram users are active in and share content across different communities
- Over time, conspiracy-themed, far-right activist, and COVID-19-sceptical communities dominated

Example 4: Gender stereotypes in political news
- Examined gender differences in political news coverage to determine whether the media employ stereotypical traits in portrayals of 1,095 US politicians
- 5 million US news stories
- Studied the media’s attribution of gender-linked and political traits to US politicians
- All three masculine traits were more strongly associated with male politicians, but only the feminine physical traits were more strongly associated with female politicians

Example 5: Gender representation in TV
- Gender representations in over 10 years of daytime TV programming
- Used neural networks to automatically detect gender in shown faces
- Women on average remained underrepresented on TV
- This strong overall bias was mirrored across specific subsamples (news, sports, advertising)

The “Facebook mood manipulation” study | Kramer et al. 2014
- Massive online experiment (N ≈ 700k)
- Main RQ: is emotion contagious?
- Experimental groups: positive/negative/control
- Stimulus: hide (negative/positive/random) messages from FB timeline
- Measurement/dependent variable: sentiment of posts by user
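
Sketch: measuring "sentiment of posts by user" with a dictionary approach
The dependent variable in the Kramer et al. (2014) study is the sentiment of users' posts. A minimal sketch of how such a measure could be computed with a dictionary approach (one of the automated text analysis methods listed earlier) is given below. Everything in it is an assumption made for illustration: the word lists are invented toy dictionaries, the group labels "control" and "treatment" are hypothetical, and the example posts are made up. It is not the procedure or the data of the original study.

```python
import re
from statistics import mean

# Hypothetical toy dictionaries; real studies rely on validated word lists.
POSITIVE = {"happy", "great", "love", "good", "wonderful"}
NEGATIVE = {"sad", "awful", "hate", "bad", "terrible"}


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def sentiment_shares(text):
    """Return (share of positive words, share of negative words) in a post."""
    tokens = tokenize(text)
    if not tokens:
        return 0.0, 0.0
    pos = sum(t in POSITIVE for t in tokens) / len(tokens)
    neg = sum(t in NEGATIVE for t in tokens) / len(tokens)
    return pos, neg


def group_means(posts_by_group):
    """Average the positive and negative word shares per experimental group."""
    result = {}
    for group, posts in posts_by_group.items():
        shares = [sentiment_shares(p) for p in posts]
        result[group] = {
            "positive": mean(s[0] for s in shares),
            "negative": mean(s[1] for s in shares),
        }
    return result


# Invented example posts for two hypothetical groups.
posts = {
    "control": ["What a great day", "I hate mondays"],
    "treatment": ["Such a sad and awful week", "Feeling bad today"],
}
summary = group_means(posts)
print(summary)
```

Comparing the group averages in `summary` is the basic logic of the design: if emotion is contagious, hiding positive or negative messages should shift the share of positive and negative words in what users post themselves. With real data one would also test whether the group difference is statistically reliable.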