LECTURE 1
MUCH OF WHAT WE KNOW ABOUT HUMAN BEHAVIOR
• In self-report measures in surveys
• In responses in experimental research
• In qualitative interviews.
WHAT IS COMPUTATIONAL ANALYSIS?
Example: Surprising sources of information
• In 2009, researchers wanted to study wealth and poverty in Rwanda
• They conducted a survey with a random sample of 1,000 customers of the largest mobile phone
provider
• They collected demographic, social, and economic characteristics (incl. wealth)
• So far, traditional social science, right?
• The authors also had access to complete call records from 1.5 million people
• Combining both data sources, they used the survey data to “train” a machine learning model to
predict a person’s wealth based on their call records
• They also estimated the places of residence based on the geographic information embedded in call
records
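The train-then-predict logic of this example can be sketched in a few lines of Python. This is a minimal illustration under strong assumptions, not the authors' actual model: the single predictor (calls per week), the toy numbers, and the function names `train` and `predict` are all hypothetical, and a simple least-squares line stands in for the machine learning model.

```python
# Hypothetical toy data: (calls per week, wealth index) for surveyed people.
survey = [(5, 12.0), (10, 21.0), (20, 43.0), (40, 79.0)]

# Hypothetical call activity for people who were NOT surveyed.
unsurveyed_calls = {"person_a": 8, "person_b": 30}


def train(survey_rows):
    """Fit wealth = slope * calls + intercept by ordinary least squares."""
    n = len(survey_rows)
    mean_x = sum(x for x, _ in survey_rows) / n
    mean_y = sum(y for _, y in survey_rows) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in survey_rows)
    sxx = sum((x - mean_x) ** 2 for x, _ in survey_rows)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(model, calls):
    """Apply the fitted line to a person's call activity."""
    slope, intercept = model
    return slope * calls + intercept


# Step 1: learn the relationship on the small survey sample.
model = train(survey)

# Step 2: apply it to everyone for whom only call records exist.
predicted_wealth = {
    person: predict(model, calls) for person, calls in unsurveyed_calls.items()
}
```

The point of the design is that the expensive measurement (the survey) is only needed for a small sample, while the cheap measurement (call records) is available for everyone.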
COMPUTATIONAL SOCIAL SCIENCE
• Field of social science that uses algorithmic tools and large/unstructured data to understand human
and social behavior.
• Complements rather than replaces traditional methodologies: Methods are not the goal but
contribute to data generation.
• Includes methods such as:
▪ Data mining (e.g., scraping and gathering of large data sets)
▪ Software development for social science experiments
▪ Automated text analysis (e.g., sentiment analysis, keyword extraction, dictionary approaches)
▪ Image classification (e.g., face recognition, visual topic modeling)
▪ Machine learning approaches (e.g., for classification, prediction, topic modeling)
▪ Actor-based modeling (e.g., simulation of social behavior, spreading of information)
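To make one of these methods concrete, a dictionary approach to sentiment analysis can be sketched as follows. This is a minimal illustration: the word lists and the function name `sentiment_score` are hypothetical, and real studies would use a validated dictionary with far more entries.

```python
import re

# Hypothetical mini-dictionaries; validated dictionaries are much larger.
POSITIVE = {"good", "great", "happy", "love"}
NEGATIVE = {"bad", "terrible", "sad", "hate"}


def sentiment_score(text):
    """Return a score in [-1, 1]: (positive - negative) / (positive + negative)."""
    tokens = re.findall(r"[a-z']+", text.lower())
    pos = sum(token in POSITIVE for token in tokens)
    neg = sum(token in NEGATIVE for token in tokens)
    if pos + neg == 0:
        return 0.0  # no dictionary words found: treat as neutral
    return (pos - neg) / (pos + neg)


scores = [sentiment_score(t) for t in ["I love this great phone", "Terrible, bad service"]]
```

The approach is transparent and fast, but it ignores context: negation ("not good") and irony are scored as if the dictionary words were used literally.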
Why is this important now?
• Vast amounts of digitally available data, ranging from social media messages and other digital traces
to web archives and newly digitized newspaper and other historical archives
• Large-scale records (big data) of persons or businesses are created constantly
• Powerful and comparatively cheap processing power, and easy-to-use computing infrastructure for
processing these data
• Improved tools to analyze these data, including network analysis methods and automatic text analysis
methods such as supervised text classification, topic modeling, word embeddings, as well as large
language models
10 CHARACTERISTICS OF BIG DATA
PROS AND CONS OF COMPUTATIONAL METHODS
Opportunities
• We can study actual behavior instead of simply self-reports.
• We can study human beings in their social context instead of in an artificial lab setting.
• We can increase our N (higher power).
• Potential to uncover patterns and insights that we couldn’t investigate before.
Pitfalls
• Techniques are often rather complicated.
• Data are often proprietary (not shared openly).
• Samples are often biased.
• Data often come with insufficient metadata.
Definition:
Computational communication science is the emerging subfield that investigates the use of computational
algorithms to gather and analyze big and often semi- or unstructured data sets to develop and test
communication science theories.
TYPICAL RESEARCH AREAS
Computational communication science studies thus usually involve:
1. large and complex data sets
2. consisting of digital traces and other “naturally occurring” data
3. requiring algorithmic solutions to analyze
4. allowing the study of human communication by applying and testing communication theory
• Political Communication
▪ Democratization and Polarization
▪ Hate Speech
• Social Media Use
▪ Tracking of actual social media use
▪ Spreading of behavior, information, or emotions
• Health Communication
▪ Prevalence of health information online
• (Online) Journalism
▪ News coverage across decades
▪ Gender equality
EXAMPLE 1: ANALYZING NEWS COVERAGE
• Jacobi and colleagues (2016) analyzed the coverage of nuclear technology from 1945 to 2014 in the
New York Times
• Analysis of 51,528 news stories (headline and lead): Way too much for human coding!
• Used “LDA topic modeling” to extract latent topics and analyzed their occurrence over time
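The second step, analyzing how often topics occur over time, can be sketched as follows. This is a minimal illustration that does not implement LDA itself: it assumes each story has already been assigned a dominant topic by a topic model, and the topic labels, the toy stories, and the function name `topic_share_by_decade` are hypothetical.

```python
from collections import Counter, defaultdict


def topic_share_by_decade(stories):
    """Given (year, dominant_topic) pairs, return each topic's share per decade."""
    counts = defaultdict(Counter)
    for year, topic in stories:
        decade = (year // 10) * 10  # e.g., 1958 -> 1950
        counts[decade][topic] += 1

    shares = {}
    for decade, topic_counts in counts.items():
        total = sum(topic_counts.values())
        shares[decade] = {t: n / total for t, n in topic_counts.items()}
    return shares


# Hypothetical toy input: topic labels would come from a fitted topic model.
stories = [
    (1951, "weapons"),
    (1958, "weapons"),
    (1959, "energy"),
    (1986, "accidents"),
]
shares = topic_share_by_decade(stories)
```

Using shares rather than raw counts keeps decades comparable even when the number of published stories differs between them.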
EXAMPLE 2: FACEBOOK DATA TO PREDICT PERSONALITY
• Kosinski and colleagues (2013) used a dataset of over 58,000 volunteers who provided their Facebook
Likes, detailed demographic profiles, and the results of several psychometric tests
• They were able to show that one can predict a variety of personal characteristics and personality traits
from simple Facebook Likes
EXAMPLE 3: DUTCH TELEGRAMSPHERE
• Simon et al. (2022) collected the full messaging history (N = 2,033,661) of 174 Dutch-language public
Telegram chats/channels
• Used state-of-the-art web-mining, neural topic modeling, and social network analysis techniques.
• Their findings raise concerns with respect to Telegram’s polarization and radicalization capacity.
• They observed that Telegram users are active in and share content across different communities.
• They further found that over time, conspiracy-themed, far-right activist, and COVID-19-sceptical
communities dominated
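The idea that users are active across different communities can be expressed as a simple network: chats are nodes, and two chats are connected when they share users. The sketch below is a minimal illustration, not the procedure used in the study; the chat names, user IDs, and the function name `shared_user_edges` are hypothetical.

```python
from itertools import combinations


def shared_user_edges(members_by_chat):
    """Return {(chat_a, chat_b): number of shared users} for overlapping chats."""
    edges = {}
    for a, b in combinations(sorted(members_by_chat), 2):
        overlap = len(members_by_chat[a] & members_by_chat[b])
        if overlap:
            edges[(a, b)] = overlap  # edge weight = number of shared users
    return edges


# Hypothetical toy data: which users posted in which chat.
chats = {
    "chat_a": {"u1", "u2", "u3"},
    "chat_b": {"u2", "u3", "u4"},
    "chat_c": {"u5"},
}
edges = shared_user_edges(chats)
```

The resulting weighted edge list is the kind of input that social network analysis tools typically expect, for example to detect clusters of closely connected chats.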
EXAMPLE 4: GENDER STEREOTYPES IN POLITICAL NEWS
• Andrich et al. (2023) studied gender differences in political news coverage to determine whether the
media employ stereotypical traits in portrayals of 1,095 U.S. politicians
• The sample consisted of over 5 million U.S. news stories published from 2010 to 2020