Lecture 1: Van Atteveldt & Peng (2018). When Communication Meets Computation:
Opportunities, Challenges, and Pitfalls in Computational Communication Science
The recent increase in digitally available data, tools, and processing power is fostering
the use of computational methods in the study of communication. This special
issue discusses the validity of using big data in communication science and showcases
a number of new methods and applications in the fields of text and network analysis.
Computational methods have the potential to greatly enhance the scientific study of
communication because they allow us to move towards collaborative large-N studies of
actual behavior in its social context. This requires us to develop new skills and
infrastructure and meet the challenges of open, valid, reliable, and ethical “big data”
research.
The role of computational methods in communication science
The recent acceleration in the promise and use of computational methods for communication
science is primarily fueled by the confluence of at least three developments:
1. A deluge of digitally available data, ranging from social media messages and other
“digital traces” to web archives and newly digitized newspaper and other historical
archives;
2. Improved tools to analyze these data, including network analysis methods and
automatic text analysis methods such as supervised text classification, topic modeling,
word embeddings, and syntactic methods;
3. The emergence of powerful and cheap processing power, and easy-to-use computing
infrastructure for processing these data, including scientific and commercial cloud
computing, sharing platforms such as GitHub and Dataverse, and crowd coding
platforms such as Amazon MTurk and Crowdflower.
Computational communication science studies generally involve:
1. Large and complex data sets;
2. Consisting of digital traces and other “naturally occurring” data;
3. Requiring algorithmic solutions to analyze;
4. Allowing the study of human communication by applying and testing communication
theory.
Of course, computational methods do not replace the existing methodological approaches,
but rather complement them. Computational methods are an expansion and enhancement of
the existing methodological toolbox, while traditional methods can also contribute to the
development, calibration, and validation of computational methods. Moreover, the distinction
between “classical” and “computational” methods is often one of degree rather than of kind,
and the boundaries between approaches are fuzzy: When does an online experiment turn
into a computational analysis, and how do Facebook status updates really differ from self-
reports? Nevertheless, the term computational methods is useful to make us realize that new
datasets and processing techniques offer us possibilities beyond just scaling up our
previous work; and to alert us to the potential challenges, pitfalls, and required
expertise in using these methods.
Opportunities offered by computational methods
We argue that computational methods allow us to analyze social behavior and
communication in ways that were not possible before and have the potential to radically
change our discipline at least in four ways:
• From self-report to real behavior. Digital traces allow us to measure actual
behavior in an unobtrusive way rather than self-reported attitudes or intentions. This
can help overcome social desirability problems, and more importantly it does not
rely on people’s imperfect estimate of their own desires and intentions.
• From lab experiments to studies of the actual social environment. We can
observe the reaction of persons to stimuli in their actual environment rather than in
an artificial lab setting. In their daily lives, people are exposed to a multitude of
stimuli simultaneously, and their reactions are also conditioned by how a stimulus fits
into the overall perception and daily routine of people. Moreover, we are mostly
interested in social behavior, and how people act depends strongly on (their
perception of) the actions and attitudes in their social network.
• From small-N to large-N. Simply increasing the scale of measurement can also
enable us to study more subtle relations or effects in smaller subpopulations than
possible with the sample sizes normally available in communication research. For
example, the Facebook “voting study” showed that a stimulus message to vote also
affects close friends of the people who received the message, but this effect was so
small that it was only significant because of the half a million subjects. Given the
small margins in (American) elections, however, even such a small effect can be
decisive. Similarly, by measuring messages and
behavior in real time rather than in daily or weekly (or yearly) surveys, much more
fine-grained time series can be constructed, alleviating the problems of
simultaneous correlation and making a stronger case for finding causal mechanisms.
• From solitary to collaborative research. Digital data and computational tools make
it easier to share and reuse resources. The increased scale and complexity also
make it almost necessary to do so: it is very hard for any individual researcher to
possess the skills and resources needed to do all the steps of computational
research himself or herself. An increased focus on sharing data and tools will also force
us to be more rigorous in defining operationalizations and documenting the data and
analysis process, furthering transparency and reproducibility of research. A second
way in which computational methods can change the way we do research is by
fostering the interdisciplinary collaboration needed to deal with larger data sets and
more complex computational techniques.
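
The large-N point above can be made concrete with a small calculation. The sketch below is only an illustration: the turnout rates (50.0% versus 50.4%) and the group sizes are made-up values, not figures from the Facebook study, and the function name is hypothetical. It computes the two-sided p-value of a standard two-proportion z-test at the expected proportions, showing how the same tiny effect moves from undetectable to clearly significant as the number of subjects grows.

```python
from math import sqrt
from statistics import NormalDist


def two_proportion_p_value(p_control, p_treated, n_per_group):
    """Two-sided p-value of a z-test comparing two equal-sized groups.

    Evaluated at the expected proportions, so it shows what a typical
    study of this size would find, not the outcome of a specific sample.
    """
    pooled = (p_control + p_treated) / 2
    standard_error = sqrt(2 * pooled * (1 - pooled) / n_per_group)
    z = (p_treated - p_control) / standard_error
    return 2 * (1 - NormalDist().cdf(abs(z)))


# Illustrative (assumed) values: a 0.4 percentage point effect on turnout.
for n in (500, 5_000, 50_000, 500_000):
    p = two_proportion_p_value(0.500, 0.504, n)
    verdict = "significant" if p < 0.05 else "not significant"
    print(f"n per group = {n:>7}: p = {p:.5f} ({verdict} at alpha = 0.05)")
```

Under these assumed values the effect only crosses the conventional 0.05 threshold at the largest group size, which is the sense in which very small effects become visible only in large-N designs.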
Challenges and pitfalls in computational methods
By observing actual behavior in the social environment, and if possible of a whole network
of connected people, we get a better measurement of how people actually react, rather than
of how they (report or intend to) react in the artificial isolation of the lab setting; and the
scale at which this is possible allows more complex or subtle causal relations to be tested
and discovered. Large-scale exploratory research can help formulate theories and identify
interesting cases or subsets for further study, while at the same time smaller and
qualitative studies can help make sense of the results of big data research. Similarly, “big
data” confirmatory research can help test whether causal relations found in experimental
studies actually hold in the “wild”, i.e., on large populations and in real social settings.
Using these new methods and data sets, however, creates a new set of challenges and
pitfalls, some of which will be reviewed below.
• How do we keep research datasets accessible? (Privileged access to data.)
Although the volume, variety, velocity, and veracity of big data
have been repeatedly touted, it is a hard truth that many “big data” sets are
proprietary and very difficult for most communication
researchers to access. Privileged access to big data by a small group of researchers will let
researchers with that access “enjoy an unfair amount of attention at the expense of
equally talented researchers without these connections”. Moreover, studies