Computational Analysis of Digital Communication (S_MPC)
Seller: Sterrevermond
Lecture 1 - Introduction to Computational Methods - 31/10/2022
Computational Social Science: A field of social science that uses algorithmic tools and
large/unstructured data to understand human and social behavior. Computational methods
as a “microscope”: the methods are not the goal in themselves but contribute to theory
development and/or data generation. They complement rather than replace traditional
methodologies. Computational methods include:
★ Advanced data wrangling/data science
★ Combining different data sets
★ Automated Text Analysis
★ Machine Learning (supervised and unsupervised)
★ Actor-based modeling
★ Simulations
Typical Workflow
Why is this important now?
★ In the past, collecting data was expensive (surveys, observations…).
★ In the digital age, the behaviors of billions of people are recorded, stored, and
therefore analyzable.
★ Every time you click on a website, make a call on your mobile phone, or pay for
something with your credit card, a digital record of your behavior is created and
stored.
★ Because (meta-)data are a byproduct of people’s everyday actions, they are often
called digital traces.
★ Large-scale records of persons or businesses are often called big data.
10 characteristics of big data
★ Big: The scale or volume of some current datasets is often impressive. However, big
datasets are not an end in themselves; they can enable certain kinds of research, including
the study of rare events, the estimation of heterogeneity, and the detection of small
differences.
★ Always-on: Many big data systems are constantly collecting data, which enables
researchers to study unexpected events and allows for real-time measurement.
★ Nonreactive: Participants are generally not aware that their data are being captured, or
they have become so accustomed to this data collection that it no longer changes their
behavior.
★ Incomplete: Most big data sources are incomplete, in the sense that they do not contain
the information you will want for your research. This is a common feature of data that
were created for purposes other than research.
★ Inaccessible: Data held by companies and governments are difficult for researchers to
access.
★ Non-representative: Despite their size, most big datasets are not representative of
certain populations. Out-of-sample generalizations are therefore difficult or impossible.
★ Drifting: Many big data systems are constantly changing, which makes it difficult to
study long-term trends.
★ Algorithmically confounded: Behavior in big data systems is not natural; it is driven by
the engineering goals of the systems.
★ Dirty: Big data often include a lot of noise (e.g., junk, spam, spurious data points).
★ Sensitive: Some of the information that companies and governments hold is sensitive.
Example: Smartphone log data:
★ Big: Thousands of rows per person, but not many columns.
★ Always-on: Recorded smartphone use at all times.
★ Incomplete: Use of apps with higher privacy standards was not recorded.
★ Dirty: Depending on what you want to study, lots of noise.
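To make the smartphone example concrete, here is a minimal Python sketch, assuming a hypothetical log format with one row per app session (user, app, start time, duration in seconds). The `LogRow` class, the `clean_log` and `rows_per_user` helpers, and the 1-second and 4-hour cut-offs are illustrative assumptions, not part of any real logging tool; they only show how "dirty" sessions might be filtered out before counting rows per person.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import datetime


# Hypothetical log format (an assumption): one row per app session.
@dataclass
class LogRow:
    user: str
    app: str
    start: datetime
    duration_s: float


def clean_log(rows, min_s=1.0, max_s=4 * 3600):
    """Drop implausible sessions: shorter than min_s or longer than max_s seconds.

    The thresholds are illustrative assumptions; sensible values depend on the study.
    """
    return [r for r in rows if min_s <= r.duration_s <= max_s]


def rows_per_user(rows):
    """Count log rows (sessions) per user."""
    return dict(Counter(r.user for r in rows))


rows = [
    LogRow("p1", "WhatsApp", datetime(2022, 10, 31, 8, 0), 120.0),
    LogRow("p1", "Instagram", datetime(2022, 10, 31, 8, 5), 0.2),    # sub-second flicker: likely noise
    LogRow("p1", "Browser", datetime(2022, 10, 31, 9, 0), 50_000.0),  # ~14 hours: likely a logging error
    LogRow("p2", "WhatsApp", datetime(2022, 10, 31, 10, 0), 300.0),
]

cleaned = clean_log(rows)
print(rows_per_user(rows))     # {'p1': 3, 'p2': 1}
print(rows_per_user(cleaned))  # {'p1': 1, 'p2': 1}
```

Filtering can only remove noise; it cannot recover sessions the logger never recorded, so the "incomplete" problem remains.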
Typical computational research strategies
1. Counting things: In the age of big data, researchers can “count” more than ever
- How often do people use their smartphone per day?
- About which topics do news websites write most often?
2. Forecasting and nowcasting: Big data allow for more accurate predictions of both the
present (nowcasting) and the future (forecasting)
- Investigate when people disclose information about themselves in computer-mediated
communication
- Crime prediction
3. Approximating experiments: Computational methods provide opportunities to
conduct “natural experiments”
- Compare smartphone log data of people who use their smartphone naturally
vs. those who abstain from certain apps (e.g., social media apps)
- Investigate the potential of nudges to make users select certain news
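Strategies 1 and 3 can be illustrated together with a minimal Python sketch on hypothetical data: each record is one smartphone session of a user on a given day, and a second, equally hypothetical mapping marks which users abstained from social media apps. The helper `sessions_per_day` counts sessions per active day (days without any use are ignored, a simplifying assumption), and `group_difference` compares the mean of the two groups. Because users chose themselves whether to abstain, such a comparison only approximates an experiment; it is not a randomized design.

```python
from collections import Counter, defaultdict
from datetime import date
from statistics import mean

# Hypothetical session records: one (user, day) pair per smartphone session.
sessions = [
    ("u1", date(2022, 11, 1)), ("u1", date(2022, 11, 1)), ("u1", date(2022, 11, 2)),
    ("u2", date(2022, 11, 1)), ("u2", date(2022, 11, 1)), ("u2", date(2022, 11, 1)),
    ("u2", date(2022, 11, 2)), ("u3", date(2022, 11, 1)), ("u4", date(2022, 11, 2)),
]
# Hypothetical grouping: True = abstained from social media apps.
abstains = {"u1": False, "u2": False, "u3": True, "u4": True}


def sessions_per_day(sessions):
    """Counting things: mean number of sessions per active day, per user."""
    per_user_day = Counter(sessions)  # (user, day) -> number of sessions
    per_user = defaultdict(list)
    for (user, _day), n in per_user_day.items():
        per_user[user].append(n)
    return {user: mean(counts) for user, counts in per_user.items()}


def group_difference(daily, groups):
    """Approximating an experiment: mean daily sessions of natural users minus abstainers."""
    natural = [v for user, v in daily.items() if not groups[user]]
    abstain = [v for user, v in daily.items() if groups[user]]
    return mean(natural) - mean(abstain)


daily = sessions_per_day(sessions)
print(group_difference(daily, abstains))  # 0.75
```

A positive difference would only describe the two groups; self-selection into abstaining means it cannot by itself be read as a causal effect.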
Advantages and disadvantages
★ Advantages of Computational Methods: Actual behavior instead of self-reports, real social
context instead of lab settings, large N instead of small N.
★ Disadvantages of Computational Methods: Techniques often complicated, data often
proprietary, samples often biased, insufficient metadata.
Computational Communication Science (CCS): the label applied to the emerging subfield
that investigates the use of computational algorithms to gather and analyze big and often
semi- or unstructured data sets to develop and test communication science theories.
Promises
The recent acceleration in the use of computational methods for communication science is
primarily fueled by the confluence of at least three developments:
★ vast amounts of digitally available data, ranging from social media messages and
other digital traces to web archives and newly digitized newspaper and other
historical archives.
★ improved tools to analyze this data, including network analysis methods and
automatic text analysis methods such as supervised text classification, topic
modeling, word embeddings, and syntactic methods.
★ powerful and cheap processing power and easy-to-use computing infrastructure for
processing these data, including scientific and commercial cloud computing, sharing
platforms such as GitHub and Dataverse, and crowd coding platforms such as
Amazon MTurk and Crowdflower.
Ethical problems with computational methods
★ More power over participants than in the past
- Data collection without awareness/consent
- Manipulation without awareness/consent
- Data potentially sensitive, individual users identifiable
★ Guiding principles:
- Respect for persons: Treating people as autonomous and honoring their
wishes.
- Beneficence: Understanding and improving the risk/benefit profile of a study.
- Justice: Risks and benefits should be evenly distributed.
- Respect for law and public interest
Challenges of computational communication science
★ Purely data-driven research questions might not be theoretically interesting
★ Proprietary data threatens accessibility and reproducibility
★ ‘Found’ data not always representative, threatening external validity
★ Computational method bias and noise threaten accuracy and internal validity
★ Inadequate ethical standards/procedures
Preliminary summary
★ Computational communication research holds manifold promises.
★ We can harness unusual sources of information and large amounts of data,
particularly because people constantly leave digital traces.
★ New methods allow us to structure, aggregate, and make sense of these data and to
extract meaningful information for studying communication behavior and phenomena.
★ However, computational communication research comes with ethical challenges
related to consent, privacy, and autonomy of the participants.
Example exam question (MC)
Why is the “Facebook Manipulation Study” by Kramer et al. ethically problematic?
A. People didn't know that they took part in a study (no informed consent)
B. It overly manipulated people’s emotions
C. Both A and B are true
D. The study was not ethically problematic
Example exam question (Open format)
Name and explain two characteristics of big data.
1. Big data are often “incomplete”: they do not contain the information that you will
want for your research. This is a common feature of data that were created for
purposes other than research. For example, log data (e.g., browser history) include
all links a person has visited over time, but no further information about the person
or the context of each visit. Moreover, such logs may contain gaps where the software
failed or the person deliberately hid their browsing behavior.
2. Big data are often “algorithmically confounded”: behavior in big data systems is not
natural; it is driven by the engineering goals of the systems. For example, what you
see in your Facebook News Feed depends on an algorithm that Facebook has built into
its platform. Individuals’ behavior is therefore also shaped by these system-inherent
features.
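The gaps mentioned in the first answer can be flagged with a simple heuristic. The Python sketch below assumes a hypothetical list of browser-history timestamps for one person and an arbitrary six-hour threshold (`find_gaps` and `max_gap` are illustrative names, not an established tool); it reports every pair of consecutive records that lie further apart than the threshold.

```python
from datetime import datetime, timedelta


def find_gaps(timestamps, max_gap=timedelta(hours=6)):
    """Return (start, end) pairs of consecutive records further apart than max_gap.

    The six-hour default is an illustrative assumption, not an established cut-off.
    """
    ts = sorted(timestamps)
    return [(a, b) for a, b in zip(ts, ts[1:]) if b - a > max_gap]


# Hypothetical browser-history timestamps for one person.
visits = [
    datetime(2022, 10, 31, 9, 0),
    datetime(2022, 10, 31, 9, 30),
    datetime(2022, 11, 2, 14, 0),
    datetime(2022, 11, 2, 14, 10),
]

for start, end in find_gaps(visits):
    print(f"No records between {start} and {end}")
```

A flagged gap only shows that nothing was recorded; the data alone cannot tell whether the software failed, the person hid their browsing, or they simply did not browse during that time.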