Computational social science
Lecture 1
What is it?
Rwanda example. Being able to produce a high-resolution map.
Crime prediction. Evaluating patterns of past crime incidents together with geographic info;
this combination serves as a prediction for crime. It reduced crime by 40% in Memphis.
You use an algorithm to predict something.
Computational social science
Field of social science that uses algorithmic tools and large/unstructured data to understand
human/social behaviour. Methods are not the goal, but contribute to theoretical development.
It complements rather than replaces traditional methodologies.
Why is it so important?
- In the past collecting data was expensive
- In the digital age, the behaviors of billions of people are recorded, stored and therefore
analyzable
- Every time you click on a website, make a call on your phone or pay for something with
your credit card, a digital record of behavior is stored
- Because data are a byproduct of people's everyday actions, they are often called digital
traces
- Large-scale records of persons or businesses are often called big data.
10 characteristics of big data
2: we can make analyses that are constantly updated; this is not the case with social
science as we are used to it, where the data are often old.
3: data is being collected without people knowing they are being studied
4: technical issues: the data are not how you really wanted them. Things could also be deleted, by a company
for example. You have less control with big data. With normal social science you can ask
what you want to know, but with this form you are mostly dependent on the data.
5: Facebook, for example, is not sharing its data.
7: updates can mean there is no longer access to some data
9: you want to get the data ready for analysis, but this means you have to get rid of the noise.
10: confidential information.
Typical computational research strategies
1: Counting things: in the age of big data, researchers can “count” more than ever.
2: Forecasting and nowcasting: big data for more accurate predictions both in the present and
in the future. The Rwanda and crime examples are predictions.
3: Approximating experiments: computational methods provide opportunities to conduct
‘natural experiments’.
Advantages:
- Actual behaviour vs. self report
- Social context vs. lab setting
- Small N to large N
Disadvantages:
- Techniques often complicated
- Data often proprietary
- Samples often biased
- Insufficient metadata
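The first strategy above, counting things, can be sketched in a few lines of Python. This is only a minimal illustration: the `traces` list and its user IDs and actions are invented for the example and do not come from the lecture.

```python
from collections import Counter

# Hypothetical digital traces: (user_id, action) pairs, invented for illustration.
traces = [
    ("u1", "click"),
    ("u2", "call"),
    ("u1", "payment"),
    ("u3", "click"),
    ("u2", "click"),
]

# Count how often each type of behavior occurs.
action_counts = Counter(action for _, action in traces)

# Count how many traces each user left behind.
user_counts = Counter(user for user, _ in traces)

print(action_counts.most_common())
print(user_counts.most_common())
```

With real digital traces the list would hold millions of records, but the counting step stays the same.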
Computational communication science definition
"Computational Communication Science (CCS) is the label applied to the emerging subfield
that investigates the use of computational algorithms to gather and analyze big and often
semi- or unstructured data sets to develop and test communication science theories"
Promises of computational communication research
The recent acceleration in the use of computational methods for communication science is
primarily fueled by the confluence of at least three developments:
- vast amounts of digitally available data, ranging from social media messages and other
"digital" traces to web archives and newly digitized newspaper and other historical
archives.
- improved tools to analyze this data, including network analysis methods and automatic
text analysis methods such as supervised text classification, topic modelling, word
embeddings, and syntactic methods
- powerful and cheap processing power, and easy to use computing infrastructure for
processing these data, including scientific and commercial cloud computing, sharing
platforms such as Github and Dataverse, and crowd coding platforms such as Amazon
MTurk and Crowdflower
Ethical problems with computational methods
More power over participants than in the past
Guiding principles
o Respect for persons: Treating people as autonomous and honoring their wishes
o Beneficence: Understanding and improving the risk/benefit profile of a study
o Justice: Risks and benefits should be evenly distributed
o Respect for law and public interest
Challenges of computational communication science
Data-driven research questions might not be theoretically interesting
Proprietary data threatens accessibility and reproducibility
‘Found’ data not always representative, threatening external validity
Computational method bias and noise threaten accuracy and internal validity
Inadequate ethical standards/procedures
Preliminary summary
Computational communication research holds manifold promises
We can harness unusual sources of information and large amounts of data,
particularly because people constantly leave digital traces
New methods allow us to structure, aggregate and make sense of these data
and extract meaningful information to study communication behavior and
phenomena
However, computational communication research comes with ethical
challenges related to consent, privacy, and autonomy of the participants
Lecture 2
Why is the life expectancy graph powerful? It conveys a lot of information visually,
instead of just giving the information in words.
Problem: how to transform data to produce good visualizations?
From raw data to tidy data
Import the data (this can be difficult: we need to clean the data and transform it in a
way that makes sense).
We try to understand the data and start to visualize it. We repeat this as many times as needed until we can
communicate the data.
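The workflow above (import, clean, transform) can be sketched in Python with only the standard library. This is a rough illustration under assumptions: the `RAW` text, the country names, the life expectancy values and the function names are all invented for the example and are not taken from the lecture.

```python
import csv
import io

# Hypothetical raw export with invented values: stray whitespace,
# inconsistent capitalization, and one missing measurement.
RAW = """country,year,life_expectancy
 country a ,2000,50.0
country a,2010,
country b,2000,70.0
Country B,2010,80.0
"""

def import_rows(text):
    """Import: parse CSV text into a list of dicts, one per row."""
    return list(csv.DictReader(io.StringIO(text)))

def tidy(rows):
    """Clean: normalize names, convert types, drop incomplete rows."""
    cleaned = []
    for row in rows:
        value = row["life_expectancy"].strip()
        if not value:
            continue  # skip rows without a measurement (noise)
        cleaned.append({
            "country": row["country"].strip().title(),
            "year": int(row["year"]),
            "life_expectancy": float(value),
        })
    return cleaned

def mean_by_country(rows):
    """Transform: average life expectancy per country."""
    grouped = {}
    for row in rows:
        grouped.setdefault(row["country"], []).append(row["life_expectancy"])
    return {country: sum(vals) / len(vals) for country, vals in grouped.items()}

rows = tidy(import_rows(RAW))
summary = mean_by_country(rows)
print(summary)
```

The tidy rows (one observation per row, with consistent names and types) are what a plotting step would then take as input.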
Importing data:
- Data comes in different forms (two- or multidimensional, text or numbers...) and formats