Key points for the CADC exam
Definition of Computational communication science (CCS)
CCS is the label applied to the emerging subfield that investigates the use of
computational algorithms to gather and analyze big and often semi- or
unstructured data sets to develop and test communication science
theories.
R produces publication-ready figures and visualizations. It allows us to
combine analyses and writing to produce diverse output formats.
R allows flexible and comprehensive programming:
- Complex data management
- Advanced analyses with large/messy data
10 characteristics of Big Data
1. Big: The scale or volume of some current datasets is often
impressive, but big datasets are not an end in themselves.
2. Always on: Many big data systems collect data constantly, which
makes it possible to study unexpected events and allows for real-time
measurement. Always-on data is valuable: it can even cover events
that had not yet happened when the collection started.
3. Non-reactive: Participants are generally not aware that their data is
being captured, or they have become so accustomed to this data
collection that it no longer changes their behaviour. Facebook, for
example, does not share with us which data it stores.
4. Incomplete: Most big data sources are incomplete in the sense that
they do not have the information you want for your research,
because the data was collected for another purpose.
5. Inaccessible: Data held by companies or government ministries is
difficult for researchers to access.
6. Nonrepresentative: Most big datasets are not representative of
certain populations, so out-of-sample generalizations are difficult
or impossible.
7. Drifting: Many big data systems change constantly, which makes it
difficult to study long-term trends.
8. Algorithmically confounded: Behaviour in big data systems is not
natural; it is driven by the engineering goals of the systems.
9. Dirty: Big data often includes a lot of noise (spam, junk, etc.).
10. Sensitive: Some information held by companies and governments
is sensitive.
Is big data always a good idea?
Big data is not the solution to all methodological problems and has
limitations:
- Big data is found, while survey data is made by the researcher
- Big data is not always representative of a certain population; it is
often simply pulled from wherever it happens to be available
- Significance (p-values) is less meaningful as a measure of validity:
with very large samples even trivially small effects become
statistically significant
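The point about p-values can be illustrated with a small simulation. This is only a sketch with made-up numbers (the group means, sample size and seed are assumptions chosen for illustration, not values from the course): two groups differ by a negligible 0.01 standard deviations, yet with a million observations per group the difference is still flagged as significant.

```r
# Illustrative simulation; all numbers are hypothetical
set.seed(42)
n <- 1e6                                  # "big data" sample size per group
group_a <- rnorm(n, mean = 0.00, sd = 1)
group_b <- rnorm(n, mean = 0.01, sd = 1)  # tiny true difference

# Welch two-sample t-test on the difference in means
test <- t.test(group_b, group_a)

# Standardized effect size (Cohen's d) using the pooled standard deviation
pooled_sd <- sqrt((var(group_a) + var(group_b)) / 2)
cohens_d <- (mean(group_b) - mean(group_a)) / pooled_sd

cat("p-value:  ", format.pval(test$p.value), "\n")
cat("Cohen's d:", round(cohens_d, 3), "\n")
```

The p-value alone would suggest an important finding, while the effect size shows the difference is practically irrelevant. This is why effect sizes matter more than significance when N is very large.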
Example study: Kramer et al., Facebook
Much about this study is cool, but even more is not cool:
- No informed consent
- Not replicable
- Low internal validity: is the sentiment of posts indicative of mood,
and does a change in sentiment originate in contagion of mood?
- Low measurement accuracy: are word counts indicative of sentiment?
- Overt manipulation of people's lives
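The measurement-accuracy concern can be made concrete with a minimal word-count sentiment scorer. This is a sketch under assumptions: the two tiny word lists, the function name `score_sentiment` and the example posts are hypothetical, and this is not the dictionary or procedure used in the study.

```r
# Hypothetical mini-dictionaries; real studies use much larger word lists
positive_words <- c("happy", "great", "love")
negative_words <- c("sad", "awful", "hate")

score_sentiment <- function(text) {
  # Lowercase, strip punctuation, and split into words
  cleaned <- gsub("[[:punct:]]", "", tolower(text))
  words <- strsplit(cleaned, "\\s+")[[1]]
  # Score = number of positive words minus number of negative words
  sum(words %in% positive_words) - sum(words %in% negative_words)
}

posts <- c("I love this great day",
           "I am not happy at all",
           "So sad and awful")
scores <- vapply(posts, score_sentiment, numeric(1), USE.NAMES = FALSE)
print(data.frame(post = posts, score = scores))
```

The second post receives a positive score because the word "happy" is counted while the negation "not" is ignored. Simple word counts can therefore misclassify the sentiment of a text.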
Ethical problems
- Respect for persons
- Beneficence: understanding and improving the risk/benefit profile of a
study
- Justice: risks and benefits should be evenly distributed
- Respect for law and public interest
Typical computational research strategies
1. Counting things: in the age of big data, researchers can 'count'
more than ever
2. Forecasting and nowcasting: big data allow for more accurate
predictions both in the present and in the future
3. Approximating experiments: computational methods provide
opportunities to conduct 'natural experiments'
Promises of computational communication research
The recent acceleration in the use of computational methods for
communication science is primarily fueled by the confluence of at least
three developments:
- vast amounts of digitally available data, ranging from social media
messages and other "digital" traces to web archives and newly digitized
newspaper and other historical archives
- improved tools to analyze this data, including network analysis
methods and automatic text analysis methods such as supervised
text classification, topic modelling, word embeddings and
syntactic methods
- powerful and cheap processing power, and easy-to-use computing
infrastructure for processing these data, including scientific and
commercial cloud computing, sharing platforms such as Github and
Dataverse, and crowd coding platforms such as Amazon MTurk and
Crowdflower
Challenges of computational communication science
- Data-driven research questions might not be theoretically
interesting
- Proprietary data threatens accessibility and reproducibility
- Found data is not always representative, threatening external
validity
- Bias and noise in computational methods threaten accuracy and
internal validity
- Inadequate ethical standards/procedures
Promises of Computational communication research
- Vast amounts of digitally available data
- Improved tools to analyse the data
- Powerful and cheap processing power and easy-to-use computing
infrastructure
Advantages & Disadvantages of Computational Methods
Advantages
- From self-report to real behavior: actual behavior can be measured
without self-reported attitudes or intentions getting in the way. This
helps with social desirability problems and does not depend on
people's own wishes and intentions. Underlying human communication
also comes to the surface.
- Social context vs lab setting: people's reactions are observed in a
real, everyday environment instead of in a lab setting.
- Small N to large N: including more people in a study makes it
possible to detect more subtle relationships, or effects in smaller
subpopulations.
- From solitary to collaborative: digital data and computer tools make
it easier to share and reuse resources.
Disadvantages
- Techniques are often complicated
- Data is often proprietary and only available to certain people
- Samples are often biased
- Insufficient metadata
RR 1: When communication meets computation: opportunities,
challenges and pitfalls in Computational communication science
Wouter van Atteveldt & Tai-Quan Peng
The role of computational methods in communication science
The recent acceleration in their use is primarily driven by the
confluence of three developments: a lot of available data, improved
analysis tools, and powerful and cheap processing power.
1. A deluge of digitally available data, ranging from social media
messages and other "digital traces" to web archives and newly digitized
newspaper and other historical archives
2. Improved tools to analyze this data, including network analysis
methods and automatic text analysis methods such as supervised text
classification, topic modelling, word embeddings and syntactic methods
3. The emergence of powerful and cheap processing power, and
easy-to-use computing infrastructure for processing these data,
including scientific and commercial cloud computing, sharing platforms
such as Github and Dataverse, and crowd coding platforms such as
Amazon MTurk and Crowdflower
In general, computational communication science studies involve the
following:
1. Large & complex data sets
2. Consisting of digital traces and other naturally occurring data
3. Requiring algorithmic solutions to analyse
4. Allowing the study of human communication by applying and testing
communication theory
Week 2 Data Wrangling & Data visualization
A general model of data science
Import -> tidy -> transform -> visualize -> model -> communicate
1. Import data
- Data comes in different forms (two- or
multidimensional, text or numbers...) and formats
(.csv, .txt, .sav, .stata, .html...)
- First, we must find a way to import this data into R
- This typically means that you take data stored in a
file, database, or web application programming
interface (API), and load it into a data frame in R
- Imagine we had found the following table on Wikipedia and
wanted to get it into R...
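As a minimal sketch of the import step, the snippet below loads CSV data into a data frame with base R's `read.csv`. The CSV content, column names and values are hypothetical stand-ins (the table from the notes is not reproduced here), and the text is supplied inline so the example runs without a file on disk.

```r
# Hypothetical CSV content standing in for a file on disk (illustrative values)
csv_text <- "country,population_millions
Netherlands,17.5
Belgium,11.6
Germany,83.2"

# read.csv() parses the text into a data frame; for a real file you would
# pass a path instead, e.g. read.csv("my_data.csv") (hypothetical file name)
countries <- read.csv(text = csv_text, stringsAsFactors = FALSE)

# Inspect the structure: column names, types, and the first values
str(countries)
```

Other formats usually need an add-on package. For example, `haven` is commonly used for SPSS and Stata files and `rvest` for HTML tables, but the overall pattern of "read the source, get a data frame" stays the same.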