Computational Analysis and Digital Communication | Summary | Slides & All Literature


A to-the-point summary of the literature and lecture slides for the course Computational Analysis and Digital Communication (VU Amsterdam).

1. INTRODUCTION TO COMPUTATIONAL METHODS

10 CHARACTERISTICS OF BIG DATA
(1) Big, (2) Always-on, (3) Non-reactive, (4) Incomplete, (5) Inaccessible, (6) Non-representative, (7) Drifting: many big data
systems are changing constantly, making it difficult to study long-term trends, (8) Algorithmically confounded: behavior in big
data systems is not natural: it is driven by the engineering goals of the systems, (9) Dirty, (10) Sensitive.


ETHICAL PROBLEMS WITH COMPUTATIONAL METHODS
1. Data collection without awareness/consent
2. Manipulation without awareness/consent
3. Data potentially sensitive, individual users identifiable


GUIDING PRINCIPLES
1. Respect for persons. Treating people as autonomous and honoring their wishes.
2. Beneficence. Understanding and improving the risk/benefit profile of a study.
3. Justice. Risks and benefits should be evenly distributed.
4. Respect for law and public interest.


CHALLENGES OF COMPUTATIONAL COMMUNICATION SCIENCE
1. Purely data-driven research questions might not be theoretically interesting
2. Proprietary data threatens accessibility and reproducibility
3. ‘Found’ data not always representative, threatening external validity
4. Computational method bias and noise threaten accuracy and internal validity
5. Inadequate ethical standards/procedures



L1R1. WHEN COMMUNICATION MEETS COMPUTATION: OPPORTUNITIES, CHALLENGES, AND
PITFALLS IN COMPUTATIONAL COMMUNICATION SCIENCE
Wouter van Atteveldt and Tai-Quan Peng (2018)


THE ROLE OF COMPUTATIONAL METHODS IN COMMUNICATION SCIENCE
The recent acceleration in the promise and use of computational methods for communication science is primarily fueled by the
confluence of at least three developments: (1) a deluge of digitally available data, (2) improved tools to analyze
this data, and (3) the emergence of powerful and cheap processing power and easy-to-use computing infrastructure
for processing these data, including scientific and commercial cloud computing, sharing platforms, and crowd coding platforms.
Computational communication science studies generally involve: (1) large and complex data sets; (2) consisting of digital traces
and other ‘naturally occurring’ data; (3) requiring algorithmic solutions to analyze; and (4) allowing the study of human
communication by applying and testing communication theory.

FOUR OPPORTUNITIES OFFERED BY COMPUTATIONAL METHODS
1. From self-reported behavior to real behavior. Allow us to measure actual behavior in an unobtrusive way.
2. From lab experiments to studies of the actual social environment.
3. From small-N to large-N. Enables us to study more subtle relations or effects in smaller subpopulations.
4. From solitary to collaborative research. Sharing data and tools will force us to be more rigorous in defining
operationalizations and documenting the data and analysis process, furthering transparency and reproducibility.


FIVE CHALLENGES AND PITFALLS IN COMPUTATIONAL METHODS
1. How do we keep research datasets accessible? We need to make sure our data is open and transparent.
2. Is big data always good data? The gap between the primary purpose for which big data were collected and the secondary
research purpose poses a threat to the validity of design, measurement, and analysis in computational communication
research. Besides that, data being 'big' does not mean that it is representative of a certain population: 'specialized'
actors on social media are over-represented while the ordinary public is under-represented, which leads to sampling bias.
Moreover, with samples this large even trivial effects become statistically significant, so p-values are less meaningful
as a measure of validity. For big data studies, we should focus more on substantive effect size and validity than mere
statistical significance (see the brief simulation after this list).
3. Are computational measurement methods valid and reliable? Quantitative measures of topic coherence do not always
correlate with human judgments of topic quality, introducing systematic biases in subsequent multivariate analysis and
threatening the validity of statistical inference.
4. What is responsible and ethical conduct in computational communication research?
5. How do we get the needed skills and infrastructure? Invest and move to a culture of sharing tools and data.
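
As an aside, a brief R simulation (not from the article; data invented for illustration) shows why p-values lose their value at big-data scale: with a million observations, a substantively negligible correlation is still 'highly significant'.

# Illustrative simulation: a negligible effect is 'significant' at large n
set.seed(42)
n <- 1e6                      # big-data-sized sample
x <- rnorm(n)
y <- 0.005 * x + rnorm(n)     # true correlation is only ~0.005
ct <- cor.test(x, y)
ct$estimate                   # r around 0.005: substantively negligible
ct$p.value                    # yet p is far below 0.05, purely due to the huge n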


CONCLUSION
It’s vital that - as a community - we move forward on at least three fronts: (1) build the infrastructure, skills, and institutional
incentives required to use and maintain computational methods and tools; (2) work toward open, transparent, and collaborative
research, with sharing and reusing datasets and tools the norm rather than the exception; and (3) continue developing, validating,
and critically discussing computational methods in the context of substantive communication science questions.



L1R2. EXPERIMENTAL EVIDENCE OF MASSIVE-SCALE EMOTIONAL CONTAGION THROUGH
SOCIAL NETWORKS
Adam D. I. Kramer, Jamie E. Guillory, and Jeffrey T. Hancock (2014)


EXPERIMENT
The experiment manipulated the extent to which people (N = 689,003) were exposed to emotional expressions in their News
Feed. This tested whether exposure to emotions led people to change their own posting behaviors, in particular whether
exposure to emotional content led people to post content that was consistent with the exposure - thereby testing whether
exposure to verbal affective expressions leads to similar verbal expressions, a form of emotional contagion. Two parallel
experiments were conducted for positive and negative emotion: One in which exposure to friends’ positive emotional content in
their News Feed was reduced, and one in which exposure to negative emotional content in their News Feed was reduced.

RESULTS
The results show emotional contagion. For people who had positive content reduced in their News Feed, a larger percentage of
words in their status updates were negative and a smaller percentage were positive. When negativity was reduced, the
opposite pattern occurred. This suggests that the emotions expressed by friends, via online social networks, influence our own
moods. We also observed a withdrawal effect. People who were exposed to fewer emotional posts (of whatever valence) in their
News Feed were less expressive overall on the following days. This observation, and the fact that people were more emotionally
positive in response to positive emotional updates from their friends, stands in contrast to theories that suggest viewing positive
posts by friends on Facebook may somehow affect us negatively, for example, via social comparison.



2. BASICS OF AUTOMATIC TEXT ANALYSIS AND DICTIONARY APPROACHES

WHAT IS TEXT?
Text consists of symbols. Symbols by themselves do not have meaning. A symbol itself is a mark, sign or word that indicates,
signifies or is understood as representing an idea, object or relationship. Symbols thereby allow people to go beyond what is
known or seen by creating linkages between otherwise very different concepts and experiences. Text (a collection of symbols)
only attains meaning when interpreted (in its context). The main challenge in automatic text analysis is to bridge the gap from
symbols to meaningful interpretation.


STEPS IN AUTOMATIC TEXT ANALYSIS
The pipeline consists of four steps: (1) obtaining texts, (2) from text to data (preprocessing), (3) analyzing the structured
data, and (4) evaluating the validity of the analysis.

OBTAINING TEXTS (1)

• Publicly available datasets, e.g. political texts, news from publisher / library
→ Great if you can find it, often not available.
• Scraping primary sources, e.g. press releases from party websites, existing archives.
→ Writing scrapers can be trivial or very complex depending on the website. Make sure to check legal issues.

• Proprietary texts from third parties, e.g. digital archives (LexisNexis), social media APIs
→ Often custom formats, API restrictions, API changes. Terms of use are often not conducive to research or sharing.


FROM TEXT TO DATA (2)
Algorithms (or generally R) process numbers, they do not read text. The first step in any analysis is to convert text to a series of
numbers. This is done through a number of optional steps (also known as preprocessing), which include tokenization, removing
stopwords, stemming or lemmatization, normalization and frequency trimming. The resulting structured text is then often used
to create a document-feature matrix (DTM). This is a table containing the frequency of each word in each document. This is called
the ‘bag of words’ approach and ignores word order.
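
A minimal sketch of this step in R with the quanteda package (the two example documents are invented for illustration):

library(quanteda)

# Two toy documents (invented)
texts <- c(doc1 = "The government wants new laws.",
           doc2 = "I love the new laws!")

# Step 1: split the raw strings into tokens
toks <- tokens(texts)

# Step 2: count tokens per document into a document-feature matrix,
# i.e. the bag-of-words representation (word order is discarded)
dfmat <- dfm(toks)
dfmat   # rows = documents, columns = features, cells = frequencies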

TOKENIZATION
First, we need to break text down into the features that we want to analyze (tokens). We take an input (a string) and a token type
(a meaningful unit of text, e.g. a word) and split the input into pieces (tokens) that correspond to the type. Thinking of a token
as a word is a useful start (and the most used approach → bag-of-words). However, we can generalize the idea of a token beyond
a single word to other units of text: characters (I, l, o, v, e, y, o, u), words (I, love, you), sentences
(I love you), lines (He went to her. I love), paragraphs, and n-grams (I love).
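
A sketch of the same idea in quanteda, tokenizing one string (invented) at different levels:

library(quanteda)

s <- "I love you. He went to her."

tokens(s, what = "character")   # characters: I, l, o, v, e, ...
tokens(s, what = "word")        # words: I, love, you, ...
tokens(s, what = "sentence")    # sentences: "I love you.", "He went to her."

# n-grams generalize single-word tokens to short word sequences
tokens_ngrams(tokens(s, remove_punct = TRUE), n = 2)   # I_love, love_you, ...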


TEXT CLEANING, STEMMING, LEMMATIZING
Text contains a lot of noise, e.g. very uncommon words, spelling, scraping mistakes (HTML code), stop words (a, the, I, will),
conjugations of the same word (want, wants), near synonyms (wants, loves). Remember, what noise is depends on your research
questions. Cleaning steps needed to reduce noise are removing unnecessary symbols (e.g., punctuations, numbers), removing
stopwords (e.g., a, the), normalization (transform to lowercase), stemming (wants → want) OR lemmatizing (ran → run) and
frequency trimming (removing rare words).
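
A sketch of these cleaning steps as a quanteda pipeline (toy texts invented; which steps are appropriate depends on your research question):

library(quanteda)

texts <- c(doc1 = "The government wants new laws.",
           doc2 = "I love the new laws!")

toks <- tokens(texts, remove_punct = TRUE, remove_numbers = TRUE) |>
  tokens_tolower() |>                # normalization: transform to lowercase
  tokens_remove(stopwords("en")) |>  # stopword removal: drops "the", "I", ...
  tokens_wordstem()                  # stemming: wants -> want, laws -> law

# Frequency trimming: keep only features occurring at least twice overall
dfm_trim(dfm(toks), min_termfreq = 2)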


DOCUMENT-FEATURE MATRIX (DFM)
Finally, we create a representation of these ‘tokens’ (or terms or features). A DFM refers to a mathematical matrix that describes
the frequency of terms that occur in a collection of documents. In this matrix, each row corresponds to a document in the
collection and columns correspond to terms. Each cell contains the number of times a term occurs in that particular document.
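
A tiny worked example (documents invented) of what the matrix looks like:

library(quanteda)

toy <- tokens_tolower(tokens(c(d1 = "I love you", d2 = "I love pizza")))
dfm(toy)
# Expected cell counts (rows = documents, columns = features):
#      i  love  you  pizza
# d1   1     1    1      0
# d2   1     1    0      1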


ANALYZING THE STRUCTURED DATA (3)

• Rule-based analyses (deductive approaches).
o The meaning is assigned by the researcher.
o "If word X occurs, the text means Y."

• Supervised machine learning (inductive approaches).
o Train a model on coded training examples; the model generalizes the meaning captured in the human coding of the training material (see the sketch after this list).
o "Text X is like other texts that were negative, so X is probably negative."

• Unsupervised machine learning (inductive approaches).
o Find clusters of words that co-occur; meaning is assigned afterwards by the researcher's interpretation.
o "These words form a pattern, which I think means X."
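
A minimal supervised-learning sketch using quanteda.textmodels (training texts and labels invented; a real study needs far more coded examples): train a Naive Bayes classifier on hand-coded texts and let it classify a new one.

library(quanteda)
library(quanteda.textmodels)

# Tiny hand-coded training set (invented)
train  <- c("terrible awful movie", "bad and boring",
            "great wonderful film", "lovely and fun")
labels <- c("negative", "negative", "positive", "positive")

dfm_train <- dfm(tokens(train))
nb <- textmodel_nb(dfm_train, y = labels)   # learns word-class associations

# Classify a new text: align its features with the training matrix first
dfm_new <- dfm_match(dfm(tokens("a wonderful and fun movie")),
                     featnames(dfm_train))
predict(nb, newdata = dfm_new)              # -> "positive"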


EVALUATING THE VALIDITY OF THE ANALYSIS (4)
Many text analysis processes are 'black boxes', even manual coding, and dictionaries are ultimately opaque. The computer does not
'understand' natural language, so you need to prove to the reader that the analysis is valid. Validate by comparing the output to a
known good, often a manual annotation that serves as the 'gold standard'.
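
A minimal sketch of such a comparison in R (labels invented for illustration): put the automatic codes next to the manual gold standard and compute agreement.

# Manual 'gold standard' vs. automatic coding (both invented)
gold      <- c("pos", "neg", "pos", "neg", "pos")
automatic <- c("pos", "neg", "neg", "neg", "pos")

table(automatic, gold)       # confusion matrix
mean(automatic == gold)      # accuracy: here 0.8 (4 out of 5 correct)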


WHAT ARE DEDUCTIVE APPROACHES?
Coding rules are set a priori based on a predefined 'text theory'. Computers use these rules to decode text. Rules can be based on
individual words or groups of words (e.g., articles that contain 'government' are coded as 'politics'), on patterns (e.g., the
sender of a mail can be identified by looking for 'FROM:'), or on a combination of both.
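
A sketch of such word-based rules as a quanteda dictionary (entries and texts invented for illustration):

library(quanteda)

# Rule: articles containing these words get the corresponding code
dict <- dictionary(list(politics = c("government", "parliament", "minister"),
                        economy  = c("market", "inflation", "tax")))

texts <- c(art1 = "The government proposed a new tax.",
           art2 = "The match ended in a draw.")

dfm_lookup(dfm(tokens(texts, remove_punct = TRUE)), dict)
# art1: politics = 1, economy = 1; art2: all zeros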
