Article summaries: CADC
Advancing Automated Content Analysis for a New Era of Media Effects Research: The Key Role of Transfer Learning (Kroon et al., 2023)
This research article advocates for the application of advanced natural language
processing (NLP) techniques, specifically transfer learning with transformer-
based models like BERT and GPT, to revolutionise automated content analysis in
media effects research. The authors highlight the limitations of traditional
methods, such as bag-of-words models and dictionary-based approaches, in
handling the increased diversity and complexity of individual-level digital trace
data in today's fragmented media landscape. They argue that transfer learning
offers a solution by enabling more accurate and generalisable content
classification across diverse formats, languages, and modalities, thereby
facilitating more robust media effects studies. The article details the advantages
of this approach while acknowledging its challenges, including potential biases
in large language models, and proposes strategies to mitigate them.
Dictionary-based analysis involves using pre-defined lists of words to identify
and classify concepts in a text. For instance, a dictionary for "positive sentiment"
might contain words like "happy," "good," and "amazing."
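To make the idea concrete, here is a minimal Python sketch of dictionary-based classification. The word lists `POSITIVE` and `NEGATIVE` and the helper names are hypothetical; real studies rely on validated lexicons with far more entries, and the "more positive than negative matches" decision rule is only one possible choice.

```python
import re

# Hypothetical mini-dictionaries for illustration only; real studies use
# validated lexicons with hundreds or thousands of entries.
POSITIVE = {"happy", "good", "amazing"}
NEGATIVE = {"sad", "bad", "terrible"}


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def dictionary_sentiment(text):
    """Classify a text by counting matches against each word list."""
    tokens = tokenize(text)
    pos = sum(token in POSITIVE for token in tokens)
    neg = sum(token in NEGATIVE for token in tokens)
    if pos > neg:
        return "positive"
    if neg > pos:
        return "negative"
    return "neutral"


print(dictionary_sentiment("What an amazing, good day!"))  # positive
print(dictionary_sentiment("The service was bad."))        # negative
print(dictionary_sentiment("The meeting is at noon."))     # neutral
print(dictionary_sentiment("This is not good."))           # positive (negation ignored)
```

The last call shows a typical weakness of the approach: because words are matched in isolation, a negated phrase such as "not good" still counts as positive.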
Supervised machine learning, on the other hand, uses labelled data to train
an algorithm to classify text. For example, a researcher might manually label a
set of tweets as "positive," "negative," or "neutral," and then use this data to
train a machine learning model to automatically classify new tweets.
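The workflow described above can be sketched in plain Python. The choice of multinomial Naive Bayes as the learning algorithm is an assumption (the text does not name one), and the labelled tweets are invented; in practice a researcher would use a library such as scikit-learn and a much larger, carefully annotated training set.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z']+", text.lower())


class NaiveBayes:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, texts, labels):
        self.class_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)  # label -> word frequencies
        for text, label in zip(texts, labels):
            self.word_counts[label].update(tokenize(text))
        self.vocab = {w for counts in self.word_counts.values() for w in counts}
        return self

    def predict(self, text):
        n_docs = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, n in self.class_counts.items():
            counts = self.word_counts[label]
            total = sum(counts.values())
            # Log prior plus the smoothed log likelihood of each known token
            score = math.log(n / n_docs)
            for w in tokenize(text):
                if w in self.vocab:  # words never seen in training are ignored
                    score += math.log((counts[w] + 1) / (total + len(self.vocab)))
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Hypothetical manually labelled tweets (the training data)
train_texts = [
    "I love this, so happy today",
    "what a great and happy game",
    "this is awful, I hate it",
    "terrible service, really sad",
    "the bus leaves at nine",
    "meeting moved to tuesday",
]
train_labels = ["positive", "positive", "negative", "negative", "neutral", "neutral"]

model = NaiveBayes().fit(train_texts, train_labels)
print(model.predict("so happy, love it"))          # positive
print(model.predict("I hate this awful weather"))  # negative
print(model.predict("the meeting is at nine"))     # neutral
```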
Bag-of-words (BoW) is a method of representing text that ignores word order
and grammar, and focuses only on the frequency of words in a document.
Imagine putting all the words from a document into a bag and then counting how
many times each word appears.
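A bag of words is essentially a word-frequency table. This short sketch, using Python's `collections.Counter`, shows that two sentences with opposite meanings end up with the same representation once word order is discarded.

```python
import re
from collections import Counter


def bag_of_words(text):
    """Return word frequencies, discarding word order and grammar."""
    return Counter(re.findall(r"[a-z']+", text.lower()))


a = bag_of_words("The dog bit the man")
b = bag_of_words("The man bit the dog")
print(a)       # Counter({'the': 2, 'dog': 1, 'bit': 1, 'man': 1})
print(a == b)  # True: both sentences get the same bag
```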
Word embeddings are pre-trained word representations that capture semantic
relationships between words, enabling more accurate sentiment classification
even for words not present in the training data. This is achieved by representing
words as dense, continuous vectors in which words with similar meanings lie close
together, so the model can learn from the similarities between words.
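The following sketch illustrates why embeddings help with unseen words. The three-dimensional vectors are invented for illustration (real pre-trained embeddings such as word2vec or GloVe have hundreds of dimensions learned from large corpora), and closeness is measured with cosine similarity. If a classifier only saw "good" and "awful" during training, it can still treat "terrific" as positive because its vector lies next to that of "good".

```python
import math

# Toy "embeddings", invented for illustration only.
EMBEDDINGS = {
    "good": [0.9, 0.8, 0.1],
    "amazing": [0.8, 0.9, 0.2],
    "terrific": [0.85, 0.85, 0.15],
    "awful": [-0.8, -0.7, 0.1],
    "table": [0.0, 0.1, 0.9],
}


def cosine(u, v):
    """Cosine similarity: 1 for identical directions, negative for opposite ones."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm


def nearest(word, candidates):
    """Return the candidate whose vector is most similar to the given word's vector."""
    return max(candidates, key=lambda c: cosine(EMBEDDINGS[word], EMBEDDINGS[c]))


# "terrific" was never labelled, but it sits closest to "good"
print(nearest("terrific", ["good", "awful"]))  # good
```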
Transfer learning, as the name implies, involves "transferring" knowledge from
one task to another. In the context of NLP, this often means taking a model
trained on a large dataset (like all of Wikipedia) and then fine-tuning it for a
specific task. This typically improves accuracy and generalisability, especially when
only a limited amount of labelled data is available for the target task.
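A minimal sketch of the idea, under strong simplifying assumptions: a small table of hypothetical "pre-trained" word vectors stands in for a large pre-trained model, and only a tiny nearest-centroid classifier is trained on two labelled examples for the new task. Note that this shows the feature-extraction variant of transfer learning, where the pre-trained part stays frozen, rather than full fine-tuning, where its weights are updated as well.

```python
import math

# Hypothetical "pre-trained" word vectors standing in for a model trained on a
# large general corpus. They are frozen: the new task never changes them.
PRETRAINED = {
    "good": [0.9, 0.8], "great": [0.85, 0.9], "excellent": [0.95, 0.85],
    "bad": [-0.9, -0.8], "poor": [-0.85, -0.9], "dreadful": [-0.95, -0.85],
    "film": [0.0, 0.05], "plot": [0.05, 0.0],
}


def embed(text):
    """Average the pre-trained vectors of known words (unknown words are skipped)."""
    vecs = [PRETRAINED[w] for w in text.lower().split() if w in PRETRAINED]
    if not vecs:
        return [0.0, 0.0]
    return [sum(dim) / len(vecs) for dim in zip(*vecs)]


def train_centroids(texts, labels):
    """Task-specific part: one mean vector per label, learned from few examples."""
    grouped = {}
    for text, label in zip(texts, labels):
        grouped.setdefault(label, []).append(embed(text))
    return {label: [sum(d) / len(vs) for d in zip(*vs)] for label, vs in grouped.items()}


def predict(centroids, text):
    """Assign the label whose centroid is closest (Euclidean distance)."""
    v = embed(text)
    return min(centroids, key=lambda label: math.dist(v, centroids[label]))


# Only two labelled examples for the new task
centroids = train_centroids(["good film", "bad plot"], ["positive", "negative"])
print(predict(centroids, "excellent plot"))  # positive
print(predict(centroids, "dreadful film"))   # negative
```

Neither "excellent" nor "dreadful" appears in the labelled examples; the classifier handles them only because of the knowledge carried over in the pre-trained vectors.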
Transformer-based models such as BERT and GPT introduce contextual
knowledge into sentiment analysis by modelling the relationships between the
words in a sentence, so that the same word can be represented differently
depending on its context.
BERT (Bidirectional Encoder Representations from Transformers) is a
type of transformer-based model pre-trained on a large corpus of text (English
Wikipedia and BooksCorpus). Because it takes into account the words on both
the left and the right of each word, it excels at understanding the context of
words in a sentence, making it a powerful tool for tasks like sentiment analysis,
question answering, and text summarisation.
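Transformers capture these relationships with a mechanism called self-attention, in which each word's new representation is a weighted average of the representations of the words in the sentence. The sketch below is a heavily simplified scaled dot-product self-attention: it omits the learned query, key, and value projections, multiple heads, and all other layers, and the input vectors are hypothetical rather than real BERT embeddings. It only shows how the same word ("bank") ends up with different representations in different contexts.

```python
import math


def softmax(xs):
    """Numerically stable softmax."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]


def self_attention(X):
    """Scaled dot-product self-attention with Q = K = V = X (no learned weights)."""
    d = len(X[0])
    out = []
    for q in X:
        # How strongly this word attends to every word in the sentence
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in X]
        weights = softmax(scores)
        # New representation: attention-weighted average of all word vectors
        out.append([sum(w * v[i] for w, v in zip(weights, X)) for i in range(d)])
    return out


# Hypothetical static input vectors
VEC = {"river": [1.0, 0.0], "money": [0.0, 1.0], "bank": [0.5, 0.5]}

bank_in_river = self_attention([VEC["river"], VEC["bank"]])[1]
bank_in_money = self_attention([VEC["money"], VEC["bank"]])[1]
print(bank_in_river)  # [0.75, 0.25]
print(bank_in_money)  # [0.25, 0.75]
```

As in BERT, every word here attends to words on both sides; GPT-style models instead restrict each word to attending only to the words before it.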
Experimental evidence of massive-scale emotional contagion through social networks (Kramer et al., 2014)
This study experimentally demonstrates massive-scale emotional contagion
on Facebook. The researchers manipulated the emotional content of users' News
Feeds and found that reducing positive content led users to produce fewer positive
and more negative posts, while reducing negative content had the opposite effect.
This indicates that emotional contagion can occur without direct interaction or
nonverbal cues, solely through textual content. The findings challenge the
assumption that in-person interaction and nonverbal cues are necessary for
contagion, and show that online social networks can influence individual emotional
states: although the effect sizes are small, the platform's scale means they can
add up to substantial consequences.
Emotional contagion refers to the phenomenon where people's emotions and
moods can be influenced by the emotions of others around them. It's like
"catching" someone else's feelings. Imagine feeling happier after spending time
with a cheerful friend, or feeling sad after watching a heartbreaking movie.
Text Analysis in R (Welbers et al., 2017)
This is a Teacher's Corner article detailing how to perform computational text
analysis using the R statistical software. The authors, experts in communication
research and methodology, highlight R's open-source nature and extensive
package library as key advantages for researchers. The article systematically
guides readers through the process, from data preparation (importing,
cleaning, and creating a document-term matrix) to analysis (using
dictionary methods, supervised and unsupervised machine learning,
and descriptive statistics), providing example code and illustrating various R
packages. Finally, the authors discuss advanced techniques like using external
NLP modules and incorporating word positions, advocating for greater
transparency and citation of R packages in research.
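As an illustration of the preparation steps the article describes (cleaning texts and building a document-term matrix), here is a Python sketch. The article itself works in R; this version does not reproduce its code, and the stopword list, function names, and example documents are hypothetical.

```python
import re

# Hypothetical stopword list; real analyses use a fuller, language-specific list.
STOPWORDS = {"the", "a", "is", "and", "of", "to"}


def preprocess(text):
    """Cleaning step: lowercase, keep alphabetic tokens, drop stopwords."""
    return [t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOPWORDS]


def document_term_matrix(docs):
    """Return (vocabulary, matrix) where matrix[i][j] counts term j in document i."""
    tokenized = [preprocess(d) for d in docs]
    vocab = sorted({t for toks in tokenized for t in toks})
    index = {term: j for j, term in enumerate(vocab)}
    matrix = [[0] * len(vocab) for _ in docs]
    for i, toks in enumerate(tokenized):
        for t in toks:
            matrix[i][index[t]] += 1
    return vocab, matrix


docs = ["The economy is growing.", "Growing unemployment and a shrinking economy."]
vocab, dtm = document_term_matrix(docs)
print(vocab)  # ['economy', 'growing', 'shrinking', 'unemployment']
print(dtm)    # [[1, 1, 0, 0], [1, 1, 1, 1]]
```

Each row is a document and each column a term, which is the input format that dictionary methods and machine learning models then operate on.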
The Validity of Sentiment Analysis: Comparing Manual Annotation, Crowd-Coding, Dictionary Approaches, and Machine Learning Algorithms (Van Atteveldt et al., 2021)
This research paper rigorously compares various methods for sentiment analysis,
focusing on the accuracy of determining positive, negative, or neutral sentiment
in Dutch economic news headlines. The study contrasts manual annotation by
trained students, crowd-coding via online platforms, dictionary-based
approaches using numerous lexicons (both Dutch and English ones translated via