Class notes

LECTURE SUMMARY - Computation Analysis of Digital Communication + EXAM QUESTIONS

Author: pikayichu

Uploaded on

21-11-2022

Written in

2022/2023

LECTURE SUMMARY - Computation Analysis of Digital Communication lecture notes + EXAM QUESTIONS. Please don't mind the English grammar, since I wanted to make it as compact as possible :) (from 51 pages to 19 pages!)

Written for

Institution: Vrije Universiteit Amsterdam (VU)
Study: Master Communicatie Wetenschap
Course: Computational Analysis Of Digital Communication

Document information

Uploaded on: November 21, 2022
File latest updated on: November 22, 2022
Number of pages: 19
Written in: 2022/2023
Type: Class notes
Professor(s): Masur
Contains: All classes

Subjects

cadc
computational sciences
machine learning
social science
big data
search queries
dictionary
studio r
text analysis
content analysis
prediction
validity
deductive approaches
lexical analysis
dicti
r

Content preview

CADC 2022 - KAYI MAN

LECTURE 1 - What is computational social science?
PRELIMINARY SUMMARY

LECTURE 2 - Text as Data - Basics of Automatic Text Analysis
Text as Data - How can we analyze texts with computers?
Automated Text Analysis
Automatic text analysis steps - van Atteveldt, Welbers, & Van der Velden, 2019
Deductive Approaches: Dictionary-based Text Analysis
Example - State of the Union speech corpus

LECTURE 3 - Machine Learning - Supervised Text Classification
What is machine learning?
Supervised text classification
Principles of supervised text classification
Validation
Example - predicting music genre from lyrics (homework 2/3A)
Conclusion

LECTURE 4 - Machine Learning - Unsupervised Topic Modeling
What is topic modeling? - based on example: nuclear technology from 1945 - 2013
Topic Modeling as Dimensionality Reduction
Latent Dirichlet Allocation - LDA topic modeling
Conclusion and outlook
Conclusion

EXAMPLE EXAM QUESTION (MULTIPLE CHOICE)

EXAMPLE EXAM QUESTION (OPEN FORMAT)

LECTURE 1 - What is computational social science?
● Field of Social Science that uses algorithmic tools and large/unstructured data to understand
human and social behavior
● Computational methods as “microscope”: Methods are not the goal, but contribute to
theoretical development and/or data generation
● Complements rather than replaces traditional methodologies
● Includes methods such as:
○ Advanced data wrangling/data science
○ Combining different data sets
○ Automated Text Analysis
○ Machine Learning (supervised and unsupervised)
○ Agent-based modeling
○ Simulations
○ …
● To better understand texts and results: “How do we understand a large data set?”
● How can we work with data this large?
● Too big to put in an Excel sheet
● Unsupervised vs. Supervised

TYPICAL WORKFLOW
1. Identifying the problem/purpose
2. Data acquisition = different ways to get existing data
3. Data wrangling = what do I have to transform, add or delete to make the data useful?
4. Data analysis & modeling = statistical analysis or creating algorithms
5. Reporting = using data to communicate … ?
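
The five workflow steps can be sketched as a tiny Python pipeline. This is only an illustration, not the course's own code: the research purpose, the toy posts and all function names (`acquire`, `wrangle`, `analyze`, `report`) are made up for this example.

```python
from collections import Counter

# 1. Purpose (hypothetical): which words occur most often in a set of posts?

def acquire():
    # 2. Data acquisition: stand-in for an API call or file download (invented data).
    return ["Climate policy NOW!", "", "New climate report", None, "Policy debate tonight"]

def wrangle(raw_posts):
    # 3. Data wrangling: drop missing/empty posts, lowercase, strip punctuation.
    cleaned = []
    for post in raw_posts:
        if not post:
            continue
        words = [w.strip(".,!?").lower() for w in post.split()]
        cleaned.append([w for w in words if w])
    return cleaned

def analyze(posts):
    # 4. Data analysis: count word frequencies over all posts.
    return Counter(word for post in posts for word in post)

def report(counts, n=2):
    # 5. Reporting: turn the counts into a short, readable summary.
    return ", ".join(f"{word} ({freq})" for word, freq in counts.most_common(n))

posts = wrangle(acquire())
counts = analyze(posts)
summary = report(counts)
print(summary)
```

Keeping each step in its own function makes it easy to swap one step (for example, a real data source in `acquire`) without touching the others.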

WHY IS THIS IMPORTANT NOW?
- Collecting data used to be expensive (surveys, observations)
- Digital age: the behavior of billions of people is recorded, stored and therefore analyzable
- A digital record of behavior is created every time you click, call or pay
- (Meta-)data are a byproduct of people's everyday actions, aka digital traces
- Big data = name often given to large-scale records of people/businesses

10 CHARACTERISTICS OF BIG DATA (Salganik, 2017, chap. 2.3)
1. Big = scale / volume of current datasets is often impressive
2. Always-on = big data systems are constantly collecting data (FB = always-on)
3. Non-reactive = subjects do not react to being measured: they are not aware of the data
collection (ethical?) or are so used to it that it does not change their behavior
4. Incomplete = most big data sources are incomplete and lack the information you want for
your research, because the data were created for purposes other than research.
5. Inaccessible = data held by companies/governments are difficult for researchers to access.
6. Non-representative = not representative of certain populations
7. Drifting = systems change constantly, which makes it difficult to study long-term trends. The
way the systems work changes over time.
8. Algorithmically confounded = behavior in big data is not natural; it is driven by engineering
goals. An algorithm implemented by FB predetermines what the data will look like: the
record is produced by a system built by the platform.
9. Dirty = big data includes noise (junk, spam)
10. Sensitive = some of the information that companies/governments hold is sensitive (ethical?
privacy issues)

TYPICAL COMPUTATIONAL RESEARCH STRATEGIES
1. Counting things (how often do people use their phones per day? What topics do news sites cover most?)
2. Forecasting and nowcasting (predictions about both the present and the future; crime prediction…)
3. Approximating experiments (investigating potential nudges that make users select certain news)
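
As a rough illustration of the first strategy (counting things), the sketch below counts which topic each news site covers most. The article list, site names and topic labels are invented toy data, and `topic_counts` is a hypothetical helper, not something from the lecture.

```python
from collections import Counter

# Invented toy data: one (news site, topic label) pair per article.
articles = [
    ("site_a", "politics"), ("site_a", "sports"), ("site_a", "politics"),
    ("site_b", "economy"), ("site_b", "politics"), ("site_b", "economy"),
]

def topic_counts(rows):
    # Build one Counter of topics per site.
    counts = {}
    for site, topic in rows:
        counts.setdefault(site, Counter())[topic] += 1
    return counts

per_site = topic_counts(articles)

# Most frequent topic for each site.
top_topic = {site: c.most_common(1)[0][0] for site, c in per_site.items()}
print(top_topic)
```

In real research the topic labels would themselves have to come from somewhere (manual coding, a dictionary, or a classifier); here they are simply assumed to be given.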

ADVANTAGES AND DISADVANTAGES
Advantages of Computational Methods
- Actual behavior vs. self-report (self-reports can be biased)
- Social context vs. lab setting
- Small N to large N
Disadvantages of Computational Methods
- Techniques are often complicated
- Data are often proprietary (= owned by companies)
- Samples are often biased
- Insufficient metadata (we have the data but don’t know who the people behind it are)

DEFINITION (Van Atteveldt & Peng, 2018)
“Computational Communication Science is the
- label applied to the emerging subfield that investigates
- the use of computational algorithms
- to gather and analyze big and often semi- or unstructured data sets
- to develop and test communication science theories”

PROMISES
Three developments fueled computational methods in communication science:
1. Vast amounts of digitally available data
2. Improved tools to analyze big data (e.g., automatic text analysis methods); these change fast!
3. Powerful and cheap processing power & easy computing infrastructure (GitHub)

ETHICS OF ‘BIG DATA’ AND COMPUTATIONAL RESEARCH
THE “FACEBOOK MOOD MANIPULATION” STUDY (Kramer et al., 2014)
● Massive online experiment (N ~ 700k)
● Main Research Question: Is emotion contagious?
● Experimental groups: positive / negative / control
● Stimulus: Hide (negative / positive / random) messages from FB timeline
● Measurement / dependent variables: sentiment of posts by user

Question: Do you think these studies are problematic? If yes, why?
● No consent was given
● Shared with a third party

COMPUTATIONAL TECHNIQUE: SENTIMENT ANALYSIS
● Count the occurrences of positive and negative words in each post, then subtract the
negative count from the positive count

Positive content reduced in feed = users used more negative words
Negative content reduced in feed = users used more positive words
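
The counting rule described above (positive word count minus negative word count) can be sketched in a few lines of Python. The word lists and example posts are invented for illustration; they are not the dictionary used in the original study, and `sentiment_score` is a hypothetical helper name.

```python
import re

# Tiny, made-up sentiment dictionaries (real dictionaries are far larger).
POSITIVE = {"good", "great", "happy", "love"}
NEGATIVE = {"bad", "sad", "hate", "awful"}

def tokenize(text):
    # Lowercase the text and keep only runs of letters/apostrophes as words.
    return re.findall(r"[a-z']+", text.lower())

def sentiment_score(text):
    # Score = number of positive words minus number of negative words.
    tokens = tokenize(text)
    pos = sum(t in POSITIVE for t in tokens)
    neg = sum(t in NEGATIVE for t in tokens)
    return pos - neg

posts = [
    "I love this great day",
    "What an awful, sad morning",
    "Good food but bad service",
]
scores = [sentiment_score(p) for p in posts]
print(scores)
```

A score above zero is read as positive, below zero as negative, and zero as neutral or mixed, as in the third post where one positive and one negative word cancel out.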

IS THIS GOOD SCIENCE? WHY NOT?
● What’s cool?
○ Potentially interesting research question
○ actual behavior measured as well as self-report measures
● What’s not so cool? A lot…
○ No informed consent, not replicable, manipulation
○ Low internal validity
■ Is sentiment of posts indicative of mood?
■ Does change in sentiment originate in contagion of mood?
○ Low measurement accuracy
■ Are word counts indicative of sentiment?
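
One way to make the measurement-accuracy concern concrete is to compare word-count labels with human-coded labels. The sketch below does this on four invented sentences with invented "human" codes; the word lists and the `count_based_label` helper are assumptions for illustration only. It shows how simple negation ("not good") can mislead a pure word count.

```python
# Tiny, made-up sentiment dictionaries.
POSITIVE = {"good", "happy", "great"}
NEGATIVE = {"bad", "sad", "terrible"}

def count_based_label(text):
    # Label a text by comparing positive and negative word counts.
    tokens = text.lower().split()
    score = sum(t in POSITIVE for t in tokens) - sum(t in NEGATIVE for t in tokens)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"

# Invented examples with hypothetical human-coded labels.
coded = [
    ("what a great day", "positive"),
    ("this is not good", "negative"),
    ("i am not sad anymore", "positive"),
    ("terrible news today", "negative"),
]

predicted = [count_based_label(text) for text, _ in coded]

# Share of texts where the word count agrees with the human coder.
agreement = sum(p == human for p, (_, human) in zip(predicted, coded)) / len(coded)
print(predicted, agreement)
```

On this toy set the word count agrees with the human codes for only half of the texts, because it ignores the word "not" in the two negated sentences.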
