Summary Practical sessions Computational Analysis of Digital Communication

Practical session summary of Computational Analysis of Digital Communication given at Vrije Universiteit Amsterdam. S_CADC

  • 29 November 2022
  • 70 pages
  • 2022/2023
  • Summary
Vustudentt
Basics
<- is the assignment operator; for assigning a value, = works the same way

Basic data types in R
Numeric – Numbers
Character – text
Factor – categorical data
Logical – true or false

A number can be expressed as a character, but text cannot be expressed as a number
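
The four types and the conversion rule above can be sketched in base R as follows. All variable names are made up for illustration.

```r
# Illustrative values; all names here are made up for this sketch
num <- 3.5                                   # numeric
txt <- "hello"                               # character
cat_var <- factor(c("low", "high", "low"))   # factor (categorical)
flag <- TRUE                                 # logical

class(num)      # "numeric"
class(txt)      # "character"
class(cat_var)  # "factor"
class(flag)     # "logical"

# A number can always be turned into text ...
num_as_text <- as.character(num)   # "3.5"

# ... but arbitrary text has no numeric value: R returns NA (with a warning)
txt_as_num <- suppressWarnings(as.numeric(txt))   # NA
```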

Vector – sequence of one or more values of the same data type (= a variable)
- A vector can hold any data type, but all values within one vector share that type

Data frame is a collection of vectors with the same length, tied together as columns
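
A small base R sketch of vectors and a data frame. The names (heights, genders, people) are made up for illustration.

```r
# A vector holds values of a single type
heights <- c(176, 165, 172)    # numeric vector
genders <- c("M", "F", "F")    # character vector

# Mixing types in one vector coerces everything to a common type (here: character)
mixed <- c(1, "a", TRUE)
class(mixed)   # "character"

# A data frame ties vectors of equal length together as columns
people <- data.frame(gender = genders, height = heights)
nrow(people)   # 3
ncol(people)   # 2
```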

A function has the form: output <- function_name(argument1, argument2, ...)
- function_name is a name to indicate which function you want to use. It is followed by parentheses.
- arguments are the input of the function, and are inserted within the parentheses. Arguments can
be any R object, such as numbers, strings, vectors and data.frames. Multiple arguments can be
given, separated by commas.
- output is anything that is returned by the function, such as vectors, data.frames or the results of a
statistical analysis. Some functions do not have output, but produce a visualization or write data to
disk.
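
As a minimal illustration of this form, the sketch below calls two built-in functions, mean() and round(). The input values are made up.

```r
heights <- c(176, 165, 172)

# output <- function_name(argument1, argument2, ...)
avg <- mean(heights)                     # one argument: a numeric vector
avg_rounded <- round(avg, digits = 1)    # two arguments, separated by a comma

avg_rounded   # 171
```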

The purpose of a function is to make it easy to perform a (large) set of (complex) operations. This is crucial,
because
- It makes code easier to understand. You don’t need to see the operations, just the name of the
function that performs them
- You don’t need to understand the operations, just how to use the function




Week 1 - Practical session 1
R Tidyverse - Data transformation & summarization


Introduction
The goal of this practical session is to get you acquainted with the Tidyverse and to learn how to transform
and summarize data. Tidyverse is a collection of packages that have been designed around a singular and
clearly defined set of principles about what data should look like and how we should work with it. It comes
with a nice introduction in the R for Data Science book, for which the digital version is available for free.
This tutorial deals with most of the material in chapter 5 of that book.

In this part of the tutorial, we’ll focus on working with data using the tidyverse package. This package
includes the dplyr (data-pliers) package, which contains most of the tools we’re using below, but it also
contains functions for reading, analyzing and visualizing data that will be explained later.

Installing tidyverse
As before, install.packages() is used to download and install the package (you only need to do this once on
your computer) and library() is used to make the functions from this package available for use (required
each session that you use the package).
install.packages("tidyverse")
library(tidyverse)

Tidyverse basics
As in most packages, the functionality in dplyr is offered through functions. In general, a function can be
seen as a command or instruction to the computer to do something and (generally) return the result. In
the tidyverse package dplyr, almost all functions primarily operate on data sets, for example for filtering and
sorting data.

With a data set we mean a rectangular data frame consisting of rows (often items or respondents) and
columns (often measurements of or data about these items). These data sets can be R data.frames, but
tidyverse has its own version of data frames called tibble, which is functionally (almost) equivalent to a
data frame but is more efficient and somewhat easier to use.

As a very simple example, the following code creates a tibble containing respondents, their gender, and
their height:

data <- tibble(resp = c(1, 2, 3),
               gender = c("M", "F", "F"),
               height = c(176, 165, 172))
data



A tibble is more recent and more powerful than a data.frame. When printed, it shows the type of each column:
- dbl (double): numeric values
- chr (character): text values
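
The column types can also be checked in code rather than read off the printout. A minimal sketch, assuming the tibble package (part of the tidyverse) is installed; it rebuilds the example tibble so that it runs on its own.

```r
library(tibble)

data <- tibble(resp = c(1, 2, 3),
               gender = c("M", "F", "F"),
               height = c(176, 165, 172))

# Storage type of each column: "double" corresponds to dbl, "character" to chr
col_types <- sapply(data, typeof)
col_types
```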




Reading data
The example above manually created a data set, but in most cases you will start with data that you get
from elsewhere, such as a csv file (e.g. downloaded from an online dataset or exported from Excel) or an
SPSS or Stata data file.

Tidyverse contains a function read_csv that allows you to read a csv file directly into a tibble. You specify
the location of the file, either on your local drive (as we did in the last practical session) or directly from the
Internet!

The example below downloads an overview of gun polls from the data analytics site 538, and reads it into a
tibble using the read_csv function:

url <- "https://raw.githubusercontent.com/fivethirtyeight/data/master/poll-quiz-guns/guns-polls.csv"
d <- read_csv(url)
d

(Note that you can safely ignore the (red) message; it simply tells you how each column was parsed.)

This shows the first ten rows of the data set; columns that don’t fit on the screen are not printed. The
remaining rows and columns are listed at the bottom. For each column the data type is also mentioned
(<int> stands for integer, which is a numeric value; <chr> is textual or character data). If you want to browse
through your data, you can also click on the name of the data.frame (d) in the “Environment” tab of the
top-right window or call View(d).
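
read_csv works the same way for a file on your local drive. The sketch below needs no Internet connection: it writes a tiny made-up csv file to a temporary location and reads it back. The file contents and the name local_data are assumptions for illustration, and the show_col_types argument assumes a recent version of readr.

```r
library(readr)

# Write a tiny, made-up csv file to a temporary location ...
path <- tempfile(fileext = ".csv")
writeLines(c("resp,gender,height",
             "1,M,176",
             "2,F,165",
             "3,F,172"), path)

# ... and read it back into a tibble
# show_col_types = FALSE suppresses the column-parsing message
local_data <- read_csv(path, show_col_types = FALSE)
local_data
```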

Subsetting with filter()
The filter function can be used to select a subset of rows. In the guns data, the Question column specifies
which question was asked. We can select only those rows (polls) that asked whether the minimum
purchase age for guns should be raised to 21:

age21 <- filter(d, Question == 'age-21')
age21

Question == 'age-21' is an expression that is either true or false for each row

This call is typical for a tidyverse function: the first argument is the data to be used (d), and the remaining
argument(s) contain information on what should be done to the data.

Note the use of == for comparison: In R, = means assignment and == means equals. Other comparisons are
e.g. > (greater than), <= (less than or equal) and != (not equal). You can also combine multiple conditions
with logical (boolean) operators: & (and), | (or), and ! (not), and you can use parentheses like in
mathematics.

So, we can find all surveys where support for raising the gun age was at least 80%:

filter(d, Question == 'age-21' & Support >= 80)   # >= 80 means 80 or larger

Note that this command did not assign the result to an object, so the result is only displayed on the screen
but not remembered. This can be a great way to quickly inspect your data, but if you want to continue
analysing this subset you need to assign it to an object as above.
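
The comparison and logical operators described above can be tried out on a small tibble without downloading anything. The data in this sketch are made up and only mimic the Question and Support columns of the guns data.

```r
library(dplyr)

# Made-up poll data for illustration only
polls <- tibble(Question = c("age-21", "age-21", "arm-teachers", "ban-assault"),
                Support  = c(84, 72, 41, 67))

# & (and): both conditions must hold
strict <- filter(polls, Question == "age-21" & Support >= 80)

# | (or): at least one condition must hold
either <- filter(polls, Question == "arm-teachers" | Support >= 80)

# ! (not) with parentheses: drop the rows that match the condition inside
not_low_age21 <- filter(polls, !(Question == "age-21" & Support < 80))

nrow(strict)          # 1
nrow(either)          # 2
nrow(not_low_age21)   # 3
```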


Selecting certain columns
Where filter selects specific rows, select allows you to select specific columns. Most simply, we can
name the columns that we want, and they are returned in that particular order.

### Select specific columns
select(age21, Population, Support, Pollster)




You can also use some more versatile functions such as contains() or starts_with() within a select()
command:

select(age21, contains("Supp")) # Selects all variables that contain the stem "Supp" in their name
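
The difference between starts_with() and contains() is easiest to see side by side. This sketch uses a made-up tibble whose column names mimic those in the guns data.

```r
library(dplyr)

# Made-up columns mimicking the names in the guns data
polls <- tibble(Pollster = c("A", "B"),
                Support = c(84, 72),
                `Republican Support` = c(77, 60),
                `Democratic Support` = c(90, 85))

# starts_with(): only columns whose name begins with the given text
starts <- select(polls, starts_with("Supp"))

# contains(): the text may appear anywhere in the name
anywhere <- select(polls, contains("Supp"))

names(starts)     # "Support"
names(anywhere)   # "Support" "Republican Support" "Democratic Support"
```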

You can also specify a range of columns, for example all columns from Support to Democratic Support:

### Specify range of columns
select(age21, Support:`Democratic Support`)




Note the use of ‘backticks’ (reverse quotes) to specify the column name, as R does not normally allow
spaces in names.

Select can also be used to rename columns when selecting them, for example to get rid of the spaces:

### Rename columns to get rid of spaces (e.g. in Republican Support)
select(age21, Pollster, rep = `Republican Support`, dem = `Democratic Support`)




Note that select drops all columns not selected. If you only want to rename columns, you can use the
rename function:

### Only renaming columns
rename(age21, start_date = Start, end_date = End)
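
To make the difference between renaming with select and with rename concrete, the sketch below applies both to a made-up tibble. The column names Start and End follow the guns data; the values are invented.

```r
library(dplyr)

# Made-up data for illustration only
polls <- tibble(Pollster = c("A", "B"),
                Start = c("2/20/18", "2/27/18"),
                End = c("2/23/18", "2/28/18"),
                Support = c(84, 72))

# select() renames, but keeps only the columns that are named
via_select <- select(polls, start_date = Start)

# rename() renames and keeps all other columns in place
via_rename <- rename(polls, start_date = Start, end_date = End)

names(via_select)   # "start_date"
names(via_rename)   # "Pollster" "start_date" "end_date" "Support"
```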




Finally, you can drop a variable by adding a minus sign in front of a name:

### Drop variable by adding - in front of a name
select(age21, -Question, -URL)
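
As with filter, dropping columns does not change the original object unless the result is assigned. A minimal sketch on made-up data (the URL values are placeholders):

```r
library(dplyr)

# Made-up data for illustration only
polls <- tibble(Question = c("age-21", "age-21"),
                Support = c(84, 72),
                URL = c("https://example.com/a", "https://example.com/b"))

# Drop two columns and store the result in a new object
trimmed <- select(polls, -Question, -URL)

names(trimmed)   # "Support"
names(polls)     # "Question" "Support" "URL" (the original is unchanged)
```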

