Summary Practical sessions Computational Analysis of Digital Communication


Practical session summary of Computational Analysis of Digital Communication given at Vrije Universiteit Amsterdam. S_CADC

  • 29 November 2022
  • 70 pages
  • 2022/2023
Basics
<- is the assignment operator; for assigning a value to a name it works the same as =

Basic data types in R
Numeric – Numbers
Character – text
Factor – categorical data
Logical – true or false

A number can be expressed as a character (21 becomes "21"), but text generally cannot be expressed as a number

Vector – sequence of one or more values of the same data type (= variable)
- Can hold any data type, as long as all its values share that type

Data frame is a collection of vectors with the same length, tied together as columns
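
The data types, vectors and data frames described above can be tried out in a short sketch. All names and values here (age, name, group, member, people) are made up for illustration and are not part of the course data.

```r
# Hypothetical example values, not taken from the course data
age    <- c(21, 35, 48)              # numeric vector
name   <- c("Ann", "Bob", "Cem")     # character vector
group  <- factor(c("a", "b", "a"))   # factor (categorical data)
member <- c(TRUE, FALSE, TRUE)       # logical vector

# Ask R which data type each vector has
class(age)
class(name)
class(group)
class(member)

# A number can be stored as text, but arbitrary text cannot become a number
num_as_text <- as.character(21)
text_as_num <- suppressWarnings(as.numeric("hello"))  # gives NA

# Vectors of the same length tied together as columns of a data frame
people <- data.frame(age, name, group, member)
str(people)
```

Each vector holds values of one type only, and the data frame simply places the four vectors of length three next to each other.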

A function has the form: output <- function_name(argument1, argument2, ...)
- function_name is a name to indicate which function you want to use. It is followed by parentheses.
- arguments are the input of the function, and are inserted within the parentheses. Arguments can
be any R object, such as numbers, strings, vectors and data.frames. Multiple arguments can be
given, separated by commas.
- output is anything that is returned by the function, such as vectors, data.frames or the results of a
statistical analysis. Some functions do not have output, but produce a visualization or write data to
disk.
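
As a small illustration of the form output <- function_name(argument1, argument2, ...), the sketch below calls two base R functions, mean() and round(). The vector heights and its values are assumptions chosen for this example.

```r
# Hypothetical input: a numeric vector used as the argument
heights <- c(176, 165, 171)

# One argument: the vector goes between the parentheses
avg <- mean(heights)

# Two arguments, separated by a comma: the value and the number of decimals
avg_rounded <- round(avg, 1)

avg_rounded
```

Here avg and avg_rounded are the outputs; without the assignment, the result would only be printed and not kept.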

The purpose of a function is to make it easy to perform a (large) set of (complex) operations. This is crucial,
because
- It makes code easier to understand. You don’t need to see the operations, just the name of the
function that performs them
- You don’t need to understand the operations, just how to use the function





Week 1 - Practical session 1
R Tidyverse - Data transformation & summarization


Introduction
The goal of this practical session is to get you acquainted with the Tidyverse and to learn how to transform
and summarize data. Tidyverse is a collection of packages that have been designed around a singular and
clearly defined set of principles about what data should look like and how we should work with it. It comes
with a nice introduction in the R for Data Science book, for which the digital version is available for free.
This tutorial deals with most of the material in chapter 5 of that book.

In this part of the tutorial, we’ll focus on working with data using the tidyverse package. This package
includes the dplyr (data-pliers) package, which contains most of the tools we’re using below, but it also
contains functions for reading, analyzing and visualizing data that will be explained later.

Installing tidyverse
As before, install.packages() is used to download and install the package (you only need to do this once on
your computer) and library() is used to make the functions from this package available for use (required
each session that you use the package).
install.packages("tidyverse")
library(tidyverse)

Tidyverse basics
As in most packages, the functionality in dplyr is offered through functions. In general, a function can be
seen as a command or instruction to the computer to do something and (generally) return the result. In
the tidyverse package dplyr, almost all functions primarily operate on data sets, for example for filtering and
sorting data.

With a data set we mean a rectangular data frame consisting of rows (often items or respondents) and
columns (often measurements of or data about these items). These data sets can be R data.frames, but
tidyverse has its own version of data frames called tibble, which is functionally (almost) equivalent to a
data frame but is more efficient and somewhat easier to use.

As a very simple example, the following code creates a tibble containing respondents, their gender, and
their height:

data <- tibble(resp = c(1, 2, 3),
               gender = c("M", "F", "F"),
               height = c(176, 165, 172))
data



Tibble is more recent and more powerful than a plain data.frame. When printed, it shows the type of each column:
- <dbl> (double, a numeric value)
- <chr> (character variable)
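
To see how a tibble relates to a regular data.frame, the sketch below builds the same respondent data as a base R data.frame and converts it with as_tibble(). The object names df and tb are assumptions for this example.

```r
library(tidyverse)

# The respondent data as a base R data.frame
df <- data.frame(resp = c(1, 2, 3),
                 gender = c("M", "F", "F"),
                 height = c(176, 165, 172),
                 stringsAsFactors = FALSE)

# Convert the data.frame to a tibble
tb <- as_tibble(df)

class(df)          # a plain data.frame
class(tb)          # a tibble is still a data.frame, with extra classes
sapply(tb, class)  # the data type of each column
tb                 # printing shows the column types under the names
```

Because a tibble is still a data.frame, functions that expect a data.frame generally accept it as well.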





Reading data
The example above manually created a data set, but in most cases you will start with data that you get
from elsewhere, such as a csv file (e.g. downloaded from an online dataset or exported from Excel) or an
SPSS or Stata data file.

Tidyverse contains a function read_csv that allows you to read a csv file directly into a tibble. You specify
the location of the file, either on your local drive (as we did in the last practical session) or directly from the
Internet!

The example below downloads an overview of gun polls from the data analytics site 538, and reads it into a
tibble using the read_csv function:

url <- "https://raw.githubusercontent.com/fivethirtyeight/data/master/poll-quiz-guns/guns-polls.csv"
d <- read_csv(url)
d

(Note that you can safely ignore the (red) messages; they simply tell you how each column was parsed.)

The output shows the first ten rows of the data set, and if the columns don’t fit they are not printed. The
remaining rows and columns are summarized at the bottom. For each column the data type is also mentioned
(for example, <int> stands for integer, which is a numeric value; <chr> is textual or character data). If you want to
browse through your data, you can also click on the name of the data.frame (d) in the top-right window
“Environment” tab or call View(d).
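
Reading from a local file works the same way as reading from a URL. The sketch below is a self-contained illustration: it first writes a tiny csv file to a temporary location and then reads it back with read_csv. The file contents and the name polls are made up; they only mimic a few columns of the gun polls data.

```r
library(tidyverse)

# Write a tiny csv file to a temporary location (hypothetical contents)
path <- tempfile(fileext = ".csv")
writeLines(c("Question,Support,Pollster",
             "age-21,72,Pollster A",
             "age-21,84,Pollster B"),
           path)

# Read the file into a tibble; col_types = cols() lets read_csv guess
# the column types without printing the parsing message
polls <- read_csv(path, col_types = cols())
polls
```

Replacing path with a URL in quotes, as in the example above, reads the file directly from the Internet instead.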

Subsetting with filter()
The filter function can be used to select a subset of rows. In the guns data, the Question column specifies
which question was asked. We can select only those rows (polls) that asked whether the minimum
purchase age for guns should be raised to 21:

age21 <- filter(d, Question == 'age-21')
age21

Question == 'age-21' is an expression that is either true or false for each row; filter keeps the rows where it is true

This call is typical for a tidyverse function: the first argument is the data to be used (d), and the remaining
argument(s) contain information on what should be done to the data.

Note the use of == for comparison: in R, = means assignment and == means equals. Other comparisons are
e.g. > (greater than), <= (less than or equal to) and != (not equal). You can also combine multiple conditions
with logical (boolean) operators: & (and), | (or), and ! (not), and you can use parentheses like in
mathematics.
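
The comparison and logical operators listed above can be tried on a small data set. The tibble polls below is a hypothetical mini version of the gun polls data with invented values, so the results are only meant to show how the operators behave.

```r
library(tidyverse)

# Hypothetical mini version of the polls data
polls <- tibble(Question = c("age-21", "age-21", "arm-teachers", "age-21"),
                Support  = c(72, 84, 43, 80))

# != : rows where the question is not 'age-21'
not_age21 <- filter(polls, Question != "age-21")

# &  : both conditions must be true
strict <- filter(polls, Question == "age-21" & Support > 80)

# |  : at least one condition must be true
either <- filter(polls, Question == "arm-teachers" | Support >= 80)

# !  : negates the condition between the parentheses
not_low <- filter(polls, !(Support <= 72))

not_low
```

Note that Support > 80 excludes the poll with exactly 80, while Support >= 80 includes it.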

So, we can find all surveys where support for raising the gun age was at least 80%:

filter(d, Question == 'age-21' & Support >= 80)   # >= 80 means 80 or larger

Note that this command did not assign the result to an object, so the result is only displayed on the screen
but not remembered. This can be a great way to quickly inspect your data, but if you want to continue
analysing this subset you need to assign it to an object as above.



Selecting certain columns
Where filter selects specific rows, select allows you to select specific columns. Most simply, we can
name the columns that we want to retrieve, in the order in which we want them.

### Select specific columns
select(age21, Population, Support, Pollster)




You can also use some more versatile functions such as contains() or starts_with() within a select()
command:

select(age21, contains("Supp")) # Selects all variables that contain the stem "Supp" in their name
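
To compare contains() with starts_with(), the sketch below uses a hypothetical one-row tibble whose column names mimic those of the gun polls data. The values are invented.

```r
library(tidyverse)

# Hypothetical data with column names like those in the gun polls data
polls <- tibble(Pollster = "Pollster A",
                Support = 72,
                `Republican Support` = 61,
                `Democratic Support` = 86)

# contains(): every column with "Supp" anywhere in its name
with_supp <- select(polls, contains("Supp"))

# starts_with(): only columns whose name begins with "Supp"
starts_supp <- select(polls, starts_with("Supp"))

names(with_supp)
names(starts_supp)
```

contains() picks up all three support columns, whereas starts_with() only matches the column that begins with the given text.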

You can also specify a range of columns, for example all columns from Support to Democratic Support:

### Specify range of columns
select(age21, Support:`Democratic Support`)




Note the use of ‘backticks’ (reverse quotes) to specify the column name, as R does not normally allow
spaces in names.

Select can also be used to rename columns when selecting them, for example to get rid of the spaces:

### Rename columns to get rid of spaces (e.g. in `Republican Support`)
select(age21, Pollster, rep = `Republican Support`, dem = `Democratic Support`)




Note that select drops all columns not selected. If you only want to rename columns, you can use the
rename function:

### Only renaming columns
rename(age21, start_date = Start, end_date = End)
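
The difference between renaming with select and renaming with rename can be checked by comparing the resulting column names. The tibble polls below is a hypothetical stand-in for age21, with invented values.

```r
library(tidyverse)

# Hypothetical stand-in for the age21 data
polls <- tibble(Start = "2/20/18",
                End = "2/23/18",
                Pollster = "Pollster A",
                Support = 72)

# rename(): changes the names and keeps all other columns
renamed <- rename(polls, start_date = Start, end_date = End)

# select(): changes the name and drops every column that is not mentioned
selected <- select(polls, start_date = Start)

names(renamed)
names(selected)
```

In both functions the new name goes on the left of the = sign and the existing column on the right.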




Finally, you can drop a variable by adding a minus sign in front of a name:

### Drop variable by adding - in front of a name
select(age21, -Question, -URL)
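
Putting the steps together, the sketch below first filters rows and then drops columns from the result. The tibble polls, its values and the example.com URLs are assumptions made for this illustration.

```r
library(tidyverse)

# Hypothetical mini version of the polls data
polls <- tibble(Question = c("age-21", "arm-teachers", "age-21"),
                Support = c(72, 43, 84),
                URL = c("https://example.com/a",
                        "https://example.com/b",
                        "https://example.com/c"),
                Pollster = c("Pollster A", "Pollster B", "Pollster C"))

# Step 1: keep only the rows about raising the purchase age
age21 <- filter(polls, Question == "age-21")

# Step 2: drop the columns that are no longer needed
age21_small <- select(age21, -Question, -URL)
age21_small
```

Assigning each intermediate result to an object, as described earlier, makes it possible to continue working with the subset.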

