Tutorial 2: Introduction to Reading Data
Lecture and Tutorial Learning Goals:
After completing this week's lecture and tutorial work, you will be able to:
define the following:
absolute file path
relative file path
URL
read data into R using a relative path and a URL
compare and contrast the following functions:
read_csv
read_tsv
read_csv2
read_delim
read_excel
match the following tidyverse read_* function arguments to their descriptions:
file
delim
col_names
skip
choose the appropriate tidyverse read_* function and function arguments to load a given plain text tabular data set into R
use the readxl library's read_excel function and arguments to load a sheet from an Excel file into R
connect to a database using the DBI library's dbConnect function
list the tables in a database using the DBI library's dbListTables function
create a reference to a database table that is queryable using the tbl function from the dbplyr library
retrieve data from a database query and bring it into R using the collect function from the dbplyr library
use write_csv to save a data frame to a CSV file
optional: scrape data from the web
read/scrape data from an internet URL using the rvest html_nodes and html_text functions
compare downloading tabular data from a plain text file (e.g. *.csv) from the web versus scraping data from a .html file
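As a rough illustration of how the file, delim, col_names and skip arguments listed above fit together, here is a minimal sketch. The file contents, the column names country and score, and the temporary file are all made up for this example; they are not part of the tutorial data.

```r
library(readr)

# Hypothetical file: two metadata lines, no header row, semicolon-separated.
tmp <- tempfile(fileext = ".txt")
writeLines(c(
  "Source: made-up example",
  "Collected: unknown",
  "Canada;7.3",
  "Finland;7.6"
), tmp)

toy <- read_delim(
  file = tmp,                          # path to the file to read
  delim = ";",                         # character that separates the columns
  col_names = c("country", "score"),   # the file has no header, so supply names
  skip = 2                             # ignore the two metadata lines at the top
)
toy
```

read_csv, read_tsv and read_csv2 can be thought of as shortcuts for read_delim with the delimiter fixed to a comma, a tab and a semicolon respectively, so read_delim is the one to reach for when the delimiter is something else.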
Any place you see ..., you must fill in the function, variable, or data to complete the code. Replace fail() with your completed code and run the
cell!
In [ ]:
### Run this cell before continuing.
library(tidyverse)
library(repr)
library(rvest)
library(stringr)
options(repr.matrix.max.rows = 6)
source("tests.R")
source("cleanup.R")
1. Happiness Report
As you might remember from worksheet_02, we practised loading data from the Sustainable Development Solutions Network's World Happiness
Report (http://worldhappiness.report/). That data was the output of their analysis that calculated each country's happiness score and how much each
variable contributed to it. In this tutorial, we are going to look at the data at an earlier stage of the study: the aggregated/averaged values (per country and
year) for many different social and health aspects that the researchers anticipated might contribute to happiness (Table 2.1 from this Excel spreadsheet
(https://s3.amazonaws.com/happiness-report/2018/WHR2018Chapter2OnlineData.xls)).
The goal for today is to produce a plot of 2017's positive affect scores against healthy life expectancy at birth, with healthy life expectancy at birth on the x-
axis and positive affect on the y-axis. For this study, positive affect was defined as the average of three positive affect measures: happiness, laughter and
enjoyment. We would also like to convert the positive affect score from a scale of 0 - 1 to a scale from 0 - 10.
1. use filter to subset the rows where the year is equal to 2017
2. use mutate to convert the "Positive affect" score from a scale of 0 - 1 to a scale from 0 - 10
3. use select to choose the "Healthy life expectancy at birth" column and the scaled "Positive affect" column
4. use ggplot to create our plot of "Healthy life expectancy at birth" (x - axis) and scaled "Positive affect" (y - axis)
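The four steps above can be sketched on a small made-up data frame. This is only an illustration under stated assumptions: toy_happy, its values, and the new column name positive_affect_scaled are hypothetical, and the two backticked column names are taken from the step descriptions rather than checked against the real file.

```r
library(dplyr)
library(ggplot2)

# Made-up rows; not the World Happiness Report data.
toy_happy <- tibble(
  country = c("A", "B", "C", "D"),
  year = c(2016, 2017, 2017, 2017),
  `Healthy life expectancy at birth` = c(60.1, 62.5, 70.2, 55.8),
  `Positive affect` = c(0.70, 0.65, 0.80, 0.55)
)

toy_scaled <- toy_happy %>%
  filter(year == 2017) %>%                                     # step 1: keep 2017 rows
  mutate(positive_affect_scaled = `Positive affect` * 10) %>%  # step 2: 0-1 scale to 0-10
  select(`Healthy life expectancy at birth`, positive_affect_scaled)  # step 3

# Step 4: scatter plot with life expectancy on x and scaled positive affect on y.
toy_plot <- ggplot(toy_scaled,
                   aes(x = `Healthy life expectancy at birth`,
                       y = positive_affect_scaled)) +
  geom_point() +
  labs(x = "Healthy life expectancy at birth (years)",
       y = "Positive affect (0 - 10)")
toy_plot
```

Column names that contain spaces need backticks inside filter, mutate, select and aes, which is why they appear throughout the sketch.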
Tips for success: Try going through all of the steps on your own, but don't forget to discuss with others (classmates, TAs, or an instructor) if you get
stuck. If something is wrong and you can't spot the issue, be sure to read the error message carefully. Since there are a lot of steps involved in working
with data and modifying it, feel free to look back at worksheet_02.
Question 1.1 Multiple Choice:
{points: 1}
What is the maximum value for the "Positive affect" score (in the original data file that you read into R)?
A. 100
B. 10
C. 1
D. 0.1
E. 5
Assign your answer to an object called answer1.1. Make sure your answer is an uppercase letter and is surrounded by quotation marks (e.g. "F").
In [ ]:
# Replace the fail() with your answer.
### BEGIN SOLUTION
answer1.1 <- "C"
### END SOLUTION
In [ ]:
test_1.1()
Question 1.2 Multiple Choice:
{points: 1}
Which column's values will be used to filter the data?
A. countries
B. generosity
C. positive affect
D. year
Assign your answer to an object called answer1.2. Make sure your answer is an uppercase letter and is surrounded by quotation marks (e.g. "F").
In [ ]:
# Replace the fail() with your answer.
### BEGIN SOLUTION
answer1.2 <- "D"
### END SOLUTION
In [ ]:
test_1.2()
Question 1.3.0
{points: 1}
Use the appropriate read_* function to read in the WHR2018Chapter2OnlineData file (look in the tutorial_02 directory to ensure you use the
correct relative path to read it in).
Assign the data frame to an object called happy_df_csv.
In [ ]:
### BEGIN SOLUTION
happy_df_csv <- read_csv(file = "data/WHR2018Chapter2OnlineData.csv")
### END SOLUTION
happy_df_csv
In [ ]:
test_1.3.0()