Programming for economists – R courses
Introduction to R
Introduction & vectors
class() : checking the data type of a variable
Vector: 1d array; can hold numeric, character or logical values. The
elements all have the same data type
numeric_vector <- c(1,2,3)
character_vector <- c("a", "b", "c")
boolean_vector <- c(TRUE, FALSE, TRUE)
e.g. poker_vector <- c(140, -50, 20)
names(poker_vector) <- c("Monday", "Tuesday", "Wednesday")
or: days_vector <- c("Monday", "Tuesday", "Wednesday")
names(poker_vector) <- days_vector
Define a new variable based on a selection:
poker_midweek <- poker_vector[c(2, 3)] # an index beyond the vector's length returns NA
selection_vector <- poker_vector > 0
poker_winning_days <- poker_vector[selection_vector]
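The naming and selection steps above can be sketched end to end as follows. This is a minimal illustration that reuses the three-day poker_vector from these notes; selecting by name is an extra not shown above.

```r
# Winnings for three days, as in the notes above
poker_vector <- c(140, -50, 20)
names(poker_vector) <- c("Monday", "Tuesday", "Wednesday")

class(poker_vector)                   # "numeric"

# Select by position, by name, and by logical condition
poker_midweek <- poker_vector[c(2, 3)]  # Tuesday and Wednesday
poker_monday <- poker_vector["Monday"]  # element named "Monday"

selection_vector <- poker_vector > 0    # TRUE where winnings are positive
poker_winning_days <- poker_vector[selection_vector]
poker_winning_days                      # Monday and Wednesday
```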
Matrices
Matrix: 2d array; can hold numeric, character or logical values. The
elements all have the same data type
Construct a matrix with 3 rows that contain numbers 1 up to 9:
matrix(1:9, byrow = TRUE, nrow = 3)
colnames(matrix) <- ...
rownames(matrix) <- ...
Add columns: new_matrix <- cbind(matrix, matrix_b)
Add rows: all_matrix <- rbind(matrix1, matrix2)
colSums(matrix)
rowSums(matrix)
Select rows and columns
matrix[1:3, 2] # selects rows 1, 2 and 3 and second column
matrix[,1] # selects all elements of the first column
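The matrix operations above can be tried on a small example. This is a sketch only: the object name m and the row and column labels are placeholders chosen for illustration.

```r
# 3 x 3 matrix filled row by row with the numbers 1 to 9
m <- matrix(1:9, byrow = TRUE, nrow = 3)
colnames(m) <- c("a", "b", "c")       # placeholder labels
rownames(m) <- c("r1", "r2", "r3")

row_totals <- rowSums(m)              # one sum per row
col_totals <- colSums(m)              # one sum per column

m_wide <- cbind(m, total = row_totals)  # adds a column: 3 x 4
m_tall <- rbind(m, total = col_totals)  # adds a row: 4 x 3

m[1:2, 2]   # rows 1 and 2 of the second column
m[, 1]      # all elements of the first column
```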
Factors
Statistical data type used to store categorical variables
A categorical variable can belong to only a limited number of
categories; a continuous variable, by contrast, can take an infinite
number of values
Step 1: created_vector <- c("Male", "Female", "Female", "Male")
Step 2: factor_created_vector <- factor(created_vector)
factor(vector, ordered = TRUE, levels = c("Low", "Medium", "High"))
# without a levels argument, R assigns the factor levels in alphabetical
# order; levels = ... sets the order explicitly
summary()
factor2 <- factor_created_vector[2]
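A short sketch of both kinds of factor described above: an unordered (nominal) one and an ordered (ordinal) one. The speed_vector values are hypothetical and serve only to illustrate.

```r
# Nominal factor: levels default to alphabetical order
created_vector <- c("Male", "Female", "Female", "Male")
factor_created_vector <- factor(created_vector)
levels(factor_created_vector)    # "Female" "Male"
summary(factor_created_vector)   # counts per level

# Ordinal factor: the order of the levels is given explicitly
speed_vector <- c("Medium", "Low", "High", "Low")  # hypothetical values
speed_factor <- factor(speed_vector, ordered = TRUE,
                       levels = c("Low", "Medium", "High"))

# Ordered factors can be compared element by element
first_is_faster <- speed_factor[1] > speed_factor[2]  # Medium > Low
```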
Dataframes
Dataframes: 2d object; can hold numeric, character or logical values.
Within a column all elements have the same data type, but different
columns can be of different data type
head()
tail()
str()
data.frame(name, row.names = NULL, check.rows = FALSE, check.names =
TRUE, fix.empty.names = TRUE)
df[1,3] # selects first row and third column
df[4,] # selects entire fourth row
df[1:5, "column_name"] # select first 5 values of column_name
or: df$column_name
subset(my_df, subset = some_condition)
order() # returns the indices that sort the vector in ascending order
e.g. a <- c(100, 10, 1000)
order(a)      # [1] 2 1 3
a[order(a)]   # [1]   10  100 1000
Example: sort planets_df by diameter, low to high
position <- order(planets_df$diameter)
planets_df[position, ]
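The selection, subset() and order() steps above can be sketched on a toy data frame. The planets_df data set is not reproduced here, so toy_df and all of its values are made up for illustration.

```r
# Hypothetical toy data (not the real planets_df)
toy_df <- data.frame(
  name     = c("A", "B", "C", "D"),
  diameter = c(3, 1, 4, 2),
  rings    = c(FALSE, FALSE, TRUE, TRUE)
)

str(toy_df)        # structure: column names and types
head(toy_df, 2)    # first two rows

toy_df[1, 2]               # first row, second column
toy_df[1:2, "diameter"]    # first two values of diameter
toy_df$rings               # whole column as a vector

# Keep only the rows where the condition holds
with_rings <- subset(toy_df, subset = rings)

# Sort the rows by diameter, low to high
position <- order(toy_df$diameter)
sorted_df <- toy_df[position, ]
```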
Lists
List: gathers a variety of objects (matrices, vectors, dataframes,
etc.) under one name in an ordered way. The items in a list can differ
in length and type
my_list <- list(comp1, comp2, ...)
my_list <- list(vec = my_vector, mat=my_matrix, df = my_df)
head(my_list)
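A minimal sketch of building a named list and reading its components back. The three components are small placeholder objects, and the access syntax ($ and [[ ]]) goes slightly beyond what the notes list.

```r
# Placeholder components of different type and length
my_vector <- 1:5
my_matrix <- matrix(1:4, nrow = 2)
my_df <- data.frame(x = c(1, 2), y = c("a", "b"))

my_list <- list(vec = my_vector, mat = my_matrix, df = my_df)

names(my_list)      # "vec" "mat" "df"
length(my_list)     # number of components

my_list$vec         # component by name
my_list[["mat"]]    # component by name, double brackets
second_of_first <- my_list[[1]][2]  # second element of the first component
```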
Introduction to the Tidyverse
Data wrangling, visualisation and grouping
Loading packages: library(package_name), e.g. library(dplyr)
filter() : subset observations
e.g. dataset %>%
filter(year == 2007, country == "Germany")
arrange() : sorts a table based on a variable
e.g. dataset %>%
arrange(column_name) # in descending order: arrange(desc(column_name))
mutate() : changes or adds variables
e.g. dataset %>%
mutate(pop = pop / 1000000) %>% # change an existing variable
mutate(gdp = gdpPercap * pop) # add a new variable
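The three verbs above can be chained in one pipeline. The sketch below assumes the dplyr package is installed; toy_data is a made-up data frame that only borrows the column names used in these notes, so its values carry no meaning.

```r
library(dplyr)

# Hypothetical toy data with the column names used in the notes
toy_data <- data.frame(
  country   = c("A", "B", "A", "B"),
  year      = c(2002, 2002, 2007, 2007),
  pop       = c(1e6, 2e6, 3e6, 4e6),
  gdpPercap = c(1000, 2000, 1500, 2500)
)

result <- toy_data %>%
  filter(year == 2007) %>%           # keep only the 2007 rows
  mutate(pop = pop / 1000000) %>%    # change: population in millions
  mutate(gdp = gdpPercap * pop) %>%  # add: total GDP (in millions)
  arrange(desc(gdp))                 # sort by gdp, highest first

result
```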
summarize() : turns many rows into one
e.g. dataset %>%
filter(year == 2007) %>%
summarize(meanLifeExp = mean(lifeExp), totalPop = sum(pop))
other functions for summarizing: median, min, max
group_by() : used before summarize(), so that each group becomes one row
e.g. dataset %>%
group_by(year, continent) %>%
summarize(meanLifeExp = mean(lifeExp), totalPop = sum(pop))
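The difference between summarizing a whole table and summarizing by group can be seen on a small example. This sketch assumes dplyr is installed and uses made-up values under the column names from these notes.

```r
library(dplyr)

# Hypothetical toy data (values are made up)
toy_data <- data.frame(
  continent = c("X", "X", "Y", "Y"),
  year      = c(2007, 2007, 2007, 2007),
  lifeExp   = c(70, 80, 60, 62),
  pop       = c(1, 2, 3, 4)
)

# Without grouping: the whole table collapses into one row
overall <- toy_data %>%
  filter(year == 2007) %>%
  summarize(meanLifeExp = mean(lifeExp), totalPop = sum(pop))

# With grouping: one row per year-continent combination
by_group <- toy_data %>%
  group_by(year, continent) %>%
  summarize(meanLifeExp = mean(lifeExp),
            medianLifeExp = median(lifeExp),
            totalPop = sum(pop))
```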
Visualizing with ggplot2
library(ggplot2)
ggplot(dataset, aes(x = , y = , color = , size = )) +
geom_point() + scale_x_log10() + facet_wrap(~ sort) +
expand_limits(y = 0)
More types of plots:
geom_line()
geom_col() # bar plot
geom_histogram(binwidth = ) # you only have to specify x =
geom_boxplot()
ggtitle("")
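Put together, the layers above form a complete plot. The sketch below assumes ggplot2 is installed; toy_data and its values are made up, and the aesthetic mappings are one possible choice. Note that each + sits at the end of a line, since a line that starts with + is not read as a continuation of the previous one.

```r
library(ggplot2)

# Hypothetical toy data (values are made up)
toy_data <- data.frame(
  gdpPercap = c(500, 1000, 5000, 20000),
  lifeExp   = c(50, 60, 70, 80),
  continent = c("X", "X", "Y", "Y"),
  pop       = c(1, 2, 3, 4)
)

p <- ggplot(toy_data, aes(x = gdpPercap, y = lifeExp,
                          color = continent, size = pop)) +
  geom_point() +                # scatter plot
  scale_x_log10() +             # log scale on the x-axis
  facet_wrap(~ continent) +     # one panel per continent
  expand_limits(y = 0) +        # make the y-axis include 0
  ggtitle("Toy data: life expectancy vs GDP per capita")

# Typing p at the console (or calling print(p)) draws the plot
```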