Programming in R
0: Introduction
Why coding is good for us
Why code?
In order to be successful in coping with both societal and scientific challenges we
need to understand this technology and take advantage of its power. Computer
programming (or "coding") is a key building block of the computational thinking
skills we need. It provides us with a directly applicable set of skills to get things
done. But perhaps even more important: coding teaches us how to work
systematically and apply a stepwise approach to problem-solving.
Course goal and structure
Prior knowledge
sqrt() is written without a capital S: R is case-sensitive.
Other things to download and get started
• The README.md file contains a general explanation of the contents in this
project folder - it would be the starting point for anyone who starts to
explore its contents.
• The doc folder contains documents that you write (e.g. your course notes or
analysis reports generated by R-scripts). In this course we store
background reading material in this place as well (have a look at this
document: r-cheat-sheet-3.pdf).
• The data folder contains the raw data files that you will use during this course;
raw data should never be changed, so it is important to keep this folder "as
is" and not save any new data in it to avoid losing your original data.
• The script folder contains R-scripts that contain the code you will write over the
course. It is often useful to number the scripts so that you know in which
order they are supposed to be executed (for this course the numbers could
represent the chapter numbers). As an example, two scripts with suitable
names have been created in this folder.
• The functions sub-folder contains R-code that you would typically call from an
R-script in the parent-folder (script).
• The out folder contains any output that is created by R scripts, like figures (in
the figures sub-folder) and derived or intermediate data files (in the data
sub-folder). This is the folder you can use to store any data you will create
during this course.
1: Data types, objects and operators
Data types
Data types - basic
Basic types of data:
• character – text, written between quotes. Generated by putting double
quotes around the elements.
• double/numeric – numbers with a whole and a fractional part (like 3.14), also called
floating point numbers or real numbers. Generated by entering numbers.
• integer – whole numbers. Generated by adding a capital L to a number (without a
space).
• logical – Boolean values (true and false). Generated by using the reserved
words TRUE and FALSE.
• After executing the commands, the variables appear in the Environment tab.
• In the Environment tab the double type is called num.
• We only use double quotes (").
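The four basic types above can be checked with a small sketch like the one below. The object names (a_text, a_double, an_int, a_logical) are made up for this illustration and do not come from the course material.

```r
# One value of each basic type (object names are illustrative only)
a_text    <- "hello"   # character: written between double quotes
a_double  <- 3.14      # double: a plain number
an_int    <- 3L        # integer: a number followed by a capital L
a_logical <- TRUE      # logical: reserved word TRUE or FALSE

# typeof() reports the type of each object
typeof(a_text)      # "character"
typeof(a_double)    # "double"
typeof(an_int)      # "integer"
typeof(a_logical)   # "logical"

# A whole number without L is still stored as a double
typeof(3)           # "double"
```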
Data types - factors
Factor: used to store categorical data in such a way that it uses a minimum amount
of space, but keeps all relevant data.
This is achieved by using a combination of two vectors to represent an original vector
with categorical labels:
1. A vector with integers to represent the data
2. A vector with character labels linked to each integer that can occur in the data.
Example:
oddeven_int <- c(0, 1, 0, 1, 0, 1, 0, 1)
oddeven_char <- c("o", "e", "o", "e", "o", "e", "o", "e")
To turn data into a factor, we need to provide two additional pieces of information:
• The categories that should be distinguished in the original object, with the input
argument levels=…
• The labels that should be used in the newly created factor, with the input argument
labels=…
Example:
(oe_factor <- factor(oddeven_int, levels=c(0,1), labels=c("odd","even")))
(oe_factor2 <- factor(oddeven_char, levels=c("o","e"), labels=c("odd","even")))
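The two-vector structure of a factor can be inspected directly. The sketch below reuses oddeven_int from the example; levels() and as.integer() are standard R functions used here to show the label vector and the integer codes.

```r
oddeven_int <- c(0, 1, 0, 1, 0, 1, 0, 1)
oe_factor <- factor(oddeven_int, levels = c(0, 1), labels = c("odd", "even"))

# The character labels linked to the integer codes
levels(oe_factor)       # "odd" "even"

# The integer codes that represent the data (1 = first level, 2 = second level)
as.integer(oe_factor)   # 1 2 1 2 1 2 1 2

# Internally a factor is stored as integers
typeof(oe_factor)       # "integer"
is.factor(oe_factor)    # TRUE
```

Note that the codes start at 1, so the original value 0 is stored as code 1 (label "odd") and the original value 1 as code 2 (label "even").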
Check and convert data types
Information about the data type is shown in the Environment tab and returned by the command typeof().
Check whether an object is of a given type:
is.character(), is.double(), is.integer(), is.logical(), is.factor()
For example:
is.character(nr_text)
## [1] TRUE
is.character(nr_double)
## [1] FALSE
is.numeric() checks whether the object is of type double or integer.
Converting a data type into another type, for example between numbers and characters,
is done with the as.…() commands, such as as.character() and as.numeric().
Example:
as.character(nr_double)
## [1] "1" "2" "3" "4" "5" "6" "7" "8"
But converting is not always possible. For example, text has to consist of
numbers when converting it into numbers; otherwise the outcome will contain NAs.
Logical data can be translated into ones (for TRUE) and zeros (for FALSE) or vice
versa.
as.factor(): translates numerical, logical or character data into a factor. This is a short
version of the factor() command with levels and labels.
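A minimal sketch of these conversions follows. The object mixed_text is invented for the illustration, and suppressWarnings() is used only to silence the warning that R gives when a conversion produces NA.

```r
nr_double <- c(1, 2, 3)

# Numbers to text
as.character(nr_double)                  # "1" "2" "3"

# Text to numbers: "two" is not a number, so it becomes NA
mixed_text <- c("1", "two", "3")
converted <- suppressWarnings(as.numeric(mixed_text))
converted                                # 1 NA 3

# Logical to numbers and back
as.integer(c(TRUE, FALSE))               # 1 0
as.logical(c(1, 0))                      # TRUE FALSE

# as.factor() derives the levels from the data itself
letters_factor <- as.factor(c("a", "b", "a"))
levels(letters_factor)                   # "a" "b"
```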
Data types (exercise)
Combining two existing vectors A and B into a new vector: C <- c(A, B)
Vector with c(month.abb[1:3]… means that the levels=month.abb
Add NA to a vector: c(old vector, NA). Then for the vector NA instead of levels=…
Data objects
Data objects - vectors
The simplest storage structure in R is the vector, created from a single value or from a
combination of values with c().
Vectors can only contain elements of the same type.
If you combine different types into a single vector, use a name of the form comb_…_…
For example: comb_text_double <- c("one","two",4,5)
typeof(comb_text_double) gives "character" because R converts the incompatible
types to a common one.
Length of the vector: use length() or look in the Environment pane.
To get the output printed: put extra brackets () around the whole line of code, or wrap
the code in print(). Or repeat the object name (the name before <-) on a new line.
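The conversion to a common type and the ways of printing can be tried out with a sketch like this. comb_text_double comes from the example above; comb_logical_double is an additional, made-up object.

```r
# Text and numbers combined: everything becomes character
comb_text_double <- c("one", "two", 4, 5)
typeof(comb_text_double)    # "character"
comb_text_double[3]         # "4" (the number is now text)
length(comb_text_double)    # 4

# Logical and double combined: TRUE becomes 1
# The extra brackets around the assignment print the result immediately
(comb_logical_double <- c(TRUE, 2.5))   # 1.0 2.5

# The same output with print()
print(comb_logical_double)
```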
Data objects - arrays
Arrays: for situations where data has a logical ordering along more than one
dimension (for example longitude and latitude).
Arrays can only hold data of a single type. Data with one dimension can also be stored.
An array has an explicit orientation.
Example:
nr_text <- c("one", "two", "three", "four", "5","6","7","8")
(nr_text_2x4 <- array(nr_text, dim=c(2,4)))
##      [,1]  [,2]    [,3] [,4]
## [1,] "one" "three" "5"  "7"
## [2,] "two" "four"  "6"  "8"
dim=… expects a vector with integers > 0 as input. The first value gives the number of
rows, the second the number of columns, the third the number of layers, etc.
Elements are recycled when the array has more cells than there are values, for
example with dim=c(4,3) for the 8 values of nr_text.
More than two dimensions are also possible, e.g. dim=c(2,2,2).
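Recycling and a third dimension can be seen in the following sketch. The object names nr_text_4x3 and nr_text_2x2x2 are made up here, following the naming pattern of the example.

```r
nr_text <- c("one", "two", "three", "four", "5", "6", "7", "8")

# 4 x 3 = 12 cells for 8 values: the first 4 values are reused in column 3
nr_text_4x3 <- array(nr_text, dim = c(4, 3))
nr_text_4x3[, 3]          # "one" "two" "three" "four"

# Three dimensions: 2 rows, 2 columns, 2 layers
nr_text_2x2x2 <- array(nr_text, dim = c(2, 2, 2))
dim(nr_text_2x2x2)        # 2 2 2
nr_text_2x2x2[1, 2, 2]    # "7" (row 1, column 2, layer 2)
```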
The matrix() command can be used instead of array(), but it can only create arrays of
up to two dimensions. nrow=… and ncol=… set the number of rows and columns, and the
additional byrow=… indicates whether the values have to be organized row-wise or
column-wise. With byrow=FALSE the values are filled in from top to bottom, with
byrow=TRUE from left to right.
nr_text <- c("one", "two", "three", "four", "5","6","7","8")
> (nr_text_2x4 <- matrix(nr_text, nrow=2))
     [,1]  [,2]    [,3] [,4]
[1,] "one" "three" "5"  "7"
[2,] "two" "four"  "6"  "8"
> (nr_text_2x4 <- matrix(nr_text, ncol=2))
     [,1]    [,2]
[1,] "one"   "5"
[2,] "two"   "6"
[3,] "three" "7"
[4,] "four"  "8"
> (nr_text_2x4 <- matrix(nr_text, nrow=2, byrow=TRUE))
     [,1]  [,2]  [,3]    [,4]
[1,] "one" "two" "three" "four"
[2,] "5"   "6"   "7"     "8"
> (nr_text_2x4 <- matrix(nr_text, nrow=2, byrow=FALSE))
     [,1]  [,2]    [,3] [,4]
[1,] "one" "three" "5"  "7"
[2,] "two" "four"  "6"  "8"
> (nr_text_2x4 <- matrix(nr_text, ncol=2, byrow=TRUE))
     [,1]    [,2]
[1,] "one"   "two"
[2,] "three" "four"
[3,] "5"     "6"
[4,] "7"     "8"
> (nr_text_2x4 <- matrix(nr_text, ncol=2, byrow=FALSE))
     [,1]    [,2]
[1,] "one"   "5"
[2,] "two"   "6"
[3,] "three" "7"
[4,] "four"  "8"
The dimensions of an array are returned by dim(). This does not work for vectors (because they have no orientation).
For example:
dim(nr_text_2x4)
## [1] 2 4
length() works on any object.
dim() applied to a vector returns NULL, which means that the properties of an object
are missing or the object is absent. NA means a missing value within an existing data
object.
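The difference between dim(), length(), NULL and NA can be checked with a short sketch. The object with_gap is invented for this illustration.

```r
nr_double <- c(1, 2, 3, 4, 5, 6, 7, 8)

# A vector has no dimensions, so dim() returns NULL
dim(nr_double)             # NULL
is.null(dim(nr_double))    # TRUE

# An array has dimensions, and length() counts all its elements
nr_double_2x4 <- array(nr_double, dim = c(2, 4))
dim(nr_double_2x4)         # 2 4
length(nr_double_2x4)      # 8

# NA is a missing value inside an existing object: it still counts as an element
with_gap <- c(1, NA, 3)
length(with_gap)           # 3
is.na(with_gap)            # FALSE TRUE FALSE
```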
Data objects - lists
Lists: like a vector, but more flexible because not every element has to be of the same type.
With the command list() different vectors/arrays can be combined.
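As a minimal sketch of this flexibility, the list below holds three elements of different types. The name mixed_list is made up for the illustration, and sapply() is a standard R function used here to apply typeof() to each element.

```r
# Three elements of different types and lengths in one list
mixed_list <- list(c("one", "two"), c(1L, 2L, 3L), TRUE)

length(mixed_list)            # 3 (the number of list elements)
sapply(mixed_list, typeof)    # "character" "integer" "logical"

# The list itself has the type "list"
typeof(mixed_list)            # "list"
```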
Example:
nr_text <- c("one", "two", "three", "four", "5","6","7","8")
nr_int <- c(1L, 2L, 3L, 4L, 5L, 6L, 7L, 8L)
nr_double <- c(1, 2, 3, 4, 5, 6, 7, 8)
nr_text_2x4 <- array(nr_text, dim=c(2,4))
nr_int_2x4 <- array(nr_int, dim=c(2,4))
nr_double_2x4 <- array(nr_double, dim=c(2,4))
vec_list <- list(nr_text, nr_int)
array_list <- list(nr_text_2x4, nr_int_2x4, nr_double_2x4)
(nr_list2 <- list(vec_list, array_list))
## [[1]]