LU1 – Basics I (p. 3)
Vectors (same data type)
c(1,2,3,4,5) | c(1:10): create a vector
vector[2] | vector["July"] → find element in vector (by position or by name)
obtain row 7, col 2 of list element 3: dataset[[3]][7, 2]
Matrix (same data type)
matrix(c(1,3,5,7,8,11), nrow = 2, ncol = 3, byrow = TRUE)
byrow = TRUE/FALSE → filled in by row or by column
matrix[row, column] → find element in matrix
colnames(matrix) & rownames(matrix) <- c("June", "July"): name the columns or rows
Functions
length(): length of object
head(): first 6 elements (or rows)
tail(): last 6 elements (or rows)
c(): make a vector
sum(): calculate sum
mean(): calculate average
sd(): calculate standard deviation (SD)
max(): view maximum value; range(): gives min and max
min(): view minimum value
str(): show structure
sqrt(): square root
round(x, digits = ): round a number
objects() & ls(): list objects in environment
rm(): remove object from environment
rm(list = ls()): remove all objects from environment
letters & LETTERS: built-in vectors of lowercase and uppercase letters, e.g. LETTERS[1:3]

LU2 – Basics II (p. 21)
Datatype
Logical: TRUE/FALSE
Numeric: any number
Complex: complex numbers, e.g. 3+2i
Character: text
class(): identify data type or structure
Dataframe (all data types)
data.frame(c(1,3,5,7,8,11), stringsAsFactors = FALSE)
stringsAsFactors = TRUE/FALSE → FALSE keeps text as character instead of converting it to factor
df$color: $ selects a column
as.data.frame(): matrix to dataframe
Factor (categorical data)
factor(c("right", "left"))
levels(x): check the factor levels of factor x
Array (2D+ and same data type)
array(c(height1, height2, weight1, weight2), dim = c(5, 2, 2))
List (any objects, all data types)
list(vector = shortv, matrix = shortm, factor = shortf)
as.list(): vector to list
unlist(): list to vector

LU3 – Data management I (p. 39)
Working directory
getwd(): location of wd
setwd("C:/Users/celin/Documents"): set wd
list.files(): overview of files in wd
Create and load data
write.csv(name, file = "Derived_data/test_data.csv", row.names = FALSE)
row.names = TRUE/FALSE → include row names or not
name <- read.csv(file = "Derived_data/test_data.csv", header = TRUE) # header = TRUE if 1st row contains column names
.txt → write.table()/read.table(), .rds → saveRDS()/readRDS()
file.exists(): does file exist?
data(): load existing built-in data
Inspecting data
View(): data in viewer
rownames() & colnames(): names of rows or cols
nrow() & ncol(): number of rows or cols
dim(): dimensions (number of rows and cols)

LU4 – Data management II (p. 61)
duplicated(): which elements are duplicates?
unique(): remove duplicates
any(is.na()): any NA?
complete.cases(): remove rows with NA: df[complete.cases(df), ]
…(…, na.rm = TRUE): ignore NA
dataset[is.na(dataset)] <- 0: replace all NA values with 0
Modifying or creating variables
rbind() & cbind(): combine two datasets by rows or by columns
merge(df1, df2, all.x = TRUE): combine two datasets on common column(s); all.x = TRUE keeps all rows of df1
Logical operations
< less than
<= less than or equal to
> greater than
>= greater than or equal to
== equal
!= not equal
!x not x
x | y or
x & y and
Subset & omit
subset(): create subdata
subset(df, month == "July")
subset(esoph, ncases == 17, select = c(-ncontrols))
subset(esoph, !ncases >= 1) # Only observations with ncases less than 1
which(): positions where a condition is TRUE; use to create subdata
# Change the value in the column flying to "Yes" for Chiroptera species (replacing a value in a series)
IUCN_mammals$flying[which(IUCN_mammals$order_name == "CHIROPTERA")] <- "Yes"
esoph[-c(20:88), ] # Omit rows 20 - 88
Generative sequences
seq(from = , to = , by = , length.out = ): generate sequence (use a negative by to decrease)
length.out = number of values to generate
rep(x, each = ): replicate each element
rep(c(1:3), times = 3) # Replicate the vector sequence 3 times
Sorting and ordering
sort(esoph$ncases, decreasing = TRUE/FALSE): sort a vector
order(column, decreasing = TRUE/FALSE): order a whole dataframe by a column
esoph[order(esoph$ncases, decreasing = TRUE), ]
Conditional statements
traffic_light <- "yellow"
time_to_stop <- "no"
if (traffic_light == "red") {
  print("You need to stop")
} else if (traffic_light == "yellow") {
  print("Watch out the light is yellow...")
  if (time_to_stop == "yes") {
    print("and you need to stop")
  } else {
    print("but you can go")
  }
} else if (traffic_light == "green") {
  print("You can go")
}
Next = jump to next iteration of loop
for (i in c(1:10)) {
  if (i == 6) {
    print("Skip variable 6")
    next
  } else {
    print("Equal to 2")
  }
}
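A minimal sketch of the LU1 indexing rules for vectors, matrices and lists. The object names (`temps`, `m_row`, `m_col`, `df`, `lst`) and all numbers are made up for illustration only:

```r
# Named vector (made-up values): all elements share one data type
temps <- c(June = 14.1, July = 18.3, August = 17.9)
temps[2]        # element by position
temps["July"]   # the same element by name

# The same six numbers, filled by row vs. by column
m_row <- matrix(c(1, 3, 5, 7, 8, 11), nrow = 2, ncol = 3, byrow = TRUE)
m_col <- matrix(c(1, 3, 5, 7, 8, 11), nrow = 2, ncol = 3, byrow = FALSE)
m_row[1, 2]     # 3: row 1 is 1, 3, 5
m_col[1, 2]     # 5: column 2 is 5, 7

# Name the rows, then index by name
rownames(m_row) <- c("June", "July")
m_row["July", 3]  # 11

# [[3]] picks the third list element, [7, 2] then picks row 7, column 2
df <- data.frame(x = 1:8, y = seq(10, 80, by = 10))
lst <- list("a", 1:3, df)
lst[[3]][7, 2]    # 70
```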
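The basic functions from the LU1 list, applied to one small made-up vector `x` (hypothetical values). The object `tmp` is only there to show `rm()` without clearing the whole environment, which `rm(list = ls())` would do:

```r
x <- c(4, 8, 15, 16, 23, 42)
length(x)                 # 6
sum(x)                    # 108
mean(x)                   # 18
round(sd(x), digits = 2)  # 13.49
range(x)                  # 4 42, same as c(min(x), max(x))
sqrt(16)                  # 4
head(x, 3)                # 4 8 15: first 3 instead of the default 6
tail(x, 2)                # 23 42
LETTERS[1:3]              # "A" "B" "C"

tmp <- 1
rm(tmp)                   # remove a single object
exists("tmp")             # FALSE
```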
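A sketch of the LU2 data types, data frames and factors. The column name `color` and its values are illustrative assumptions; the point is what `class()`, `stringsAsFactors`, `levels()` and `as.data.frame()` return:

```r
# class() for each basic data type
class(TRUE)    # "logical"
class(2.5)     # "numeric"
class(3+2i)    # "complex"
class("a")     # "character"

# stringsAsFactors decides whether text columns become factors
df_chr <- data.frame(color = c("red", "blue", "red"), stringsAsFactors = FALSE)
df_fac <- data.frame(color = c("red", "blue", "red"), stringsAsFactors = TRUE)
class(df_chr$color)    # "character"
class(df_fac$color)    # "factor"
levels(df_fac$color)   # "blue" "red" (sorted alphabetically)

# factor() on its own
sides <- factor(c("right", "left"))
levels(sides)          # "left" "right"

# A matrix without column names gets V1, V2, ... as column names
m <- matrix(c(1, 3, 5, 7, 8, 11), nrow = 2, byrow = TRUE)
as.data.frame(m)
```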
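A sketch of how `array()` fills its dimensions and how lists relate to vectors. The measurements in `height1`, `height2`, `weight1` and `weight2` are hypothetical (5 individuals measured twice), chosen only so the array call from LU2 has something to run on:

```r
# Hypothetical measurements of 5 individuals at two time points
height1 <- c(170, 165, 180, 175, 160)
height2 <- c(171, 166, 181, 176, 161)
weight1 <- c(65, 60, 80, 72, 55)
weight2 <- c(66, 61, 79, 73, 56)

# Values fill dimension 1 first, then 2, then 3:
# slice 1 holds height1 and height2, slice 2 holds weight1 and weight2
arr <- array(c(height1, height2, weight1, weight2), dim = c(5, 2, 2))
arr[1, 2, 1]   # height2[1] = 171
arr[3, 1, 2]   # weight1[3] = 80

# A list can hold objects of different types and shapes
lst <- list(vector = 1:3,
            matrix = matrix(1:4, nrow = 2),
            factor = factor(c("right", "left")))
names(lst)                    # "vector" "matrix" "factor"
unlist(list(a = 1, b = 2))    # named numeric vector
as.list(1:3)                  # list with 3 elements
```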
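A round-trip sketch for the LU3 "Create and load data" functions. It assumes a small made-up `test_data` frame and writes to `tempdir()` rather than `Derived_data/`, so it runs without that folder existing:

```r
test_data <- data.frame(id = 1:3, month = c("June", "July", "August"))

# .csv: row.names = FALSE avoids an extra column of row numbers
csv_path <- file.path(tempdir(), "test_data.csv")
write.csv(test_data, file = csv_path, row.names = FALSE)
file.exists(csv_path)   # TRUE

# header = TRUE: the first row holds the column names
back <- read.csv(csv_path, header = TRUE, stringsAsFactors = FALSE)
dim(back)               # 3 2
str(back)

# .txt with write.table()/read.table(); here tab-separated
txt_path <- file.path(tempdir(), "test_data.txt")
write.table(test_data, file = txt_path, sep = "\t", row.names = FALSE)
back_txt <- read.table(txt_path, header = TRUE, sep = "\t",
                       stringsAsFactors = FALSE)

# .rds with saveRDS()/readRDS(): stores the exact R object
rds_path <- file.path(tempdir(), "test_data.rds")
saveRDS(test_data, rds_path)
back_rds <- readRDS(rds_path)
identical(back_rds, test_data)  # TRUE
```

The `.rds` route keeps column types and factor levels exactly, while `.csv` and `.txt` re-guess the types on reading.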
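The LU3 inspection functions applied to `esoph`, the built-in dataset that the LU4 examples also use. `View()` is left out because it opens the interactive viewer:

```r
data(esoph)       # load the built-in esoph data (datasets package)
dim(esoph)        # number of rows and columns
nrow(esoph)
ncol(esoph)
colnames(esoph)   # agegp, alcgp, tobgp, ncases, ncontrols
head(esoph)       # first 6 rows
str(esoph)        # structure: column types and factor levels
```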
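A sketch of the LU4 duplicate and NA tools on a made-up vector `x` and data frame `df` (hypothetical values). The NA replacement is done on a copy, `x0`, so the earlier lines still show the original vector:

```r
x <- c(2, 4, 4, NA, 7, 2)
duplicated(x)            # FALSE FALSE TRUE FALSE FALSE TRUE
unique(x)                # 2 4 NA 7
any(is.na(x))            # TRUE
mean(x)                  # NA: one missing value spoils the result
mean(x, na.rm = TRUE)    # 3.8, NA ignored

df <- data.frame(a = c(1, NA, 3), b = c("x", "y", NA))
df[complete.cases(df), ]  # keeps only row 1, the only row without NA

x0 <- x
x0[is.na(x0)] <- 0       # replace every NA with 0
x0                       # 2 4 4 0 7 2
```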
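A sketch contrasting `rbind()`, `cbind()` and `merge()` from "Modifying or creating variables". The frames `df1` and `df2` and their columns are hypothetical. `by = "id"` is written out here for clarity; without it, `merge()` joins on all column names the two frames share:

```r
df1 <- data.frame(id = 1:3, month = c("June", "July", "August"))
df2 <- data.frame(id = c(1, 3), rain = c(50, 20))

# rbind(): stack rows; column names must match
longer <- rbind(df1, data.frame(id = 4, month = "September"))
nrow(longer)    # 4

# cbind(): add columns side by side; number of rows must match
wider <- cbind(df1, temp = c(14, 18, 17))
ncol(wider)     # 3

# merge(): match rows on id; all.x = TRUE keeps every row of df1
# and fills the missing rain value for id 2 with NA
merged <- merge(df1, df2, by = "id", all.x = TRUE)
merged$rain     # 50 NA 20
```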
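A sketch of the "Subset & omit" patterns on a small made-up frame, so the results are easy to check by eye. The `dry` column is a hypothetical stand-in for the `flying` column in the IUCN example. Note that `!` binds more loosely than `>=` in R, so `!rain >= 1` means `!(rain >= 1)`:

```r
df <- data.frame(
  month = c("June", "July", "July", "August"),
  rain  = c(50, 0, 12, 30)
)

subset(df, month == "July")                # rows 2 and 3
subset(df, rain > 10 & month != "August")  # June and the wet July row
subset(df, !rain >= 1)                     # only the row with rain below 1

# which() returns the positions where the condition is TRUE
which(df$rain == 0)   # 2

# Use those positions to change values in one column
df$dry <- "No"
df$dry[which(df$rain == 0)] <- "Yes"

# select = c(-column) drops a column
subset(df, month == "July", select = c(-rain))

# Negative indices omit rows
df[-c(2:3), ]         # all rows except 2 and 3
```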
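The "Generative sequences" arguments side by side; the specific from/to/by values are arbitrary choices for illustration:

```r
seq(from = 1, to = 10, by = 3)          # 1 4 7 10
seq(from = 10, to = 1, by = -3)         # 10 7 4 1: negative by to decrease
seq(from = 0, to = 1, length.out = 5)   # 0.00 0.25 0.50 0.75 1.00
rep(c(1:3), times = 3)                  # 1 2 3 1 2 3 1 2 3
rep(c(1:3), each = 2)                   # 1 1 2 2 3 3
```

`times` repeats the whole vector, while `each` repeats every element before moving on.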
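A sketch of the difference between `sort()` and `order()` from "Sorting and ordering", using a made-up `counts` vector and a hypothetical `site` column instead of `esoph`:

```r
counts <- c(5, 1, 9, 3)
sort(counts)                      # 1 3 5 9: the sorted values
sort(counts, decreasing = TRUE)   # 9 5 3 1
order(counts)                     # 2 4 1 3: positions, not values

# order() is what reorders a whole data frame
df <- data.frame(site = c("A", "B", "C", "D"), ncases = counts,
                 stringsAsFactors = FALSE)
df[order(df$ncases, decreasing = TRUE), ]  # rows in order C, A, D, B
```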
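A sketch for trying every branch of the traffic-light example and for seeing what `next` does. `traffic_advice()` is a hypothetical helper that returns the messages as one string instead of printing them, and `kept` simply collects the loop values that were not skipped:

```r
# Hypothetical helper: same branches as the traffic-light example
traffic_advice <- function(traffic_light, time_to_stop = "no") {
  if (traffic_light == "red") {
    "You need to stop"
  } else if (traffic_light == "yellow") {
    if (time_to_stop == "yes") {
      "Watch out the light is yellow... and you need to stop"
    } else {
      "Watch out the light is yellow... but you can go"
    }
  } else if (traffic_light == "green") {
    "You can go"
  }
}
traffic_advice("yellow")
traffic_advice("yellow", time_to_stop = "yes")

# next skips the rest of the current iteration
kept <- integer(0)
for (i in c(1:10)) {
  if (i == 6) {
    next                # 6 is never added
  }
  kept <- c(kept, i)
}
kept   # 1 2 3 4 5 7 8 9 10
```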