LU1: vectors, matrix()
- my_first_matrix <- matrix(c(1, 3, 5, 7, 9, 11), nrow = 2, ncol = 3, byrow = TRUE)
- round(sqrt(42), digits = 1)
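A minimal sketch of how byrow changes the fill order of matrix(). The object names values, by_row and by_col are made up for this illustration:

```r
# Same six values, filled row-wise and column-wise
values <- c(1, 3, 5, 7, 9, 11)
by_row <- matrix(values, nrow = 2, ncol = 3, byrow = TRUE)
by_col <- matrix(values, nrow = 2, ncol = 3)  # byrow = FALSE is the default

by_row[1, ]  # first row when filled row by row: 1 3 5
by_col[1, ]  # first row when filled column by column: 1 5 9
dim(by_row)  # 2 3
```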
LU2: class(), is.logical(), as.numeric(), data.frame(), array(), list()
Checking data type: class(my_object) Converting data type: my_numeric_object <- as.numeric(my_object)
- datfr <- data.frame(x = c(1:3), y = c("A", "B", "C"))
- arr <- array(c(height1, height2, weight1, weight2), dim = c(5, 2, 2))
- somelist <- list(vector = shortv, matrix = shortm, factor = shortf)
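The type check and conversion above can be tried out as follows. This is only a sketch: the example values and the list element name flag are assumptions, not part of the course data.

```r
# Hypothetical character vector that holds numbers as text
my_object <- c("1", "2", "3")
class(my_object)                            # "character"

# Convert to numeric and check again
my_numeric_object <- as.numeric(my_object)
class(my_numeric_object)                    # "numeric"
is.logical(my_numeric_object)               # FALSE

# A list can hold objects of different types; elements are accessed by name
somelist <- list(vector = my_numeric_object, flag = TRUE)
class(somelist$vector)                      # "numeric"
is.logical(somelist$flag)                   # TRUE
```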
LU3: rm(), save(), load(), getwd(), setwd(), rownames(), duplicated(), unique(), any(is.na()), complete.cases(), rbind(),
merge()
Removing objects from workspace or whole workspace: rm(second_object) rm(list = ls())
Check working directory: getwd() Set working directory: setwd("U:/my_WD")
- Save objects: save(first_object, second_object, third_object, file = "multiple_objects.RData")
- Write csv after creating df: write.csv(test_data, file = "Derived_data/test_data.csv", row.names = FALSE)
- Read csv in R: COVID_data <- read.csv("Raw_data/COVID-19_casus.csv", header = TRUE, sep = ",",
stringsAsFactors = FALSE)
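A small round trip through write.csv() and read.csv(), as a sketch. It writes to a temporary file created with tempfile() instead of the course folders, and the contents of test_data are invented for the example:

```r
# Hypothetical data frame to write out
test_data <- data.frame(x = 1:3, y = c("A", "B", "C"))

# Temporary file path, so the sketch does not depend on an existing folder
csv_path <- tempfile(fileext = ".csv")

# row.names = FALSE keeps the row numbers out of the file
write.csv(test_data, file = csv_path, row.names = FALSE)

# Read the file back in; text columns stay character
read_back <- read.csv(csv_path, header = TRUE, sep = ",", stringsAsFactors = FALSE)
str(read_back)
```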
Change rownames: rownames(iris) <- paste("flower", rownames(iris), sep = "_")
Rename all columns (the vector needs one name per column; iris has five): colnames(iris) <- c("Sepal Length [cm]", "Petal Length [cm]")
- Check for duplicates in data set: duplicated(iris) Remove duplicates from data: unique_iris <- unique(iris)
- Check for missing values (NA): any(is.na(iris)) Remove NA: complete_iris <- iris[complete.cases(iris), ]
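The duplicate and NA checks can be followed step by step on a tiny data frame. The data frame dat and its values are hypothetical and serve only as a sketch:

```r
# Hypothetical data with one duplicated row and one missing value
dat <- data.frame(id = c(1, 2, 2, 3), value = c(10, 20, 20, NA))

duplicated(dat)                  # FALSE FALSE TRUE FALSE
unique_dat <- unique(dat)        # drops the repeated row
any(is.na(unique_dat))           # TRUE, one NA is left

# Keep only rows without any NA
complete_dat <- unique_dat[complete.cases(unique_dat), ]
nrow(complete_dat)               # 2
```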
LU4: subset(), which(), seq(), rep(), sort(), order(), if else statements
Select columns/rows: select row 1 and 3 from column 5 esoph[c("1", "3"), "ncontrols"]
Subset: subset(data set, condition(s), select (optional)):
Subset esoph to the highest tobgp AND 0 cases: esoph[esoph$tobgp == "30+" & esoph$ncases == 0, ]
subset(esoph, ncases == 17, select = c(agegp, ncases))
Get rid of NA values while subsetting: newDat[which(newDat$y > 6), ]
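Why which() helps here: a comparison with NA returns NA, and a logical index containing NA produces a row of NAs, whereas which() returns only the positions that are TRUE. A sketch with an invented newDat:

```r
# Hypothetical data frame with one missing y value
newDat <- data.frame(x = 1:4, y = c(5, 8, NA, 9))

# Logical indexing: the NA comparison yields an extra row filled with NA
with_na <- newDat[newDat$y > 6, ]
nrow(with_na)                    # 3

# which() keeps only positions where the condition is TRUE
no_na <- newDat[which(newDat$y > 6), ]
nrow(no_na)                      # 2
```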
Omitting data: omit row 20 – 88 esoph[-c(20:88), ]
- subset(esoph, ncases == 17, select = -ncontrols)
- Retaining with a negated condition: rows with ncases less than 1 are retained subset(esoph, !(ncases >= 1))
Sequences: subset esoph so only first four rows and 1, 3, 5 column esoph[seq(from = 1, to = 4, by = 1), seq(from = 1, to =
5, by = 2)]
Sorting/ordering data sets from low to high:
- Increasing: sort(esoph$tobgp) Decreasing sort(esoph$ncases, decreasing = TRUE)
- Ranking vectors/data frames esoph[order(esoph$ncases, decreasing = TRUE), ]
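The difference between sort() and order() in a short sketch: sort() returns the sorted values, order() returns the positions that would sort them, which is why order() is used to rearrange whole data frames. The vector scores and the data frame df are made up for the example:

```r
scores <- c(30, 10, 20)          # hypothetical values

sort(scores)                     # 10 20 30 (the values themselves)
order(scores)                    # 2 3 1   (positions of the values)

# Reorder a data frame by one of its columns, highest first
df <- data.frame(name = c("a", "b", "c"), score = scores)
ranked <- df[order(df$score, decreasing = TRUE), ]
ranked$name                      # "a" "c" "b"
```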
if (esoph$ncontrols[6] > esoph$ncontrols[38]) {
  print("Observation 6 has more controls than 38")
} else if (esoph$ncontrols[6] < esoph$ncontrols[38]) {
  print("Observation 6 has fewer controls than 38")
} else {
  print("Observation 6 has the same number of controls as 38")
}
LU5: summary(), min(), max(), mean(), median(), quantile(), colMeans(), rowMeans(), colSums(), rowSums(), table(),
aggregate(), hist(), plot(density()), qqnorm(), qqline(), boxplot(), length()
Before computing summary statistics (summary()), use unique() and any(is.na()) to check whether NA or otherwise missing values remain
- quantile(InsectSprays$count, probs = c(0.25, 0.75))
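A short sketch that combines a few of the LU5 functions on the built-in InsectSprays data. The object names q, means and tab are chosen for this example only:

```r
# First and third quartile of the insect counts
q <- quantile(InsectSprays$count, probs = c(0.25, 0.75))
names(q)                         # "25%" "75%"

# Mean count per spray type (formula interface of aggregate())
means <- aggregate(count ~ spray, data = InsectSprays, FUN = mean)
nrow(means)                      # 6, one row per spray

# Number of observations per spray type
tab <- table(InsectSprays$spray)
tab
```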