Overview – Statistical Software Basics in R
Table of Contents
Chapter 2 – Data Structures (Vectors, Factors, Matrices, Data Frames, and Lists)
2.1 Vectors
2.2 Factors – factor()
2.3 Matrices – matrix(), rbind(), cbind()
2.4 Data Frame
2.5 List – list()
Chapter 3 – Importing and exporting data
3.1 Importing an Excel file
3.2 Exporting a data frame to an Xlsx file
3.3 Importing a Txt file
3.4 Exporting a data frame to a Txt file
Chapter 4 – Writing your own functions – function()
Chapter 5 – Graphics with R
5.1 Scatterplots – plot(), legend(), lines(), abline()
5.2 Histogram – hist(), density()
5.3 Boxplot – boxplot(), density(), outliers, fivenum()
Chapter 6 – Some concepts of the dplyr package
6.1 Basic Functions: filter(), select(), arrange(), mutate(), transmute(), summarise()
6.2 The pipe to combine multiple operations
6.3 Integration of multiple sources: inner_join(), outer_join(), full_join(), merge(), complete.cases()
Chapter 7 – More programming in R
7.1 The apply functions – apply(), lapply(), tapply(), aggregate(), split()
7.2 Loops in R – for(), while()
7.3 Dates – as.numeric(), make_date(), year(), month(), mday(), wday()
7.4 Spreading and gathering tables – pivot_longer(), pivot_wider()
Chapter 8 – Statistical inference for continuous data
8.1 One sample – get_summary_stats()
8.2 One sample t-test – t.test()
8.3 Non-parametric alternative – wilcox.test()
8.4 Two samples
8.4.1 Testing normality in both samples – shapiro.test()
8.4.2 Testing equality of variances in both samples – var.test()
8.4.3 Testing equality of means in both samples – t.test(), wilcox.test()
8.5 Correlation analysis – cor_test(), cor_mat()
Chapter 9 – Statistical inference for discrete data
9.1 Testing independence – chisq.test()
9.2 Summary data – chisq.test(), fisher.test()
9.3 Other functions for count data – prop.test(), binom.test()
Chapter 10 – An example of a regression analysis
10.1 Regression analysis with usual R – lm(), summary(), curve(), identify()
Chapter 11 – Grammar of ggplot2
11.1 First layer and second layer: data and mapping layer – ggplot(), theme_bw()
11.2 Third layer: geometric – geom_XXX, scale_color_manual()
11.3 Fourth layer: statistic – stat_YYY
11.4 Fifth layer: facet – facet_grid(), facet_wrap(), cut_interval()
To create a grid, showing the labels at the margins of the plot:
11.5 Multiple plots on the same page – pushViewport(), grid.newpage(), grid.layout()
11.6 Adding statistical summaries – stat_summary()
11.7 Animated Graphs – transition_states(), enter_fade(), exit_shrink()
Chapter 2 – Data Structures (Vectors, Factors, Matrices, Data Frames, and Lists)
2.1 Vectors
To create a vector of x random elements drawn from a standard normal distribution (mean 0, standard deviation 1):
Vector <- rnorm(x)
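A minimal sketch of the call above. The value of x, the seed, and the name Shifted are assumptions chosen only for illustration; set.seed() makes the random draws reproducible.

```r
set.seed(42)            # assumption: fixed seed so the draws can be reproduced
x <- 5                  # hypothetical number of elements

Vector <- rnorm(x)      # defaults: mean = 0, sd = 1
length(Vector)          # number of elements drawn

# The same function with a different mean and standard deviation
Shifted <- rnorm(x, mean = 10, sd = 2)
```

Calling rnorm() again without resetting the seed gives different values.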
To draw a sample of x random integers between two values, e.g. -5 and 5:
Sample <- sample(-5:5, size=x, replace=FALSE)
With replace=FALSE, no integer can occur more than once, so x cannot exceed the 11 available values. Use replace=TRUE to allow repeats.
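The effect of the replace argument can be sketched as follows. The sizes, the seed, and the name Big are assumptions for illustration only.

```r
set.seed(1)             # assumption: fixed seed for reproducibility
x <- 5                  # hypothetical sample size

# Without replacement: every drawn integer is distinct
Sample <- sample(-5:5, size = x, replace = FALSE)
anyDuplicated(Sample)   # 0 means there are no repeated values

# -5:5 holds only 11 integers, so a larger sample needs replace = TRUE
Big <- sample(-5:5, size = 20, replace = TRUE)

# sample(-5:5, size = 20, replace = FALSE) would stop with an error,
# because the sample would be larger than the population.
```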
To create x random real numbers uniformly distributed between two values, e.g. -5 and 5, either unrounded or rounded to 2 decimals:
Unrounded (printed with about 7 significant digits by default): runif(n = x, min = -5, max = 5)
Rounded to 2 decimals: round(runif(n = x, min = -5, max = 5), digits = 2)
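A short sketch of the difference between rounding a value and merely displaying fewer digits. runif() itself does not round; how many digits appear on screen is a print setting. The names U and R2, the seed, and the value of x are assumptions for illustration.

```r
set.seed(7)                            # assumption: fixed seed for reproducibility
x <- 4                                 # hypothetical number of elements

U  <- runif(n = x, min = -5, max = 5)  # unrounded doubles
R2 <- round(U, digits = 2)             # values actually rounded to 2 decimals

print(U)                # displayed using the default options(digits = 7)
print(U, digits = 15)   # same stored values, more digits shown
print(R2)
```

Changing the digits argument of print() only affects the display, whereas round() changes the stored values.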
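Operations that can be done on a vector (R is case-sensitive, so the function names are lowercase):
length(x) -> number of elements
sum(x) -> sum of all elements; sum(x, y) -> sum over the elements of x and y together
prod(x) -> product of all elements; prod(x, y) -> product over the elements of x and y together
max(x) -> maximum value; max(x, y) -> maximum over x and y together
min(x) -> minimum value; min(x, y) -> minimum over x and y together
diff(x, lag=2) -> differences between elements two positions apart (see Example 1)
unique(x) -> all unique values
rev(x) -> elements in reversed order
seq(from= , to= , by= ) -> generate a sequence
rep(x, times=n) -> repeat x n times
Example 1: Vector <- c(2,3,4,6,20); diff(Vector, lag=2) returns 2 3 16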
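The operations listed above can be tried on the example vector. The second vector Other is a hypothetical addition used only to show the two-argument forms.

```r
Vector <- c(2, 3, 4, 6, 20)
Other  <- c(1, 30)                 # hypothetical second vector

length(Vector)                     # 5
sum(Vector)                        # 35
sum(Vector, Other)                 # 66, all elements of both vectors
prod(Vector)                       # 2880
max(Vector, Other)                 # 30
min(Vector, Other)                 # 1
diff(Vector, lag = 2)              # 2 3 16
unique(c(Vector, 2, 3))            # 2 3 4 6 20
rev(Vector)                        # 20 6 4 3 2
seq(from = 0, to = 10, by = 5)     # 0 5 10
rep(c(2, 3), times = 2)            # 2 3 2 3
```

Note that R is case-sensitive: Length(Vector) or Sum(Vector) would fail with a "could not find function" error.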