~ ADVANCED DATA ANALYSIS ~
PRACTICAL 1
OVERVIEW ALL FUNCTIONS – PRACTICAL 1
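seq(from = X, to = X, by = X) Generates a sequence of numbers
Session > Set Working Directory > Choose Directory (RStudio menu) Sets the working directory
Shift + slash: /
AltGr + tilde: ~
AltGr + square bracket: [ ]
plot(x = X, y = Y) Generates a plot
getwd() Returns the current working directory
list.files(getwd()) Lists the files in the working directory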
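A minimal sketch of the functions above in use. Note that R is case-sensitive, so the function names are written in lower case. The y values and the temporary PDF file are assumptions made only for illustration.

```r
# Generate the numbers 0, 2, 4, ..., 10
x <- seq(from = 0, to = 10, by = 2)
y <- x^2   # hypothetical y values, for illustration only

# Draw into a throwaway PDF file so the example also runs without a screen;
# in an interactive session plot(x = x, y = y) alone is enough
pdf(file = tempfile(fileext = ".pdf"))
plot(x = x, y = y)
dev.off()

# Show the working directory and the files it contains
wd <- getwd()
files <- list.files(wd)
```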
read.table("X") DEFAULT
file = "XX"
header = FALSE If a header is present: header = TRUE
sep = "" Columns separated by white space
If not, e.g. for tab-separated files: sep = "\t"
dec = "." The character used in the file for decimal points
na.strings = "NA" A character vector of strings that are to be interpreted as NA values
stringsAsFactors Should character vectors be converted to factors?
Needs to be = TRUE (the default is FALSE since R 4.0.0)
read.table(file = "XXX", header = TRUE, sep = "\t", dec = ",", na.strings = "?")
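A small runnable sketch of such a call. The file contents and the column names (ID, gender, exam) are hypothetical; the file is written to a temporary location first so that the example is self-contained. stringsAsFactors = TRUE is passed explicitly because it is no longer the default in recent R versions.

```r
# Write a small hypothetical file: tab-separated, decimal comma, "?" for missing
path <- tempfile(fileext = ".txt")
writeLines(c("ID\tgender\texam",
             "a1\tmale\t12,5",
             "a2\tfemale\t?",
             "a3\tfemale\t9,0"),
           con = path)

# Read it back with the non-default settings described above
myData <- read.table(file = path, header = TRUE, sep = "\t", dec = ",",
                     na.strings = "?", stringsAsFactors = TRUE)
str(myData)
```

The "?" entry ends up as NA and the exam column is read as numeric despite the decimal commas.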
as.numeric()
as.character()
as.logical()
as.factor()
c() Creates a vector
Concatenation function that combines individual elements into a vector.
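The coercion functions and c() can be tried out on small vectors. The values below are made up for illustration.

```r
v <- c("1", "2.5", "10")            # c() combines the elements into a character vector
num <- as.numeric(v)                # 1.0 2.5 10.0
chr <- as.character(c(1, 2, 3))     # "1" "2" "3"
lgl <- as.logical(c(1, 0, 1))       # TRUE FALSE TRUE
f <- as.factor(c("b", "a", "b"))    # factor with levels "a" and "b"
levels(f)
```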
class(myData) What kind of data structure would it be?
str(myData) The variables can be of different types. A more comprehensive
overview of the current data structure is given by the str() function
names(myData) The names of the variables are an integral part of the data frame. They
can be retrieved using the names() function
class(names(myData)) What kind of data type would it be?
dim(myData) The dimensions of a table can be extracted using the dim function.
class(dim(myData)) What kind of data type would it be?
class(myData$exam) The individual variables of a data frame are also objects on their own,
and belong to a class.
length(myData$exam) The length of a vector
class(myData$gender)
levels(myData$gender) The levels of the factor can be extracted using the levels function
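The inspection functions above, applied to a small stand-in for myData. The data frame below is hypothetical; the real data set of the practical is larger.

```r
# Hypothetical miniature version of myData
myData <- data.frame(
  ID     = c("a1", "a2", "a3", "a4"),
  gender = factor(c("male", "female", "female", "male")),
  exam   = c(12, 8, 15, 10)
)

class(myData)            # "data.frame"
str(myData)              # overview of all variables and their types
names(myData)            # "ID" "gender" "exam"
class(names(myData))     # "character"
dim(myData)              # 4 3
class(dim(myData))       # "integer"
class(myData$exam)       # "numeric"
length(myData$exam)      # 4
class(myData$gender)     # "factor"
levels(myData$gender)    # "female" "male"
```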
myData$workshop <- as.factor(myData$workshop)
The class of a variable can easily be changed ("coerced") into a factor.
myData$workshop <- factor(myData$workshop, levels = c(1,2,3), labels = c("R","SAS","SPSS"))
This assigns more descriptive names to the factor levels.
summary(myData$workshop) A simple summary statistic: the frequencies of the factor levels
table(myData$workshop) Also returns the frequencies of the factor levels
summary(as.numeric(myData$workshop)) If you temporarily (e.g. in one formula) want workshop to be a
number again, use the coercion expression directly in the formula.
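A short sketch of the factor steps on a standalone vector. The workshop codes are invented for illustration.

```r
workshop <- c(1, 2, 1, 3, 2, 1)   # hypothetical numeric workshop codes

# Turn the codes into a factor with descriptive labels
workshop <- factor(workshop, levels = c(1, 2, 3),
                   labels = c("R", "SAS", "SPSS"))

summary(workshop)                 # frequencies per level
counts <- table(workshop)         # same frequencies, as a table object
codes <- as.numeric(workshop)     # back to the underlying codes 1, 2, 3
summary(codes)                    # numeric summary (min, mean, max, ...)
```

Note that as.numeric() on a factor returns the internal level codes, which here happen to coincide with the original numbers.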
myData$ID <- as.character(myData$ID) The ID variable has been interpreted as a factor with 12 levels, so
internally each ID is stored as a number. We would rather have it as
a character vector.
myData$pass <- ifelse(myData$exam >= 10, TRUE, FALSE)
Makes a new binary variable telling whether a person passed the exam, i.e. got at least 10/20.
myData$pass2 <- as.numeric(myData$pass)
A logical variable can be coerced into a number: TRUE = 1, FALSE = 0.
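The same two steps on a standalone vector of made-up exam scores, including a missing value to show that NA is carried through.

```r
exam <- c(12, 8, NA, 10)                  # hypothetical scores out of 20

pass <- ifelse(exam >= 10, TRUE, FALSE)   # TRUE FALSE NA TRUE
pass2 <- as.numeric(pass)                 # 1 0 NA 1
```

Since exam >= 10 is already a logical vector, pass <- exam >= 10 would give the same result; the ifelse() form is kept here to match the notes.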
write.table First mandatory argument x = name of the table
Second argument file = name of the resulting file
DEFAULT
write.table(x, file = "", append = FALSE, quote = TRUE, sep = " ",
eol = "\n", na = "NA", dec = ".", row.names = TRUE, col.names = TRUE,
qmethod = c("escape", "double"), fileEncoding = "")
quote = TRUE If TRUE, character and factor columns are surrounded by double quotes.
We don't want that, so set quote = FALSE
sep = " " The field separator. We don't want white space but a tab, so set sep = "\t"
dec = "." We want a comma instead: dec = ","
na = "NA" The string to write for missing values
row.names = TRUE A logical (TRUE/FALSE) indicating whether the row names of the data
frame are written along with x, i.e. exported with the rest of the table
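A sketch of an export with the settings discussed above. The data frame and the choice of "?" for missing values are hypothetical, and row.names = FALSE is an assumption made here to get a header with one name per column.

```r
# Hypothetical data to export
myData <- data.frame(ID = c("a1", "a2"),
                     gender = factor(c("male", "female")),
                     exam = c(12.5, NA))

out <- tempfile(fileext = ".txt")
write.table(x = myData, file = out, quote = FALSE, sep = "\t",
            dec = ",", na = "?", row.names = FALSE)

# Inspect the raw lines that were written
written <- readLines(out)
written
```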
month.name[1:3] To get more than one element, index with a vector
month.name[c(1,4,7)]
month.name[-2] All months except February
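The three indexing forms can be checked directly, since month.name is a built-in character vector.

```r
first3 <- month.name[1:3]          # "January" "February" "March"
picked <- month.name[c(1, 4, 7)]   # "January" "April" "July"
no.feb <- month.name[-2]           # all 11 months except February
length(no.feb)                     # 11
```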
names(myData) (retrieves the column headers of a data frame, returning a character
vector)
names(myData)[3]<- "sex" By modifying the elements of this vector, one can change the variable
names
demo.matrix <- matrix(1:12, nrow = 3, byrow = T)
Makes a matrix. The byrow = T option is added because, by default, a matrix is filled by
columns.
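To see the difference, the same numbers can be filled in both ways. The object names by.row and by.col are chosen here for illustration.

```r
by.row <- matrix(1:12, nrow = 3, byrow = TRUE)   # filled row by row
by.col <- matrix(1:12, nrow = 3)                 # default: filled column by column

by.row[1, ]   # 1 2 3 4
by.col[1, ]   # 1 4 7 10
```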
myData[1,] Select first row
myData[1:5,] Select rows 1 to 5
myData[,-c(1,3)] Select everything except for columns 1 and 3
myData[,1] Select the first column
select <- c(1,3)
myData[,select]
The indexes can also be stored as an object. Here we first create a
numeric vector object, with which the first and third columns are retrieved.
myData[,c(1,3,4,2,8,9,5,6,7)] Indexing also makes it possible to reorder the columns
and rows
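Row and column indexing on a small hypothetical data frame with three columns (the real myData has more, so the column numbers differ).

```r
# Hypothetical miniature data frame
myData <- data.frame(ID   = c("a1", "a2", "a3"),
                     exam = c(12, 8, 15),
                     sex  = c("male", "female", "female"))

first.row <- myData[1, ]             # first row, all columns
only.exam <- myData[, -c(1, 3)]      # drop columns 1 and 3
select <- c(1, 3)
id.sex <- myData[, select]           # columns 1 and 3 via a stored index vector
reordered <- myData[, c(3, 1, 2)]    # same columns, new order
```

When only one column remains, as in only.exam, R returns a plain vector rather than a data frame.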
Ordering a dataset
order(myData$exam) First compute the ordering for the variable you want to sort by
o <- order(myData$exam) Assign it to an object
myData[o,] Returns the data in sorted order
oo <- order(myData$sex,myData$workshop)
myData[oo,]
Sorting by several criteria is done by supplying multiple arguments to the order function.
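A sketch of both orderings on made-up data. order() returns row positions, not sorted values, which is why its result is used as a row index.

```r
# Hypothetical miniature data frame
myData <- data.frame(sex      = c("male", "female", "male", "female"),
                     workshop = c("SAS", "R", "R", "SPSS"),
                     exam     = c(12, 15, 8, 10))

o <- order(myData$exam)                      # 3 4 1 2
sorted <- myData[o, ]                        # rows from lowest to highest exam

oo <- order(myData$sex, myData$workshop)     # by sex, then by workshop within sex
sorted2 <- myData[oo, ]
```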
myData[myData$workshop == "SPSS" , ] The following command selects the subjects that have followed the
SPSS course
myData$workshop == "SPSS" It's instructive to decompose this operation into smaller steps. The
statement between the square brackets creates a logical vector (you
get TRUE or FALSE for each row).
myData[myData$workshop == "SPSS" , c(1,4)] Additional selection of columns when you have selected a specific
subset of rows
myData[myData$workshop == "SPSS" & myData$sex == "female" , ] Linking selection conditions
myData[myData$pass == TRUE & myData$sex == "female" , ]
myData[myData$pass == TRUE & myData$sex == "female" & is.na(myData$exam)==FALSE ,]
Using the is.na() function, we add an additional condition that removes
records with a missing exam result.
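The effect of the extra is.na() condition can be seen on a small hypothetical data frame with one missing exam result. Without it, a row whose condition evaluates to NA comes back as a row of NAs.

```r
# Hypothetical data with one missing exam result
myData <- data.frame(sex      = c("female", "female", "male", "female"),
                     workshop = c("SPSS", "R", "SPSS", "SPSS"),
                     exam     = c(14, NA, 11, 7))
myData$pass <- myData$exam >= 10

is.spss <- myData$workshop == "SPSS"     # logical vector, one value per row
spss <- myData[is.spss, c(1, 3)]         # SPSS rows, columns 1 and 3 only

# Without the NA check: the NA in pass yields an extra all-NA row
with.na <- myData[myData$pass == TRUE & myData$sex == "female", ]

# With the NA check: only complete matching records remain
no.na <- myData[myData$pass == TRUE & myData$sex == "female" &
                is.na(myData$exam) == FALSE, ]
```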
select <- which(myData$exam>10) An alternative for selecting records is through the which() function,
that searches for records(rows) matching a certain condition. A