This summary covers practicals 1 and 2 (take-home assignment). It contains the assignments with solutions and the associated code, as well as a summary of the slides about the practicals. I achieved 10/10 using this summary during the interim open-book exam.
Data mining / advanced data analysis (2052FBDBMW)
paulienmeulemeester
~ ADVANCED DATA ANALYSIS ~
PRACTICAL 1
OVERVIEW ALL FUNCTIONS – PRACTICAL 1
seq(from=X, to=X, by=X)    A function that generates a sequence of numbers.
Session – Set Working Directory – Choose Directory    Menu path for setting the working directory.
Keyboard: the slash / is typed with Shift; the tilde ~ and the square brackets [ ] are typed with AltGr.
plot(x=X, y=Y)    Generates a plot.
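As a small illustration of seq() and plot() together, the sketch below uses made-up values for x and y. Writing the plot to a temporary PDF is an assumption added here so that the example also runs outside an interactive session; in RStudio you would normally just call plot().

```r
# Hypothetical values, chosen only for illustration.
x <- seq(from = 0, to = 10, by = 2)   # 0 2 4 6 8 10
y <- x^2

# Assumption: draw into a temporary PDF so the sketch runs non-interactively.
plot_file <- tempfile(fileext = ".pdf")
pdf(plot_file)
plot(x = x, y = y)                    # scatter plot of y against x
invisible(dev.off())                  # close the PDF device
```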
getwd()    Returns the current working directory.
list.files(getwd())    Lists the files in the working directory.
read.table("X")    Reads a table from file X. Arguments with their default value, followed by what they mean:
    file = "XX"
    header = FALSE    If a header is present: header = TRUE
    sep = ""    Columns separated by white space. If not → sep = "\t"
    dec = "."    The character used in the file for decimal points
    na.strings = "NA"    A character vector of strings which are to be interpreted as NA values
    stringsAsFactors    Should character vectors be converted to factors? Needs to be = TRUE
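The read.table() arguments listed above can be tried on a small file. This is only a sketch: the file content and the column names (ID, gender, exam) are invented for the example, and the file is created in a temporary location. Note that since R 4.0.0 the default is stringsAsFactors = FALSE, which is why it is set explicitly here.

```r
# Create a small tab-separated file to read back (hypothetical content).
path <- tempfile(fileext = ".txt")
writeLines(c("ID\tgender\texam",
             "s1\tfemale\t12.5",
             "s2\tmale\tNA",
             "s3\tfemale\t8"), path)

myData <- read.table(file = path,
                     header = TRUE,            # first line holds the variable names
                     sep = "\t",               # columns are tab-separated
                     dec = ".",                # decimal point character
                     na.strings = "NA",        # strings read as missing values
                     stringsAsFactors = TRUE)  # character columns become factors
str(myData)
```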
c()    Concatenation operator that wraps individual elements into a vector.
class(myData)    What kind of data structure is it?
str(myData)    The variables can be of different types. A more comprehensive overview of the current data structure is given by the str() function.
names(myData)    The names of the variables are an integral part of the data frame. They can be retrieved using the names() function.
class(names(myData))    What kind of data type is it?
dim(myData)    The dimensions of a table can be extracted using the dim() function.
class(dim(myData))    What kind of data type is it?
class(myData$exam)    The individual variables of a data frame are also objects in their own right, and belong to a class.
length(myData$exam)    The length of a vector.
class(myData$gender)
levels(myData$gender)    The levels of a factor can be extracted using the levels() function.
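To see what these inspection functions return, here is a minimal sketch on a small data frame. The data are invented for the example; only the variable names follow the notes.

```r
# Hypothetical data frame with the variable names used in the notes.
myData <- data.frame(ID = c("s1", "s2", "s3", "s4"),
                     workshop = c(1, 2, 3, 1),
                     gender = factor(c("female", "male", "female", "male")),
                     exam = c(12, 9, NA, 15))

class(myData)           # "data.frame"
str(myData)             # compact overview of all variables and their types
names(myData)           # "ID" "workshop" "gender" "exam"
class(names(myData))    # "character"
dim(myData)             # 4 4 (rows, columns)
class(dim(myData))      # "integer"
class(myData$exam)      # "numeric"
length(myData$exam)     # 4
class(myData$gender)    # "factor"
levels(myData$gender)   # "female" "male"
```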
myData$workshop <- as.factor(myData$workshop)    The class of the variable can easily be changed ("coerced") into a factor.
myData$workshop <- factor(myData$workshop, levels = c(1,2,3), labels = c("R","SAS","SPSS"))    You can assign more descriptive names to the factor levels in this way.
summary(myData$workshop)    A simple summary statistic: the frequencies of the levels of such a factor.
table(myData$workshop)    Also returns the frequencies of the levels of the factor.
summary(as.numeric(myData$workshop))    Suppose you temporarily (e.g. in one formula) want workshop to be a number again: use the coercion expression inside the formula.
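The factor steps above can be followed on a toy example. The workshop codes below are made up; the level labels are the ones given in the notes.

```r
# Hypothetical workshop codes (1 = R, 2 = SAS, 3 = SPSS).
myData <- data.frame(workshop = c(1, 2, 3, 1, 2, 1))

myData$workshop <- as.factor(myData$workshop)
levels(myData$workshop)                 # "1" "2" "3"

# Attach descriptive labels to the levels.
myData$workshop <- factor(myData$workshop,
                          levels = c(1, 2, 3),
                          labels = c("R", "SAS", "SPSS"))

summary(myData$workshop)                # frequencies per level
table(myData$workshop)                  # same frequencies as a table
summary(as.numeric(myData$workshop))    # temporary coercion to the level codes 1, 2, 3
```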
myData$ID <- as.character(myData$ID)    The ID variable has been interpreted as a factor with 12 levels, so internally the ID is treated as a number. We would rather have it as a character vector.
myData$pass <- ifelse(myData$exam>=10,TRUE,FALSE)    Make a new binary variable telling whether a person passed the exam, i.e. got at least 10/20.
myData$pass2 <- as.numeric(myData$pass)    A logical variable can be coerced into a number: TRUE = 1, FALSE = 0.
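A short sketch of these three coercions, using invented IDs and exam scores. It also shows that a missing exam result stays missing in the derived variables.

```r
# Hypothetical data: the ID arrives as a factor, one exam result is missing.
myData <- data.frame(ID = factor(c("s1", "s2", "s3")),
                     exam = c(12, 9, NA))

myData$ID <- as.character(myData$ID)                     # factor -> character
myData$pass <- ifelse(myData$exam >= 10, TRUE, FALSE)    # TRUE FALSE NA
myData$pass2 <- as.numeric(myData$pass)                  # 1 0 NA
myData
```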
write.table(x, ...)    First mandatory argument x = name of the table. Arguments with their default value, followed by what they mean:
    quote = TRUE    If set to TRUE, character and factor columns are surrounded by double quotes. We don't want that → quote = FALSE
    sep = " "    The field separator. We don't want white space, we want a tab → sep = "\t"
    dec = "."    We want a comma → dec = ","
    na = "NA"    What do you want to call the missing values?
    row.names = TRUE    A logical (TRUE/FALSE) that indicates whether the row names of the data frame are to be written along with x, i.e. whether the row names are exported with the rest of the table
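As a sketch of an export with the non-default choices described above (no quotes, tab separator, decimal comma), the example below writes a tiny invented data frame to a temporary file. Setting row.names = FALSE is a choice made for this example, not something the notes prescribe.

```r
# Hypothetical data to export.
myData <- data.frame(ID = c("s1", "s2"), exam = c(12.5, NA))

out <- tempfile(fileext = ".txt")
write.table(x = myData,
            file = out,
            quote = FALSE,       # no double quotes around text columns
            sep = "\t",          # tab as field separator
            dec = ",",           # decimal comma instead of decimal point
            na = "NA",           # string written for missing values
            row.names = FALSE)   # assumption: leave the row names out

readLines(out)                   # inspect the lines that were written
```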
month.name[1:3], month.name[c(1,4,7)]    If we want more than one element, we have to index with a vector.
month.name[-2]    All months except February.
names(myData)    Retrieves the column headers of a data frame, returning a character vector.
names(myData)[3] <- "sex"    By modifying the elements of this vector, one can change the variable names.
demo.matrix <- matrix(1:12, nrow=3, byrow=TRUE)    Make a matrix. The byrow=TRUE option is added since by default a matrix is filled up by columns.
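The vector indexing, renaming and matrix entries above can be checked directly. The small data frame is invented for the example; month.name is a constant built into R.

```r
month.name[1:3]          # "January" "February" "March"
month.name[c(1, 4, 7)]   # "January" "April" "July"
month.name[-2]           # all months except February

# Hypothetical data frame whose third column gets renamed.
myData <- data.frame(ID = c("s1", "s2"),
                     workshop = c(1, 2),
                     gender = c("female", "male"))
names(myData)[3] <- "sex"
names(myData)            # "ID" "workshop" "sex"

# byrow = TRUE fills the matrix row by row; the default fills column by column.
demo.matrix <- matrix(1:12, nrow = 3, byrow = TRUE)
by.column   <- matrix(1:12, nrow = 3)
demo.matrix[1, ]         # 1 2 3 4
by.column[1, ]           # 1 4 7 10
```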
myData[1,]    Select the first row.
myData[1:5,]    Select rows 1 to 5.
myData[,-c(1,3)]    Select everything except columns 1 and 3.
myData[,1]    Select the first column.
select <- c(1,3); myData[,select]    The indexes can also be stored as an object. Here we first create a numeric vector object, with which the first and third column are retrieved.
myData[,c(1,3,4,2,8,9,5,6,7)]    Indexing also makes it possible to switch the position of the columns and rows.
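A minimal sketch of row and column indexing on a data frame. The data are invented and have only four columns, so the reordering example uses c(1,3,4,2) rather than the nine-column vector from the notes.

```r
# Hypothetical data frame with 6 rows and 4 columns.
myData <- data.frame(ID = paste0("s", 1:6),
                     workshop = c(1, 2, 3, 3, 1, 3),
                     sex = c("female", "male", "female", "male", "female", "female"),
                     exam = c(12, 9, 14, 7, NA, 16))

myData[1, ]                  # first row
myData[1:5, ]                # rows 1 to 5
myData[, -c(1, 3)]           # everything except columns 1 and 3
myData[, 1]                  # first column, returned as a vector

select <- c(1, 3)            # indexes stored as an object
myData[, select]             # columns 1 and 3

reordered <- myData[, c(1, 3, 4, 2)]   # same columns in a different order
names(reordered)             # "ID" "sex" "exam" "workshop"
```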
Ordering a dataset
order(myData$exam)    First, choose the variable you want to order by.
o <- order(myData$exam)    Assign the result to an object.
myData[o,]    Return the data in the ordered way.
oo <- order(myData$sex,myData$workshop)    Sorting by several criteria is done by supplying multiple arguments to the order function.
myData[oo,]
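The ordering steps can be traced on a few invented rows. order() returns row positions, which are then used as a row index.

```r
# Hypothetical data.
myData <- data.frame(sex = c("female", "male", "female", "male"),
                     workshop = c(2, 1, 1, 3),
                     exam = c(12, 9, 14, 7))

o <- order(myData$exam)                     # 4 2 1 3: positions from lowest to highest score
myData[o, ]                                 # rows sorted by exam

oo <- order(myData$sex, myData$workshop)    # 3 1 2 4: by sex first, then workshop within sex
myData[oo, ]
```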
myData[myData$workshop == "SPSS" , ]    Selects the subjects that have followed the SPSS course.
myData$workshop == "SPSS"    It is instructive to decompose this operation into smaller steps. The statement between the square brackets creates a logical vector (you get TRUE or FALSE for each row).
myData[myData$workshop == "SPSS" , c(1,4)]    Additional selection of columns when you have selected a specific subset of rows.
myData[myData$pass == TRUE & myData$sex == "female" & is.na(myData$exam)==FALSE ,]    Using the is.na() function, we add an additional condition that removes records with a missing exam result.
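The sketch below walks through these logical selections on invented data, including one record with a missing exam result. It also shows why the is.na() condition matters: without it, the record with the missing result comes back as a row filled with NA.

```r
# Hypothetical data; s5 has a missing exam result.
myData <- data.frame(ID = paste0("s", 1:6),
                     workshop = c("R", "SAS", "SPSS", "SPSS", "R", "SPSS"),
                     sex = c("female", "male", "female", "male", "female", "female"),
                     exam = c(12, 9, 14, 7, NA, 16))
myData$pass <- myData$exam >= 10

myData$workshop == "SPSS"                      # FALSE FALSE TRUE TRUE FALSE TRUE
spss <- myData[myData$workshop == "SPSS", ]    # rows of the SPSS participants
spss_small <- myData[myData$workshop == "SPSS", c(1, 4)]   # same rows, columns 1 and 4

# Without the is.na() condition the missing record appears as an all-NA row.
with_na <- myData[myData$pass == TRUE & myData$sex == "female", ]

passed <- myData[myData$pass == TRUE &
                 myData$sex == "female" &
                 is.na(myData$exam) == FALSE, ]
passed$ID                                      # "s1" "s3" "s6"
```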
select <- which(myData$exam>10)    An alternative for selecting records is the which() function, which searches for records (rows) matching a certain condition.
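A minimal sketch of which() on invented exam scores. Unlike a logical index, which() returns the row positions and silently skips missing values.

```r
# Hypothetical data; the fifth exam result is missing.
myData <- data.frame(ID = paste0("s", 1:6),
                     exam = c(12, 9, 14, 7, NA, 16))

select <- which(myData$exam > 10)   # 1 3 6: positions of the matching rows, NA skipped
myData[select, ]                    # the records with an exam score above 10
```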