Complete solutions (and assignments) for practicals 3 to 5, in English, plus a summary of the accompanying slides. Includes R code and screenshots of the solutions. Everything is quick and easy to find and clearly divided by practical.
This is the part that is covered on the final exam...
Data mining/ advanced data analysis (2052FBDBMW)
paulienmeulemeester
PRACTICAL 3
SUMMARY SLIDES
Automation, add-on packages and reshaping
AUTOMATION OF REPETITIVE ANALYSES
Dataset
• Genetic analysis of age-related hearing impairment
• Phenotype : Z-score
- Standardized measure for hearing quality
- Lower Z-score = better
• Association between Z-score and genotype
- SNP genotype : aa, ab, bb
• ANOVA : SNP GT = categorical
• Regression : SNP GT = 0,1,2
• Find SNP associated with phenotype
There are 2 ways to analyse this:
1) Consider genotype a categorical variable and do a one-way ANOVA
2) Consider genotype a number: count the number of rare alleles (aa has 0, ab has 1, bb has 2 rare alleles)
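The two approaches could be sketched in R as follows. This is only an illustration on simulated data: the names genotype, z and n.rare are assumptions and do not come from the course dataset.

```r
# Simulated example data (assumption: NOT the course dataset)
set.seed(1)
genotype <- factor(sample(c("aa", "ab", "bb"), 60, replace = TRUE),
                   levels = c("aa", "ab", "bb"))
z <- rnorm(60)                      # hypothetical Z-scores

# 1) Genotype as a categorical variable: one-way ANOVA
anova.fit <- lm(z ~ genotype)
p.anova <- anova(anova.fit)[1, "Pr(>F)"]

# 2) Genotype as a count of rare (b) alleles: aa = 0, ab = 1, bb = 2
n.rare <- as.numeric(genotype) - 1
reg.fit <- lm(z ~ n.rare)
p.reg <- anova(reg.fit)[1, "Pr(>F)"]
```

The categorical model spends 2 degrees of freedom on the genotype (one per non-reference category), while the count model spends 1 and assumes a linear effect per rare allele.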
Repetitive analysis
• Regression :
- one dependent numeric variable (Y)
- several independent (X) variables
• Regress Y on all separate X-variables
- → several simple linear regressions/ANOVAs
Y      Xvar1  Xvar2  Xvar3  Xvar4
5.71   aa     ab     aa     ab
-0.93  aa     ab     ab     aa
2.58   ab     bb     bb     aa
The columns Xvar1 to Xvar4 contain the genotypes.
In the regression, Y (the numeric phenotype) is the dependent variable (outcome).
We want to know the p-values: which associations are significant?
Don’t run every ANOVA individually → there are smarter ways to do it through R.
R is useful for repetitive analyses.
For the first X variable
myModel<- lm(Y ~ Xvar1)
For the second X variable
myModel<- lm(Y ~ Xvar2)
For the third X variable
myModel<- lm(Y ~ Xvar3)
• 1 analysis per column, selecting the column by its index:
myModel<- lm(Y ~ allXvars[,1])
For the first X variable
myModel<- lm(Y ~ allXvars[,1])
For the second X variable
myModel<- lm(Y ~ allXvars[,2])
For the third X variable
myModel<- lm(Y ~ allXvars[,3])
You could also set the index i first and reuse the same call:
i<-1
myModel<- lm(Y ~ allXvars[ ,i])
i<-2
myModel<- lm(Y ~ allXvars[ ,i])
i<-3
myModel<- lm(Y ~ allXvars[ ,i])
Then you have the same formula every time; only i changes.
You can ask R to let i go through all the values.
For-loop
• Let i run through all values from 1 to n, executing all commands within the {curly braces} for each value
for(i in 1:3) {
myModel<- lm(Y ~ allXvars[ ,i])
}
Put the values that i needs to run through in the parentheses after for, and the commands to repeat inside the curly braces.
Will be executed for i= 1, i=2, i=3
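A minimal runnable sketch of this loop is given below. The data are simulated as an assumption (the course dataset is not reproduced here); only the names Y, allXvars and myModel follow the notes.

```r
# Simulated stand-in data (assumption: NOT the course dataset)
set.seed(42)
n <- 30
gt <- c("aa", "ab", "bb")
allXvars <- data.frame(Xvar1 = sample(gt, n, replace = TRUE),
                       Xvar2 = sample(gt, n, replace = TRUE),
                       Xvar3 = sample(gt, n, replace = TRUE),
                       stringsAsFactors = TRUE)
Y <- rnorm(n)

# The body between the curly braces runs once for i = 1, i = 2 and i = 3
for (i in 1:3) {
  myModel <- lm(Y ~ allXvars[, i])
  cat("Fitted model for column", i, ":", names(allXvars)[i], "\n")
}
```

Note that myModel is overwritten on every pass, so after the loop only the fit for the last column is left.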
Parameter estimates in a loop
• Suppose in each step you estimate a parameter or a p-value
• First create empty vector to save the output from each step
p.value<-rep(NA,3)
for(i in 1:3) {
myModel<-lm(Y ~ allXvars[ ,i])
p.value[i]<-anova(myModel)[1,5]
}
We need the p-value of each model.
Each p-value has to be extracted from its model and stored somewhere, because myModel is overwritten in every pass through the loop.
So first create an empty vector, before the loop starts: rep() repeats a value, so rep(NA,3) gives a vector with 3 empty (NA) places.
In the first pass, the p-value is assigned to the first place of the vector, in the second pass to the second place, and so on.
anova(myModel)[1,5] takes row 1, column 5 of the ANOVA table, which is the p-value, Pr(>F), of the X variable.
When R has finished the loop, the vector is filled and contains the p-values of all passes.
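The store-in-a-vector pattern can be tried out on simulated data, as in the sketch below. The data, the helper sim.gt and the built-in effect of Xvar2 are assumptions made for illustration; the sketch also generalises the fixed 3 to the number of columns.

```r
# Simulated stand-in data (assumption: NOT the course dataset)
set.seed(123)
n <- 100
gt <- c("aa", "ab", "bb")
sim.gt <- function() factor(sample(gt, n, replace = TRUE), levels = gt)
allXvars <- data.frame(Xvar1 = sim.gt(), Xvar2 = sim.gt(), Xvar3 = sim.gt())

# Hypothetical phenotype: noise plus an effect of 2 per rare allele of Xvar2
Y <- rnorm(n) + 2 * (as.numeric(allXvars$Xvar2) - 1)

# Empty vector with one place per X variable, created before the loop
n.vars <- ncol(allXvars)
p.value <- rep(NA, n.vars)

for (i in 1:n.vars) {
  myModel <- lm(Y ~ allXvars[, i])
  p.value[i] <- anova(myModel)[1, 5]   # row 1, column 5 = Pr(>F)
}

# Label the results and list the significant associations
names(p.value) <- names(allXvars)
significant <- names(p.value)[p.value < 0.05]
```

Using ncol(allXvars) instead of a hard-coded 3 means the same script still works when the dataset contains more SNP columns.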
Automation with new function
• Piece of script, to be carried out multiple times
Data<- read.table("input_1.txt")
p.value<-t.test(…)
write.table(p.value,file="output_1.txt")
• Initialize the new function
- Select the code of the function definition and run it
- This has to be repeated each time you restart R
- Running the definition gives no output
• Run
doMyAnalysis("input_1.txt","output_1.txt")
doMyAnalysis("input_2.txt","output_2.txt")
doMyAnalysis("input_3.txt","output_3.txt")
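The definition of doMyAnalysis itself is not shown in these notes. A hypothetical version could look like the sketch below: the argument names inputfile and outputfile, and the one-sample t-test on the first column, are assumptions chosen only to make the example runnable, since the notes elide the actual t.test call.

```r
# Hypothetical definition: argument names and the one-sample t-test on the
# first column are assumptions; the notes do not show the real analysis.
doMyAnalysis <- function(inputfile, outputfile) {
  Data <- read.table(inputfile, header = TRUE)
  p.value <- t.test(Data[, 1])$p.value
  write.table(p.value, file = outputfile)
  invisible(p.value)          # also return the p-value, without printing it
}

# Usage with temporary files standing in for input_1.txt / output_1.txt
infile <- tempfile(fileext = ".txt")
outfile <- tempfile(fileext = ".txt")
set.seed(7)
write.table(data.frame(x = rnorm(20)), file = infile)
doMyAnalysis(infile, outfile)
```

Running the function(...) { ... } part only creates the function; nothing is analysed until doMyAnalysis() is called with actual file names.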
Further automation
• Using a list object
- Consists of other objects (elements)
- Individual elements accessed by double square brackets
list.object[[i]]
• Here
- Put the input files and/or output files in a list
- 1 list object, containing the 3 input data frames
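As a small illustration of how a list behaves, the sketch below builds a list from three different kinds of objects; the name my.list and its contents are made up for the example.

```r
# A list can hold elements of different types (illustrative contents)
my.list <- list(1:3, "text", data.frame(a = 1:2, b = c("x", "y")))

first.element <- my.list[[1]]   # double brackets: the element itself
sub.list <- my.list[1]          # single brackets: a list of length 1

class(first.element)            # "integer"
class(sub.list)                 # "list"
class(my.list[[3]])             # "data.frame"
length(my.list)                 # 3
```

The difference between [[i]] and [i] matters in loops: to work with a stored data frame you need the double brackets.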
Combine list and for-loop
• First create empty list
Mylist<-vector("list",n.elements)
• Read in the 3 input files
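The preview stops here, so the following is only a sketch of how the list and the for-loop might be combined. The temporary stand-in files, the vector input.files and the header = TRUE setting are assumptions; Mylist and n.elements follow the notes.

```r
# Create three small stand-in input files (assumption: illustrative data)
tmp.dir <- tempfile("inputs")
dir.create(tmp.dir)
n.elements <- 3
input.files <- file.path(tmp.dir, paste0("input_", 1:n.elements, ".txt"))
set.seed(3)
for (i in 1:n.elements) {
  write.table(data.frame(x = rnorm(5)), file = input.files[i])
}

# Empty list with one place per input file, filled inside the for-loop
Mylist <- vector("list", n.elements)
for (i in 1:n.elements) {
  Mylist[[i]] <- read.table(input.files[i], header = TRUE)
}
```

After the loop, Mylist[[2]] is the data frame read from the second file, so later analyses can loop over the list instead of over file names.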