Summary Solution practical 3 to 5 - data mining

Complete solution (and assignment) of practicum 3 to 5 in English. Also a summary of the accompanying ppt. Includes codes and screenshots of the solutions in R. Everything is very quick and easy to find and clearly divided into the various practicals. This is the part that comes to the final exam...

Preview 4 out of 117  pages

  • September 13, 2024
  • 117
  • 2024/2025
PRACTICAL 3
SUMMARY SLIDES

Automation, add-on packages and reshaping

AUTOMATION OF REPETITIVE ANALYSES

Dataset

• Genetic analysis of age-related hearing impairment

• Phenotype : Z-score

- Standardized measure for hearing quality

- Lower Z-score = better

• Association between Z-score and genotype

- SNP genotype : aa, ab, bb

• ANOVA : SNP GT = categorical

• Regression : SNP GT = 0,1,2

• Find SNP associated with phenotype



Two ways to analyse this:

1) Treat the genotype as a categorical variable and do a one-way ANOVA

2) Treat the genotype as a number: count the number of rare alleles (aa = 0, ab = 1, bb = 2, since bb has 2 rare alleles)
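
The two approaches can be compared in a small sketch. The data below are simulated purely for illustration (the names `genotype`, `zscore`, `gt.factor` and `gt.count` are assumptions, not part of the course dataset), and b is taken to be the rare allele, as in the notes above.

```r
# Simulated data; all values are made up for illustration only
set.seed(1)
genotype <- sample(c("aa", "ab", "bb"), size = 60, replace = TRUE)
zscore   <- rnorm(60)

# 1) Genotype as a categorical variable: one-way ANOVA (2 df for 3 groups)
gt.factor   <- factor(genotype, levels = c("aa", "ab", "bb"))
anova.model <- lm(zscore ~ gt.factor)
anova(anova.model)

# 2) Genotype as the number of b alleles: aa = 0, ab = 1, bb = 2 (1 df)
gt.count  <- as.numeric(gt.factor) - 1
reg.model <- lm(zscore ~ gt.count)
summary(reg.model)$coefficients
```

The ANOVA estimates a separate mean per genotype, while the regression assumes that each extra b allele shifts the Z-score by the same amount.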



Repetitive analysis

• Regression :

- one dependent numeric variable (Y)

- several independent (X) variables

• Regress Y on all separate X-variables

- → simple linear regression/ANOVA, several times




Y      Xvar1  Xvar2  Xvar3  Xvar4
5.71   aa     ab     aa     ab
-0.93  aa     ab     ab     aa
2.58   ab     bb     bb     aa


The columns contain the genotypes

In the regression, Y (the numeric phenotype) is the dependent variable (outcome)

We want the p-values, to see which associations are significant

Don't run every ANOVA individually → there are smarter ways to do it through R

R is useful for repetitive analyses




For the first Xvariable
myModel<- lm(Y ~ Xvar1)

For the second Xvariable
myModel<- lm(Y ~ Xvar2)

For the third X variable
myModel<- lm(Y ~ Xvar3)




• 1 analysis:

• myModel<- lm(Y ~ allXvars[,1])


For the first X variable
myModel<- lm(Y ~ allXvars[,1])

For the second X variable
myModel<- lm(Y ~ allXvars[,2])

For the third X variable
myModel<- lm(Y ~ allXvars[,3])



Could also run
i<-1
myModel<- lm(Y ~ allXvars[ ,i])

i<-2
myModel<- lm(Y ~ allXvars[ ,i])

i<-3
myModel<- lm(Y ~ allXvars[ ,i])

Then you have the same formula every time; only i changes
You can ask R to let i go through all the values

For-loop

• Let i run through all values from 1 to n, for all commands within the {curly braces}

for(i in 1:3) {

myModel<- lm(Y ~ allXvars[ ,i])

}



Between the parentheses, put the values that i needs to run through; the commands to repeat go inside the curly brackets

The commands will be executed for i = 1, i = 2, i = 3
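
As a minimal sketch of how the loop index works, the example below rebuilds the three rows of the genotype table shown earlier as a data frame and lets i run over its columns. Using `seq_len(ncol(allXvars))` instead of a hard-coded `1:3` is a choice made here, not something the slides prescribe.

```r
# The three example rows from the table above, one column per SNP
allXvars <- data.frame(
  Xvar1 = c("aa", "aa", "ab"),
  Xvar2 = c("ab", "ab", "bb"),
  Xvar3 = c("aa", "ab", "bb"),
  Xvar4 = c("ab", "aa", "aa"),
  stringsAsFactors = FALSE
)

# i takes the values 1, 2, 3, 4 in turn; the body runs once per value
for (i in seq_len(ncol(allXvars))) {
  cat("i =", i, "-> column", colnames(allXvars)[i], ":", allXvars[, i], "\n")
}
```

Deriving the range from the number of columns means the loop does not have to be edited when SNPs are added.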



Parameter estimates in a loop


• Suppose in each step you estimate a parameter or a p-value
• First create empty vector to save the output from each step
p.value<-rep(NA,3)
for(i in 1:3) {
myModel<-lm(Y ~ allXvars[ ,i])
p.value[i]<-anova(myModel)[1,5]
}


We need the p-values

We have to extract the p-value from each model, but we don't want to overwrite it in the next step

So we have to extract it and store it somewhere

First create an empty vector

rep = repeat: rep(NA,3) repeats NA 3 times

Before the loop starts, create an empty vector with 3 places

In the first pass of the loop, the p-value is assigned to the first place of the empty vector

When R has finished the loop, the vector has been filled and contains the p-values of all passes
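
Put together, the steps above could look like the sketch below. The phenotype and genotypes are simulated here (so `n.obs`, `n.snps` and the random values are assumptions); only the loop body follows the notes. In the ANOVA table of a simple `lm` model, row 1 is the SNP term and column 5 is `Pr(>F)`, which is why `[1,5]` returns the p-value.

```r
# Simulated phenotype and genotypes, for illustration only
set.seed(42)
n.obs <- 100
Y <- rnorm(n.obs)
allXvars <- data.frame(
  Xvar1 = factor(sample(c("aa", "ab", "bb"), n.obs, replace = TRUE)),
  Xvar2 = factor(sample(c("aa", "ab", "bb"), n.obs, replace = TRUE)),
  Xvar3 = factor(sample(c("aa", "ab", "bb"), n.obs, replace = TRUE))
)

# Empty vector with one place per SNP, created before the loop starts
n.snps  <- ncol(allXvars)
p.value <- rep(NA, n.snps)

for (i in seq_len(n.snps)) {
  myModel    <- lm(Y ~ allXvars[, i])      # overwritten in every pass
  p.value[i] <- anova(myModel)[1, 5]       # row 1 = SNP term, column 5 = Pr(>F)
}

# Label the results so each p-value can be traced back to its SNP
names(p.value) <- colnames(allXvars)
p.value
```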



Automation with new function
• Piece of script, to be carried out multiple times
Data<- read.table("input_1.txt")
p.value<-t.test(…)
write.table(p.value,file="output_1.txt")

• 3 similar input files

Data<- read.table("input_1.txt")
p.value<-t.test(…)
write.table(p.value,file="output_1.txt")
Data<- read.table("input_2.txt")
p.value<-t.test(…)
write.table(p.value,file="output_2.txt")
Data<- read.table("input_3.txt")
p.value<-t.test(…)
write.table(p.value,file="output_3.txt")




• Wrap the piece of code into a new function
- Give name to function
- List necessary arguments
- Use argument names in the code

doMyAnalysis<- function(inputfile,outputfile) {
Data<- read.table(inputfile)
p.value<-t.test(…)
write.table(p.value,file=outputfile)
}



To run the new function


• Initialize the new function
- Select code and run
- Each time you restart R
- No output
• Run
doMyAnalysis("input_1.txt","output_1.txt")
doMyAnalysis("input_2.txt","output_2.txt")
doMyAnalysis("input_3.txt","output_3.txt")
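
A runnable sketch of this pattern follows. The notes leave the arguments of `t.test(…)` open, so the two-sample test on the columns `value` and `group` is a hypothetical choice, as are the simulated input files written to a temporary directory so that the example does not depend on files on disk.

```r
# Hypothetical analysis: two-sample t-test of 'value' between two groups
doMyAnalysis <- function(inputfile, outputfile) {
  Data    <- read.table(inputfile, header = TRUE)
  p.value <- t.test(value ~ group, data = Data)$p.value
  write.table(p.value, file = outputfile)
  invisible(p.value)                       # return the p-value without printing it
}

# Create three small simulated input files in a temporary directory
set.seed(7)
work.dir <- tempdir()
for (i in 1:3) {
  Data <- data.frame(group = rep(c("A", "B"), each = 10), value = rnorm(20))
  write.table(Data, file = file.path(work.dir, paste0("input_", i, ".txt")))
}

# Run the function once per file; paste0() builds the numbered file names
for (i in 1:3) {
  doMyAnalysis(file.path(work.dir, paste0("input_", i, ".txt")),
               file.path(work.dir, paste0("output_", i, ".txt")))
}
```

Combining the function with a for-loop removes the last bit of repetition: the three calls collapse into one line inside the loop.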




Further automation


• Using a list object
- Consists of other objects (elements)
- Individual elements accessed by double square brackets
list.object[[i]]
• Here
- Put input files and/or output files in a list
- 1 list object, containing the 3 input data frames




Combine list and for-loop


• First create empty list
Mylist<-vector("list",n.elements)



• Read in the 3 inputfiles
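
The preview stops here, so the following is only a sketch of how the empty list and a for-loop could be combined to read in the files. The file contents are simulated and written to a temporary directory (both assumptions), and `paste0()` is used to build the file names.

```r
# Write three small simulated input files (illustration only)
set.seed(3)
work.dir <- tempdir()
for (i in 1:3) {
  write.table(data.frame(value = rnorm(5)),
              file = file.path(work.dir, paste0("input_", i, ".txt")))
}

# First create the empty list, then fill one element per file
n.elements <- 3
Mylist <- vector("list", n.elements)
for (i in 1:n.elements) {
  Mylist[[i]] <- read.table(file.path(work.dir, paste0("input_", i, ".txt")))
}

# Each element is a data frame, accessed with double square brackets
str(Mylist[[2]])
```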


Who am I buying these notes from?

Stuvia is a marketplace, so you are not buying this document from us, but from seller paulienmeulemeester. Stuvia facilitates payment to the seller.
