100% satisfaction guarantee Immediately available after payment Both online and in PDF No strings attached
logo-home
Practical 3: Advanced Data Analysis: full summary + explanations $6.49   Add to cart

Summary

Practical 3: Advanced Data Analysis: full summary + explanations

1 review
 97 views  12 purchases
  • Course
  • Institution

This document includes all solutions of practical 3 data analysis, including the used code, explanations and screenshots.

Preview 3 out of 19  pages

  • May 10, 2022
  • 19
  • 2021/2022
  • Summary

1  review

review-writer-avatar

By: roberto777torres • 5 months ago

avatar-seller
Part 1. Automation in R : loops, lists and functions

Introduction
The folder contains the 4 datasets with results of a study into the genetic causes of age-related
hearing impairment (ARHI) : connexins.txt, kcnq4.txt, oxidStress.txt and monogenic.txt.
ARHI is the gradual decline of hearing with ageing. In some persons this decline is worse
than in others, and there are reasons to believe that part of this variation between individuals
is genetic. Unlike monogenic disorders, where one single gene causes a disease phenotype,
ARHI is complex – due to multiple genes with a small effect, plus environmental risk factors.
In the ARHI study, we searched for genetic variants (single nucleotide polymorphisms, SNPs)
that were associated with the hearing phenotype. If a gene contains several SNPs showing a
strong association with the hearing phenotype, this can indicate a role of this gene in hearing
impairment.
The statistical analysis involves an association test between the phenotype and the genotype.
The phenotype is described by the Z-score. This is an age- and gender corrected score that
describes how well a person can hear, given age and gender. Good-hearing persons have a
low (negative) Z-score, whereas bad hearing persons have a high (positive) Z-score. The Z-
score is a numeric variable with an approximately normal distribution.
Each SNP has two alleles and, hence, 3 genotypes: aa, ab and bb. Association tests are
commonly performed in two ways:
1) using an ANOVA : treating the 3 genotypes as 3 completely distinct categories
(with no ordering)
2) using regression, coding the genotypes like this : aa=0 , ab=1 and bb=2. This
analysis treats the heterozygous as the intermediate between the two homozygous
genotypes. This is called testing under an additive model. :




Automation 1: Looping
Read in the dataset connexins.txt. This dataset contains genotyping results from SNPs within
several genes of the connexin family (gap junction proteins expressed in the inner ear.).

First: setwd(“/”)
inputData <- read.table("connexins.txt", header=TRUE, sep="\t",
na.strings=c("0",""),stringsAsFactors=TRUE)


The latter argument specifies that missing values are indicated by either an empty field ("") or
a zero. (Yes, R allows more than one missing value indicator).
How does the dataset look like?

str(inputData)




1

,The first column is the subject-identifier (ID). The Zscore is the numeric phenotype. The
remainder of the columns represent the genotypes of the SNPs of interest. They have been
read in as factors (=categorical variables), with 3 levels.

When we test the first SNP (Cx26_SNP1) for association with the phenotype (Zscore), we can
use the ANOVA or the regression approach. Remember that both are carried out by the lm
function in R, and the type of X-variable determines whether an ANOVA or regression is
performed:

For ANOVA, with genotype categorical :

modelANOVA<-lm(Zscore~Cx26_SNP1,data=inputData)
summary(modelANOVA)




p.anova <- anova(modelANOVA)[1,5]




2

, The last command retrieves the overall p-value. If we want the regression approach, we need
to convert the genotype to a number (1,2,3). In R, we can automatically change the categorical
genotypes aa, ab, bb to numbers using the co-ercion formula as.numeric(). The lm function
will then carry out regression:

modelRegr<-lm(Zscore~as.numeric(Cx26_SNP1),data=inputData)
summary(modelRegr)




p.regres <- anova(modelRegr)[1,5]




This gives us the association test for the first SNP. However, the study involved genotyping
several SNPs – nowadays such studies involve up to a million SNPs. We therefore need to
automate the analysis.

If we want to run the above code for all SNPs, the only argument that changes is the
independent variable (after the tilde). The Z-score and the rest of the code will always be the
same. First we create a dataframe with only the SNPs. That is, the original with the first two
columns removed.

allSNPs<-inputData[,-c(1,2)]


In the allSNPs dataframe, the ith column contains the genotype from the ith SNP. How many
SNPs are these…? The dim() function gives us the dimensions of the dataframe : the number
of rows and columns. The SNP names can be retrieved using the names() function.

n.columns<-dim(inputData)[2]
n.snps<-n.columns-2
SNPnames<-names(allSNPs)




3

The benefits of buying summaries with Stuvia:

Guaranteed quality through customer reviews

Guaranteed quality through customer reviews

Stuvia customers have reviewed more than 700,000 summaries. This how you know that you are buying the best documents.

Quick and easy check-out

Quick and easy check-out

You can quickly pay through credit card or Stuvia-credit for the summaries. There is no membership needed.

Focus on what matters

Focus on what matters

Your fellow students write the study notes themselves, which is why the documents are always reliable and up-to-date. This ensures you quickly get to the core!

Frequently asked questions

What do I get when I buy this document?

You get a PDF, available immediately after your purchase. The purchased document is accessible anytime, anywhere and indefinitely through your profile.

Satisfaction guarantee: how does it work?

Our satisfaction guarantee ensures that you always find a study document that suits you well. You fill out a form, and our customer service team takes care of the rest.

Who am I buying these notes from?

Stuvia is a marketplace, so you are not buying this document from us, but from seller Bi0med. Stuvia facilitates payment to the seller.

Will I be stuck with a subscription?

No, you only buy these notes for $6.49. You're not tied to anything after your purchase.

Can Stuvia be trusted?

4.6 stars on Google & Trustpilot (+1000 reviews)

80796 documents were sold in the last 30 days

Founded in 2010, the go-to place to buy study notes for 14 years now

Start selling
$6.49  12x  sold
  • (1)
  Add to cart