100% tevredenheidsgarantie Direct beschikbaar na betaling Zowel online als in PDF Je zit nergens aan vast
logo-home
Practical 3: Advanced Data Analysis: full summary + explanations €5,98   In winkelwagen

Samenvatting

Practical 3: Advanced Data Analysis: full summary + explanations

1 beoordeling
 97 keer bekeken  12 keer verkocht

This document includes all solutions of practical 3 data analysis, including the used code, explanations and screenshots.

Voorbeeld 3 van de 19  pagina's

  • 10 mei 2022
  • 19
  • 2021/2022
  • Samenvatting
Alle documenten voor dit vak (19)

1  beoordeling

review-writer-avatar

Door: roberto777torres • 5 maanden geleden

avatar-seller
Bi0med
Part 1. Automation in R : loops, lists and functions

Introduction
The folder contains the 4 datasets with results of a study into the genetic causes of age-related
hearing impairment (ARHI) : connexins.txt, kcnq4.txt, oxidStress.txt and monogenic.txt.
ARHI is the gradual decline of hearing with ageing. In some persons this decline is worse
than in others, and there are reasons to believe that part of this variation between individuals
is genetic. Unlike monogenic disorders, where one single gene causes a disease phenotype,
ARHI is complex – due to multiple genes with a small effect, plus environmental risk factors.
In the ARHI study, we searched for genetic variants (single nucleotide polymorphisms, SNPs)
that were associated with the hearing phenotype. If a gene contains several SNPs showing a
strong association with the hearing phenotype, this can indicate a role of this gene in hearing
impairment.
The statistical analysis involves an association test between the phenotype and the genotype.
The phenotype is described by the Z-score. This is an age- and gender corrected score that
describes how well a person can hear, given age and gender. Good-hearing persons have a
low (negative) Z-score, whereas bad hearing persons have a high (positive) Z-score. The Z-
score is a numeric variable with an approximately normal distribution.
Each SNP has two alleles and, hence, 3 genotypes: aa, ab and bb. Association tests are
commonly performed in two ways:
1) using an ANOVA : treating the 3 genotypes as 3 completely distinct categories
(with no ordering)
2) using regression, coding the genotypes like this : aa=0 , ab=1 and bb=2. This
analysis treats the heterozygous as the intermediate between the two homozygous
genotypes. This is called testing under an additive model. :




Automation 1: Looping
Read in the dataset connexins.txt. This dataset contains genotyping results from SNPs within
several genes of the connexin family (gap junction proteins expressed in the inner ear.).

First: setwd(“/”)
inputData <- read.table("connexins.txt", header=TRUE, sep="\t",
na.strings=c("0",""),stringsAsFactors=TRUE)


The latter argument specifies that missing values are indicated by either an empty field ("") or
a zero. (Yes, R allows more than one missing value indicator).
How does the dataset look like?

str(inputData)




1

,The first column is the subject-identifier (ID). The Zscore is the numeric phenotype. The
remainder of the columns represent the genotypes of the SNPs of interest. They have been
read in as factors (=categorical variables), with 3 levels.

When we test the first SNP (Cx26_SNP1) for association with the phenotype (Zscore), we can
use the ANOVA or the regression approach. Remember that both are carried out by the lm
function in R, and the type of X-variable determines whether an ANOVA or regression is
performed:

For ANOVA, with genotype categorical :

modelANOVA<-lm(Zscore~Cx26_SNP1,data=inputData)
summary(modelANOVA)




p.anova <- anova(modelANOVA)[1,5]




2

, The last command retrieves the overall p-value. If we want the regression approach, we need
to convert the genotype to a number (1,2,3). In R, we can automatically change the categorical
genotypes aa, ab, bb to numbers using the co-ercion formula as.numeric(). The lm function
will then carry out regression:

modelRegr<-lm(Zscore~as.numeric(Cx26_SNP1),data=inputData)
summary(modelRegr)




p.regres <- anova(modelRegr)[1,5]




This gives us the association test for the first SNP. However, the study involved genotyping
several SNPs – nowadays such studies involve up to a million SNPs. We therefore need to
automate the analysis.

If we want to run the above code for all SNPs, the only argument that changes is the
independent variable (after the tilde). The Z-score and the rest of the code will always be the
same. First we create a dataframe with only the SNPs. That is, the original with the first two
columns removed.

allSNPs<-inputData[,-c(1,2)]


In the allSNPs dataframe, the ith column contains the genotype from the ith SNP. How many
SNPs are these…? The dim() function gives us the dimensions of the dataframe : the number
of rows and columns. The SNP names can be retrieved using the names() function.

n.columns<-dim(inputData)[2]
n.snps<-n.columns-2
SNPnames<-names(allSNPs)




3

Voordelen van het kopen van samenvattingen bij Stuvia op een rij:

√  	Verzekerd van kwaliteit door reviews

√ Verzekerd van kwaliteit door reviews

Stuvia-klanten hebben meer dan 700.000 samenvattingen beoordeeld. Zo weet je zeker dat je de beste documenten koopt!

Snel en makkelijk kopen

Snel en makkelijk kopen

Je betaalt supersnel en eenmalig met iDeal, Bancontact of creditcard voor de samenvatting. Zonder lidmaatschap.

Focus op de essentie

Focus op de essentie

Samenvattingen worden geschreven voor en door anderen. Daarom zijn de samenvattingen altijd betrouwbaar en actueel. Zo kom je snel tot de kern!

Veelgestelde vragen

Wat krijg ik als ik dit document koop?

Je krijgt een PDF, die direct beschikbaar is na je aankoop. Het gekochte document is altijd, overal en oneindig toegankelijk via je profiel.

Tevredenheidsgarantie: hoe werkt dat?

Onze tevredenheidsgarantie zorgt ervoor dat je altijd een studiedocument vindt dat goed bij je past. Je vult een formulier in en onze klantenservice regelt de rest.

Van wie koop ik deze samenvatting?

Stuvia is een marktplaats, je koop dit document dus niet van ons, maar van verkoper Bi0med. Stuvia faciliteert de betaling aan de verkoper.

Zit ik meteen vast aan een abonnement?

Nee, je koopt alleen deze samenvatting voor €5,98. Je zit daarna nergens aan vast.

Is Stuvia te vertrouwen?

4,6 sterren op Google & Trustpilot (+1000 reviews)

Afgelopen 30 dagen zijn er 60904 samenvattingen verkocht

Opgericht in 2010, al 14 jaar dé plek om samenvattingen te kopen

Start met verkopen
€5,98  12x  verkocht
  • (1)
  Kopen