Summary of all the lectures + practicals for Biosystems Data Analysis. It includes 4 lectures and all slides/videos/question hours belonging to those lectures. There are also notes/screenshots of some of my answers from the practicals.
BE AWARE: this is only the content of the last two weeks, thus...
Biosystems Data Analysis
Table of Contents
Week 3
E
Lecture 5 ANOVA-Simultaneous Component Analysis – ASCA
ASCA – Smilde et al.
R practical
F
Lecture 6 Statistical Validation and Biomarker Selection
Smit ACA 2007 – paper
PLSDA cross validation – Johan et al.
R practical
G
Lecture 7 Metabolic Network Inference
R practical
Week 3
H
Lecture 8 Microbiome data analysis
Normalizing Microbiome Data – McKnight et al.
R practical
Week 3
E
In omics research it is increasingly common to analyse designed data, that is, data obtained from a
study with an underlying experimental design, such as treatment groups and/or time points. The design
imposes a certain structure on the data, so visualizing and investigating such data with PCA is no longer
optimal. ASCA is the preferred method; it is explained starting from ordinary analysis of variance (ANOVA).
Please study the first ASCA publication, Smilde2005.pdf, which will also be used in the lecture.
Web lecture link: https://webcolleges.uva.nl/Mediasite/Play/cd48b0872da64a64ae869f681a7b99231d
Lecture 5 ANOVA-Simultaneous Component Analysis – ASCA
ANOVA: Analysis of Variance
Idea: is the difference between the group means (m1, m2 and mNO) large enough
relative to the within-group spread?
Goal: separate the sources of variation.
Use of ANOVA:
- To look for differences between groups
- To test the effect of a treatment
Assumptions of ANOVA:
- Replicates within a group are normally distributed (if necessary after a log transform).
- The variances within groups are equal: within each group/cell the variability across replicates is the
same.
One-way ANOVA notation: $y_{ik}$
Factor with levels (groups) $i = 1, \dots, I$: the thing you change; it has different levels.
Replicates $k = 1, \dots, K$
The number of replicates is the same within groups (balanced designs).
$y_{ik} = \mu + \alpha_i + \varepsilon_{ik}$; $\varepsilon_{ik} \sim N(0, \sigma^2)$, where $\mu$ = overall mean and $\alpha_i$ = effect of the factor (level $i$).
So your 'measured plant' with treatment $i$ and replicate $k$ is $y_{ik}$. Systematic variation: $\mu + \alpha_i$.
The effects are deviations relative to $\mu$ and are centered around 0, thus $\sum_i \alpha_i = 0$.
$\varepsilon_{ik}$: the residuals; everything you cannot explain with $\mu$ and $\alpha_i$. Unsystematic variation (~ random).
Estimate of the one-way ANOVA:
$y_{ik} = y_{..} + (y_{i.} - y_{..}) + (y_{ik} - y_{i.})$ = mean + between + individual (within)
$y_{..} = \frac{1}{IK} \sum_{i=1}^{I} \sum_{k=1}^{K} y_{ik}$ (overall mean)
$y_{i.} = \frac{1}{K} \sum_{k=1}^{K} y_{ik}$ (mean of group $i$)
Each dot in place of $i$, $j$ or $k$ means you take the average over that index. Thus $y_{..}$ = overall mean.
Sum of squares (SS): $\sum_i \sum_k (y_{ik} - y_{..})^2 = \sum_i \sum_k (y_{i.} - y_{..})^2 + \sum_i \sum_k (y_{ik} - y_{i.})^2$
$\sum_i \sum_k (y_{ik} - y_{..})^2 = K \sum_i (y_{i.} - y_{..})^2 + \sum_i \sum_k (y_{ik} - y_{i.})^2$
Total SS = between SS + within SS
The cross-product vanishes because of orthogonality of the design.
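The decomposition above can be checked numerically. The following is a minimal sketch using only the Python standard library; the group labels are borrowed from the idea stated earlier (m1, m2, mNO), while the measurement values are hypothetical and chosen only for illustration.

```python
from statistics import mean

# Hypothetical balanced data: I = 3 groups (levels), K = 4 replicates each.
groups = {
    "m1": [4.1, 3.9, 4.3, 4.0],
    "m2": [5.2, 5.0, 5.1, 5.4],
    "mNO": [3.0, 3.2, 2.9, 3.1],
}

K = len(next(iter(groups.values())))                         # replicates per group
grand_mean = mean(y for ys in groups.values() for y in ys)   # y..
group_means = {g: mean(ys) for g, ys in groups.items()}      # y_i.

# Effect estimates: alpha_i = y_i. - y..  (they sum to zero in a balanced design)
alpha = {g: m - grand_mean for g, m in group_means.items()}

# Sums of squares: total, between groups, within groups
ss_total = sum((y - grand_mean) ** 2 for ys in groups.values() for y in ys)
ss_between = K * sum(a ** 2 for a in alpha.values())
ss_within = sum((y - group_means[g]) ** 2 for g, ys in groups.items() for y in ys)

print(f"total={ss_total:.4f} between={ss_between:.4f} within={ss_within:.4f}")
```

With any balanced data set, the printed total should equal between plus within up to rounding, which is the statement Total SS = between SS + within SS.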
Two-way ANOVA:
Two treatments A and B
- Is there an overall treatment effect?
- Is there an effect of treatment A?
- Is there an effect of treatment B?
- Is there an interaction effect?
The design of two-way ANOVA:
- Special case: equal sample size
- I different fixed treatments $A_i$ (i indexes the levels of treatment 1)
- J different fixed treatments $B_j$ (j indexes the levels of treatment 2)
- K replications of each combination ($A_i$, $B_j$)
- Total I * J * K items $y_{ijk}$: i indicates rows; j indicates columns; k indicates replicates/individuals.
$y_{ijk} = \mu + \alpha_i + \beta_j + (\alpha\beta)_{ij} + \varepsilon_{ijk}$; $\varepsilon_{ijk} \sim N(0, \sigma^2)$ → linear model
= mean + main effect factor A + main effect factor B + interaction (A,B) + individual residual.
Sum of squares, similar to one-way ANOVA: $SS_{Total} = SS_{betweenA} + SS_{betweenB} + SS_{Int} + SS_{within}$
Again, the cross-products vanish because of the orthogonality of the design. Linear additivity!
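As an illustration of the two-way model, the sketch below estimates the main effects and the interaction for a small balanced design and verifies the sum-of-squares split. It assumes a balanced design with fixed effects; all numbers and variable names are hypothetical.

```python
from statistics import mean

# Hypothetical balanced design: I = 2 levels of A, J = 3 levels of B, K = 2 replicates.
# y[i][j] holds the K replicates of the cell (A_i, B_j).
y = [
    [[10.1, 9.9], [12.0, 12.4], [14.2, 13.8]],
    [[11.0, 11.2], [15.1, 14.9], [19.0, 19.4]],
]
I, J, K = len(y), len(y[0]), len(y[0][0])

grand = mean(v for row in y for c in row for v in c)                      # y...
a_mean = [mean(v for c in y[i] for v in c) for i in range(I)]             # y_i..
b_mean = [mean(v for i in range(I) for v in y[i][j]) for j in range(J)]   # y_.j.
cell = [[mean(y[i][j]) for j in range(J)] for i in range(I)]              # y_ij.

# Estimates of the model terms
alpha = [m - grand for m in a_mean]                       # main effect A
beta = [m - grand for m in b_mean]                        # main effect B
inter = [[cell[i][j] - a_mean[i] - b_mean[j] + grand      # interaction (A,B)
          for j in range(J)] for i in range(I)]

# Sums of squares for a balanced design
ss_a = J * K * sum(a ** 2 for a in alpha)
ss_b = I * K * sum(b ** 2 for b in beta)
ss_int = K * sum(inter[i][j] ** 2 for i in range(I) for j in range(J))
ss_within = sum((v - cell[i][j]) ** 2
                for i in range(I) for j in range(J) for v in y[i][j])
ss_total = sum((v - grand) ** 2 for row in y for c in row for v in c)

print(f"A={ss_a:.3f} B={ss_b:.3f} Int={ss_int:.3f} within={ss_within:.3f} total={ss_total:.3f}")
```

The four printed components should add up to the total; in an unbalanced design this additivity generally no longer holds, which is one of the open issues listed in these notes.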
Open issues:
- statistical tests under construction
- unbalanced design
- random effects
- special designs (repeated measures, cross-over, split-plot)
You build an ANOVA model per metabolite: for each metabolite you fit the model, take the effect
estimates and put them next to each other (one column per metabolite), which is called bookkeeping.
Then you run PCA on the separate effect matrices related to the treatments.
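The three steps (ANOVA per metabolite, bookkeeping, PCA on the effect matrix) can be sketched as follows for a single treatment factor. This is a simplified illustration, not the full ASCA procedure from the paper: the data and labels are hypothetical, and the first principal component is found by power iteration, used here as a stand-in for a proper PCA/SVD routine.

```python
from statistics import mean

# Hypothetical data: 6 samples (3 treatment levels x 2 replicates), 3 metabolites.
labels = ["ctrl", "ctrl", "low", "low", "high", "high"]
X = [
    [1.0, 5.0, 2.0],
    [1.2, 5.2, 2.1],
    [2.0, 4.0, 2.0],
    [2.2, 4.1, 1.9],
    [3.1, 3.0, 2.2],
    [2.9, 3.2, 2.0],
]
N, M = len(X), len(X[0])

# Step 1: one-way ANOVA per metabolite (column): x = mean + effect + residual.
col_mean = [mean(row[m] for row in X) for m in range(M)]
level_mean = {
    lev: [mean(X[n][m] for n in range(N) if labels[n] == lev) for m in range(M)]
    for lev in set(labels)
}

# Step 2: bookkeeping, i.e. place the per-metabolite estimates side by side.
X_A = [[level_mean[labels[n]][m] - col_mean[m] for m in range(M)] for n in range(N)]
E = [[X[n][m] - level_mean[labels[n]][m] for m in range(M)] for n in range(N)]

# Step 3: first principal component of the effect matrix via power iteration
# on X_A^T X_A (assumes the start vector is not orthogonal to that component).
C = [[sum(X_A[n][a] * X_A[n][b] for n in range(N)) for b in range(M)] for a in range(M)]
p = [1.0] + [0.0] * (M - 1)
for _ in range(200):
    q = [sum(C[a][b] * p[b] for b in range(M)) for a in range(M)]
    norm = sum(v * v for v in q) ** 0.5
    p = [v / norm for v in q]          # loading vector, unit length

scores = [sum(X_A[n][m] * p[m] for m in range(M)) for n in range(N)]
print("loadings:", [round(v, 3) for v in p])
print("scores:  ", [round(s, 3) for s in scores])
```

Because every replicate of a treatment level has the same row in the effect matrix, replicates share one score; the residual matrix E holds what the treatment does not explain.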
ASCA
Example exam question:
In a toxicological study, investigators want to test the toxicity of a compound in rats. The idea
is that the toxic compound will affect the metabolism of the rat depending on the dose of the
compound. After having administered a single dose at a certain point in time, the effect is
expected to become visible gradually in the urine metabolites of the rats which will be
measured by an instrumental method. You are asked to design this study for the investigator
and we want to analyze the resulting data with ASCA.
a) Which factors would you choose for the design and at which levels would you vary
those?
b) Would you include replicates in the design and if so, why?
Answers
a) Include different dosage regimes (at least four levels: none, low, medium and high)
and a factor time at multiple levels (including a before-dosage time point).
b) Yes, include replicates to check the assumptions of ANOVA (normality and equal
variances within a group).
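A full factorial design like the one in the answers can be written out as a run table. In this sketch the four dose levels come from the answer; the time points and the number of replicates are assumptions made only for illustration and are not part of the exam answer.

```python
from itertools import product

# Dose levels follow the answer; time points and replicate count are
# hypothetical placeholders (assumed, not given in the notes).
doses = ["none", "low", "medium", "high"]
times_h = [0, 6, 24, 48]      # 0 stands for the before-dosage time point (assumed scheme)
n_replicates = 3              # assumed number of rats per dose/time cell

# One row per run: every dose x time combination, replicated K times.
design = [
    {"dose": d, "time_h": t, "replicate": r}
    for d, t, r in product(doses, times_h, range(1, n_replicates + 1))
]

print(len(design))            # I * J * K runs in total
for run in design[:4]:
    print(run)
```

Every dose/time cell has the same number of replicates, so the design is balanced, which is the case the ANOVA decomposition in these notes assumes.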