Summary of all the lectures + practicals for Biosystems Data Analysis. It includes 4 lectures and all slides/videos/question hours belonging to those lectures. There are also notes/screenshots of some of my answers from the practicals.
BE AWARE: this is only the content of the last two weeks, thus...
Biosystems Data Analysis
Table of Contents
Week 3
E
Lecture 5 ANOVA-Simultaneous Component Analysis – ASCA
ASCA – Smilde et al.
R practical
F
Lecture 6 Statistical Validation and Biomarker Selection
Smit ACA 2007 – paper
PLSDA cross validation – Johan et al.
R practical
G
Lecture 7 Metabolic Network Inference
R practical
Week 3
H
Lecture 8 Microbiome data analysis
Normalizing Microbiome Data – McKnight et al.
R practical
Week 3
E
In omics research it is increasingly common to analyse designed data. These are data obtained when an
experimental design underlies the study, such as treatment groups and/or time. This generates a certain
structure in the data, and visualizing and investigating such data with PCA is no longer optimal. ASCA is the
preferred method, and it will be explained starting from ordinary analysis of variance (ANOVA). Please
study the first ASCA publication, Smilde2005.pdf, which will also be used in the lecture.
Web lecture link: https://webcolleges.uva.nl/Mediasite/Play/cd48b0872da64a64ae869f681a7b99231d
Lecture 5 ANOVA-Simultaneous Component Analysis – ASCA
ANOVA: Analysis of Variance
Idea: is the difference between the group means m1, m2 and mNO large enough
relative to the within-group spread?
Goal: separate between sources of variation.
Use of ANOVA:
- To look for differences between groups
- To test the effect of a treatment
Assumptions of ANOVA:
- Replicates in a group are normally distributed (if not, a log transform can help).
- The variances within groups are equal: within a group/cell the variability across replicates is the
same.
One-way ANOVA notation: yik
Factor with levels (groups) i = 1, …, I: the thing you change; it has different levels.
Replicates k = 1, …, K
The number of replicates is the same within groups (balanced designs).
𝒚𝒊𝒌 = 𝝁 + 𝜶𝒊 + 𝜺𝒊𝒌 ; 𝜺𝒊𝒌 ~𝑵(𝟎, 𝝈𝟐 ) 𝝁 = overall mean 𝜶𝒊 = effect of factor (level i)
So your ‘measured plant’ with treatment i and replicate k is yik. Systematic variation: 𝝁 + 𝜶𝒊 .
Deviations are relative to 𝝁. Centered around 0, thus ∑𝛼𝑖 = 0
𝜺𝒊𝒌 : the residuals; everything you cannot explain with 𝜇 and 𝛼𝑖 . Un-systematic variation (~ random).
Estimate of the one-way ANOVA:
$y_{ik} = y_{..} + (y_{i.} - y_{..}) + (y_{ik} - y_{i.})$ = mean + between + individual (within)
$y_{..} = \frac{1}{IK} \sum_{i=1}^{I} \sum_{k=1}^{K} y_{ik}$
$y_{i.} = \frac{1}{K} \sum_{k=1}^{K} y_{ik}$
Each dot in place of i, j or k means you take the average over the corresponding index. Thus y.. = overall mean.
Sum of squares (SS):
$\sum_i \sum_k (y_{ik} - y_{..})^2 = \sum_i \sum_k (y_{i.} - y_{..})^2 + \sum_i \sum_k (y_{ik} - y_{i.})^2$
$\sum_i \sum_k (y_{ik} - y_{..})^2 = K \sum_i (y_{i.} - y_{..})^2 + \sum_i \sum_k (y_{ik} - y_{i.})^2$
Total SS = between SS + within SS
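The cross-product $2 \sum_i \sum_k (y_{i.} - y_{..})(y_{ik} - y_{i.})$ vanishes because of orthogonality of the design: within each group the deviations from the group mean sum to zero.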
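The one-way decomposition can be checked numerically. The following is a minimal sketch in plain Python, assuming a balanced design; the numbers in `groups` are made up for illustration and do not come from the lecture.

```python
# Toy balanced one-way design: I = 3 groups, K = 4 replicates (made-up numbers).
groups = [
    [4.0, 5.0, 6.0, 5.0],
    [7.0, 8.0, 9.0, 8.0],
    [1.0, 2.0, 3.0, 2.0],
]

I = len(groups)        # number of levels of the factor
K = len(groups[0])     # replicates per level (balanced design)

# y.. (overall mean) and y_i. (group means)
grand_mean = sum(sum(g) for g in groups) / (I * K)
group_means = [sum(g) / K for g in groups]

# Estimated effects alpha_i = y_i. - y.. ; they sum to zero
alpha = [m - grand_mean for m in group_means]

# Total SS = between SS + within SS
ss_total = sum((y - grand_mean) ** 2 for g in groups for y in g)
ss_between = K * sum(a ** 2 for a in alpha)
ss_within = sum((y - m) ** 2 for g, m in zip(groups, group_means) for y in g)

print(ss_total, ss_between, ss_within)  # 78.0 72.0 6.0
```

Here the between SS (72) is large compared to the within SS (6), which is the situation where the group means differ clearly relative to the within-group spread.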
Two-way ANOVA:
Two treatments A and B
- Is there an overall treatment effect?
- Is there an effect of treatment A?
- Is there an effect of treatment B?
- Is there an interaction effect?
The design of two-way ANOVA:
- Special case: equal sample size
- I different fixed treatments Ai (i indexes treatment 1)
- J different fixed treatments Bj (j indexes treatment 2)
- K replications of each combination (Ai, Bj)
- Total I * J * K items yijk; i indicates rows, j indicates columns, k indicates replicates/individuals.
𝒚𝒊𝒋𝒌 = 𝝁 + 𝜶𝒊 + 𝜷𝒋 + (𝜶𝜷)𝒊𝒋 + 𝜺𝒊𝒋𝒌 ; 𝜺𝒊𝒋𝒌~𝑵(𝟎, 𝝈𝟐 ) → linear model
= mean + main effect factor A + main effect factor B + interaction factor (A,B) + individual residual.
Sum of squares similar to one-way ANOVA: 𝑆𝑆𝑇𝑜𝑡𝑎𝑙 = 𝑆𝑆𝑏𝑒𝑡𝑤𝑒𝑒𝑛𝐴 + 𝑆𝑆𝑏𝑒𝑡𝑤𝑒𝑒𝑛𝐵 + 𝑆𝑆𝐼𝑛𝑡 + 𝑆𝑆𝑤𝑖𝑡ℎ𝑖𝑛
Again cross-product vanishes because of orthogonality of the design. Linear additivity!
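The two-way decomposition can be verified in the same way. This is a rough sketch in plain Python, assuming a balanced design with fixed effects; the data in `y` are invented, and the names `row`, `col`, `cell` and `ab` are only illustrative.

```python
# Toy balanced two-way design: y[i][j] holds the K replicates of cell (Ai, Bj).
# I = 2 levels of A, J = 3 levels of B, K = 2 replicates (made-up numbers).
y = [
    [[1.0, 2.0], [3.0, 5.0], [6.0, 7.0]],
    [[2.0, 4.0], [8.0, 9.0], [5.0, 6.0]],
]
I, J, K = len(y), len(y[0]), len(y[0][0])

def mean(values):
    values = list(values)
    return sum(values) / len(values)

mu = mean(v for a in y for b in a for v in b)                         # y...
row = [mean(v for b in y[i] for v in b) for i in range(I)]            # y_i..
col = [mean(v for i in range(I) for v in y[i][j]) for j in range(J)]  # y_.j.
cell = [[mean(y[i][j]) for j in range(J)] for i in range(I)]          # y_ij.

alpha = [r - mu for r in row]    # main effect of factor A
beta = [c - mu for c in col]     # main effect of factor B
# interaction: what is left of the cell mean after mean and main effects
ab = [[cell[i][j] - row[i] - col[j] + mu for j in range(J)] for i in range(I)]

ss_total = sum((v - mu) ** 2 for a in y for b in a for v in b)
ss_a = J * K * sum(a ** 2 for a in alpha)
ss_b = I * K * sum(b ** 2 for b in beta)
ss_int = K * sum(ab[i][j] ** 2 for i in range(I) for j in range(J))
ss_within = sum((v - cell[i][j]) ** 2
                for i in range(I) for j in range(J) for v in y[i][j])

print(ss_total, ss_a + ss_b + ss_int + ss_within)
```

The two printed numbers agree (up to rounding) only because the design is balanced; with unequal cell sizes the cross-products no longer vanish.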
Open issues:
- statistical tests under construction
- unbalanced design
- random effects
- special designs (repeated measures, cross-over, split-plot)
You build an ANOVA per metabolite: for each metabolite you run the ANOVA model, take the effect
estimates and put them next to each other (one column per metabolite), which is called bookkeeping. This
gives a separate matrix for each treatment effect, and you then run PCA on each of these matrices.
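The bookkeeping plus PCA steps could look roughly as follows. This is only a sketch under stated assumptions, not the implementation from the lecture or the R practical: it uses Python with NumPy, random synthetic data, a balanced two-factor design, and hypothetical helper names (`level_means`, `pca`).

```python
import numpy as np

# Synthetic balanced design: I levels of factor A, J levels of factor B,
# K replicates per cell, M metabolites (all values are random, for illustration).
rng = np.random.default_rng(0)
I, J, K, M = 3, 4, 2, 5
a = np.repeat(np.arange(I), J * K)            # level of A for every sample
b = np.tile(np.repeat(np.arange(J), K), I)    # level of B for every sample
X = rng.normal(size=(I * J * K, M))           # samples x metabolites

def level_means(X, labels):
    """Replace each row by the mean of all rows that share its label."""
    out = np.zeros_like(X)
    for lev in np.unique(labels):
        mask = labels == lev
        out[mask] = X[mask].mean(axis=0)
    return out

# ANOVA per metabolite (column), estimates collected into effect matrices.
Xc = X - X.mean(axis=0, keepdims=True)        # remove the overall mean
X_A = level_means(Xc, a)                      # main effect of A
X_B = level_means(Xc, b)                      # main effect of B
X_AB = level_means(Xc, a * J + b) - X_A - X_B # interaction
E = Xc - X_A - X_B - X_AB                     # residuals

def pca(effect, n_components=2):
    """PCA of an (already centered) effect matrix via the SVD."""
    U, s, Vt = np.linalg.svd(effect, full_matrices=False)
    scores = U[:, :n_components] * s[:n_components]
    loadings = Vt[:n_components].T
    return scores, loadings

scores_A, loadings_A = pca(X_A)
scores_B, loadings_B = pca(X_B)
scores_AB, loadings_AB = pca(X_AB)
```

Because the design is balanced, the sums of squares of `X_A`, `X_B`, `X_AB` and `E` add up to the sum of squares of `Xc`, just as in the univariate case.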
ASCA
Example exam question:
In a toxicological study, investigators want to test the toxicity of a compound in rats. The idea
is that the toxic compound will affect the metabolism of the rat depending on the dose of the
compound. After having administered a single dose at a certain point in time, the effect is
expected to become visible gradually in the urine metabolites of the rats which will be
measured by an instrumental method. You are asked to design this study for the investigator
and we want to analyze the resulting data with ASCA.
a) Which factors would you choose for the design and at which levels would you vary
those?
b) Would you include replicates in the design and if so, why?
Answers
a) Include different dosage regimes (at least four levels: none, low, medium and high)
and a factor time at multiple levels (including a before dosage time point).
b) Yes, include replicates to check on assumptions of ANOVA (normality and equal
variances within a group).