Summary Advanced biological data analysis theory and codes

Author: lauravandenend

Institution: Katholieke Universiteit Leuven (KU Leuven)

This summary contains the theory given in the lectures and the code used in the practical sessions. Since notes are allowed at the exam, this is all the information needed to answer the questions.

Uploaded on 15 January 2023
Number of pages: 25
Written in 2022/2023
Type: Summary

Laura van den End

0. Introduction

Cases and variables
Cases: sampling units (individuals)

Response variable: dependent outcome
- Measured variable you want to explain as a function of the predictor variables, e.g. species abundances, gene expression, mortality

Predictor variable: independent variable
- Measured variable that helps explain variation in the response variable, e.g. pH, nutrient abundance, environmental conditions, body size, age

Types of variables
Categorical: non-numerical, factors
- Experimental treatment, sex → have discrete levels

Continuous: scale
- Body size, weight, pH, concentration, time

Count: integer
- Number of offspring, species abundance

Ordinal: ordered categories
- Preference on a scale from 1-7

Variance and standard deviation
Variance: σ² (population variance) or s² (sample variance)
- Average squared deviation from the mean

Length   Dev from mean   Squared dev from mean
5        2               4
2        -1              1
2        -1              1
2        0               3
3        0               2

Standard deviation: σ (population) or s (sample) = √variance

Percentiles
Value of a variable below which x% of the values lie
- e.g. 25% of the data lie below the 25th percentile
- Interquartile range: range between the 25th and 75th percentile

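
The variance, standard deviation and percentile definitions can be checked by hand in R. The sketch below uses four made-up body lengths (an assumption for illustration, not the author's data) and assumes the sample variance divides the summed squared deviations by n - 1, which is what R's `var()` does.

```r
# Hypothetical body lengths (illustrative values, not the author's data)
len <- c(5, 2, 2, 3)

dev    <- len - mean(len)   # deviation from the mean
sq_dev <- dev^2             # squared deviation from the mean

# Sample variance: sum of squared deviations divided by n - 1
s2 <- sum(sq_dev) / (length(len) - 1)
s  <- sqrt(s2)              # sample standard deviation

# The built-in functions use the same n - 1 definition
same_var <- isTRUE(all.equal(s2, var(len)))
same_sd  <- isTRUE(all.equal(s, sd(len)))

# 25th and 75th percentiles and the interquartile range
q   <- quantile(len, probs = c(0.25, 0.75))
iqr <- IQR(len)
```

Note that `quantile()` offers several interpolation rules (its `type` argument), so percentiles of very small samples can differ slightly between software packages.
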
Descriptive vs inferential statistics
Descriptive statistics: describe the data
- Mean, standard deviation, correlation coefficient
- Distribution of data, histograms, box plots

Inferential statistics: make inferences about a population based on a sample
- Testing hypotheses with statistical tests
- Calculating confidence intervals
- Drawing conclusions

The normal distribution
- Common distribution for continuous data
- Bell-shaped, symmetrical around the mean µ
- µ ± 1.96 · σ includes 95% of the observations
- Probability density function: f(x) = 1 / (σ√(2π)) · exp(−(x − µ)² / (2σ²))

Skewness and kurtosis
Skewness: measure of the asymmetry of the distribution; 3rd standardized moment (mean = 1st moment, variance = 2nd central moment).

Kurtosis: peakedness of the distribution; 4th standardized moment.

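
The properties of the normal distribution and the moment-based definitions of skewness and kurtosis can be verified numerically. This is a minimal sketch: `mu`, `sigma` and the helper `normal_pdf()` are hypothetical names chosen for illustration, and the moments are computed directly as means of powers of the standardized values rather than with a dedicated package.

```r
set.seed(42)
mu <- 10; sigma <- 2   # hypothetical mean and standard deviation

# Normal probability density written out by hand
normal_pdf <- function(x, mu, sigma) {
  1 / (sigma * sqrt(2 * pi)) * exp(-(x - mu)^2 / (2 * sigma^2))
}
x_grid <- seq(mu - 3 * sigma, mu + 3 * sigma, length.out = 7)
pdf_matches <- isTRUE(all.equal(normal_pdf(x_grid, mu, sigma),
                                dnorm(x_grid, mean = mu, sd = sigma)))

# Share of the distribution within mu +/- 1.96 * sigma (about 95%)
coverage <- pnorm(mu + 1.96 * sigma, mu, sigma) -
  pnorm(mu - 1.96 * sigma, mu, sigma)

# Skewness and kurtosis as 3rd and 4th standardized moments of a sample
x <- rnorm(1e5, mean = mu, sd = sigma)
z <- (x - mean(x)) / sqrt(mean((x - mean(x))^2))
skewness <- mean(z^3)   # close to 0 for a symmetric distribution
kurtosis <- mean(z^4)   # close to 3 for a normal distribution
```
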
Descriptive statistics
(Arithmetic) mean
- All values summed, divided by the number of observations
- Not informative for multimodal or asymmetric distributions
- Sensitive to outliers

Median
- Middle value when all values are ordered
- Better summary statistic for asymmetrically distributed data
- Not sensitive to outliers

Mode
- Value that appears most frequently in a data set

The standard normal distribution
A normal distribution with mean 0 and standard deviation 1

‘Standardizing’ your data means:
- Subtracting the mean
- Dividing by the standard deviation
- The resulting numbers are the ‘z-scores’ of your data points
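
Standardizing can be written out in one line of R. The values in `x` below are made up for illustration; the comparison with `scale()` is only there to show that the manual calculation and the built-in function agree.

```r
# Hypothetical measurements (illustrative values only)
x <- c(12.1, 9.8, 11.4, 10.3, 13.0, 8.9)

# z-scores: subtract the mean, then divide by the standard deviation
z <- (x - mean(x)) / sd(x)

# scale() performs the same centering and scaling by default
z_scale <- as.numeric(scale(x))

mean_z <- mean(z)   # 0 up to rounding error
sd_z   <- sd(z)     # 1
```
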

Advanced biological data analysis

Inferential statistics
We want to draw general conclusions about a
population based on a sample
- Sample: part of the population that you studied
- Population: all cases you could have studied

Standard error
When we calculate a statistic of a sample (e.g. the
mean), this is an estimate of that statistic for the
population. If we sampled again, we would get
a slightly different estimate every time. The standard
error is the standard deviation of that statistic across
our different samples.

This is a measure of the precision that we have in
estimating the actual population statistic. We can
calculate the standard error of the mean from just a
single sample: SE = s / √n, with s = sample standard
deviation and n = sample size.
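
A small simulation can illustrate both views of the standard error of the mean: the spread of the mean across many repeated samples, and the single-sample estimate of the sample standard deviation divided by √n. The population parameters (mean 50, standard deviation 10) and the sample size are assumed values chosen for illustration.

```r
set.seed(1)
n <- 25   # hypothetical sample size

# One sample from an assumed population (mean 50, sd 10)
x <- rnorm(n, mean = 50, sd = 10)
se_single <- sd(x) / sqrt(n)   # SE estimated from a single sample

# Repeat the sampling many times and look at the spread of the means
means <- replicate(5000, mean(rnorm(n, mean = 50, sd = 10)))
se_repeated <- sd(means)       # SD of the statistic across samples

# Theoretical value for this population: sigma / sqrt(n) = 10 / 5 = 2
se_theory <- 10 / sqrt(n)
```

With these settings `se_repeated` lands close to `se_theory`, while `se_single` varies more from one sample to the next because it relies on one estimate of the standard deviation.
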

Standard deviation vs standard error
The standard deviation is a measure of spread in our
sample ~ higher = more variability in the data.

The standard error is a measure of precision ~ higher
= lower confidence in the accuracy of the estimate.
- More data (higher n) = lower SE
- Confidence intervals are based on the SE
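
As an illustration of how a confidence interval is built from the SE, the sketch below computes a 95% interval for a mean by hand and compares it with `t.test()`. The data are made up, and the t distribution is used for the critical value instead of 1.96 because the sample is small; that choice is an assumption of this example, not something stated in the notes.

```r
# Hypothetical sample (illustrative values only)
x <- c(4.1, 5.3, 4.8, 6.0, 5.5, 4.9, 5.2, 5.8)
n <- length(x)

se     <- sd(x) / sqrt(n)          # standard error of the mean
t_crit <- qt(0.975, df = n - 1)    # critical value for a 95% interval

# 95% confidence interval: mean +/- critical value * SE
ci_manual <- mean(x) + c(-1, 1) * t_crit * se

# t.test() reports the same interval
ci_ttest <- as.numeric(t.test(x)$conf.int)
```
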

Using statistics to test hypotheses
H0: no effect. Question: can we reject H0? → Yes, when there is only a small
chance of getting our data, assuming H0 is true (small p value)

Types of errors
Type I error (false positive) - we reject a true H0
- With a significance level of 0.05, this is expected to happen in 5% of the tests where H0 is true!
- Multiple testing increases the chance of at least one false positive
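
The effect of multiple testing can be made concrete with a short calculation. The sketch assumes independent tests, each at a significance level of 0.05, and uses the Bonferroni correction from `p.adjust()` as one common remedy; the correction and the p values are illustrative choices, not taken from the notes.

```r
alpha   <- 0.05
n_tests <- c(1, 5, 20)

# Chance of at least one false positive across m independent tests
fwer <- 1 - (1 - alpha)^n_tests

# Hypothetical p values from five tests, corrected for multiple testing
p_raw  <- c(0.004, 0.012, 0.030, 0.045, 0.200)
p_bonf <- p.adjust(p_raw, method = "bonferroni")

# Tests still significant at alpha after correction
significant <- p_bonf < alpha
```
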

Type II error (false negative) - we do not reject a false H0
- e.g. because the sample size is too low (not enough
statistical power)
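
The link between sample size and statistical power can be sketched with `power.t.test()` from base R. The effect size (a true difference of 1 unit), the standard deviation of 2 and the two-sample t-test setting are all assumed values for illustration.

```r
# Assumed scenario: two groups, true difference 1, standard deviation 2
sample_sizes <- c(10, 30, 100)   # observations per group

power <- sapply(sample_sizes, function(n) {
  power.t.test(n = n, delta = 1, sd = 2, sig.level = 0.05)$power
})

# Type II error rate: chance of not rejecting a false H0
type2_rate <- 1 - power
```

In this setup the power rises and the type II error rate falls as the number of observations per group increases.
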

Note: we never accept or confirm H0 – we only do or
do not reject it

1. Linear models

Continuous predictors
STEP 1: visual inspection of raw data
> plot(body.length ~ heavy.metal.conc, data = caterpillars)

STEP 2: regression line
- Draw the line → minimize the sum of squares of the difference between a datapoint and its prediction
- OLS: ordinary least squares regression
- Resulting line is given by 2 numbers, intercept and slope: y = intercept + slope · x

STEP 3: fit a model → gives slope and intercept
> fit2 <- lm(body.length ~ heavy.metal.conc, data = caterpillars)
> summary(fit2)

STEP 4: visualize results with effect plot
> plot(allEffects(fit4), multiline = TRUE, confint = list(style = "auto"))  # effects package

STEP 5: hypothesis testing
- Take the summary table
- Take the uncertainty of each estimate, given by its SE
- t value (estimate divided by SE) → more extreme = less likely to get our data if H0 is true

Testing assumptions

Homogeneity of variances
VISUALLY
> spreadLevelPlot(fit3)  # car package
- High absolute residuals = far away from the regression line
- Low absolute residuals = close to the regression line
- We want the spread to be equal everywhere. If the blue line is more or less flat (horizontal), we have no problem.

TEST
> ncvTest(fit2)  # car package
- If the p value is above 0.05: OK (no significant deviation from homogeneous variances).

NOT OK?
- Transform data
- See if outliers
- Use a model that allows for non-homogeneous variances (gls)

Normality of residuals
VISUALLY
> hist(rstudent(fit4), probability = TRUE, ylim = c(0, 0.5),
    main = "Distribution of Studentized Residuals",
    xlab = "Studentized residuals")
- Histogram of the studentized residuals of the model

> xfit <- seq(-3, 3, length.out = 100)
- Create a vector of x values for the normal distribution from -3 to 3

> yfit <- dnorm(xfit)
> lines(xfit, yfit, col = "red", lwd = 2)
- Calculate and draw the density of a standard normal distribution over the range of x values given above

TEST
> shapiro.test(residuals(fit4))
- If W > 0.9: OK

Linearity
> residualPlots(fit2)  # car package
- No strong pattern in the residuals: OK

Outliers and influential observations
> outlierTest(fit2)  # car package
> cd <- cooks.distance(fit2)
> inflobs <- which(cd > 1); inflobs

Categorical predictors

2 levels
STEP 1 + 2 + 3 + 5: same

STEP 4: same
- R default: 'treatment coding' = alphabetically first level as the reference level
- Sum coding → mean of all levels as the reference level
- Useful if collinearity in the data

More than 2 levels
STEP 1 + 2 + 3 + 4: same

STEP 5: check the anova table for the overall effect of the categorical predictor with more than 2 levels
> Anova(fit4, type = "III")  # car package

STEP 6: post-hoc comparisons
- Which levels of our predictor are different from each other?
> emmeans(fit4, ~ samp.loc)  # emmeans package
> contrast(emmeans(fit4, ~ samp.loc), method = 'pairwise', adjust = 'Tukey')
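
The linear-model workflow can be tried end to end on simulated data. The sketch below makes several assumptions. The data frame `caterpillars` and its columns `body.length`, `heavy.metal.conc` and `samp.loc` borrow their names from the notes, but all values are simulated here. The model names `fit_cont` and `fit_cat` are hypothetical. Base R's `anova()` and `TukeyHSD()` stand in for `Anova()` and `emmeans()` so that no extra packages are needed.

```r
set.seed(1)

# Simulated data: names follow the notes, values are made up for illustration
n <- 90
caterpillars <- data.frame(
  heavy.metal.conc = runif(n, 0, 10),
  samp.loc = factor(rep(c("A", "B", "C"), each = n / 3))
)
loc_effect <- unname(c(A = 0, B = 1, C = 3)[as.character(caterpillars$samp.loc)])
caterpillars$body.length <- 20 - 0.8 * caterpillars$heavy.metal.conc +
  loc_effect + rnorm(n, sd = 1)

# Continuous predictor: intercept and slope with SE, t value and p value
fit_cont   <- lm(body.length ~ heavy.metal.conc, data = caterpillars)
coef_table <- summary(fit_cont)$coefficients
slope      <- coef_table["heavy.metal.conc", "Estimate"]

# The t value is the estimate divided by its standard error
t_by_hand <- coef_table[, "Estimate"] / coef_table[, "Std. Error"]

# Assumption checks available in base R
shapiro_w   <- unname(shapiro.test(residuals(fit_cont))$statistic)
cd          <- cooks.distance(fit_cont)
influential <- which(cd > 1)   # observations with a large Cook's distance

# Categorical predictor with 3 levels: overall effect, then pairwise tests
fit_cat   <- lm(body.length ~ samp.loc, data = caterpillars)
overall_p <- anova(fit_cat)[["Pr(>F)"]][1]
tukey     <- TukeyHSD(aov(body.length ~ samp.loc, data = caterpillars))
pairwise  <- tukey$samp.loc    # one row per pair of levels
```

With treatment coding, the coefficients of `fit_cat` are differences from level `A`, the alphabetically first level, which is why a separate post-hoc step is needed to compare `B` with `C`.
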
