Lecture notes

MAT-15403 Lectures Statistics 2

Course
Statistics 2 (MAT15403)

Institution
Wageningen University (WUR)

Lecture summary of the course Statistics 2 (MAT) at Wageningen University (WUR). Slides included as examples to give an extensive overview. Combination of Dutch and English.

Uploaded on 9 September 2021
Number of pages 23
Written in 2020/2021
Type Lecture notes
Lecturer(s) Boer
Contains All lectures

Seller: Nerine

Tutorial 1 – recap of most important elements
Acrylamide example: step by step how to identify your research

RQ: What is the acrylamide content (micrograms/gram) of baked potatoes, and what is the relationship
between acrylamide and other quality features?

What is the target population? All households that bake potatoes in the Netherlands.
Units? Households (that bake potatoes)

Sample: a selection of units from the population, e.g. a simple random sample (SRS). These are the units on
which you take measurements.

Variable: property of a unit from the sample that we’re actually going to measure
- Quantitative: discrete (counts, e.g. number of households) or continuous (value with decimals)
- Qualitative: nominal (no natural order), ordinal (natural order)

Visualization of quantitative variables:
A histogram is suited to quantitative variables (continuous, or discrete with a large number of outcomes). It
helps to judge whether the data resemble a normal distribution. With very many observations, the class width of
the histogram can be made smaller → the histogram turns into a smooth curve (probability density function (pdf)).

Standard normal distribution:
A standard normal distribution always has μ = 0 and σ = 1. It is written as: Z ~ N(0,1). Z is used to indicate
a standard normal variable. Table 1 (O&L) gives the areas under the standard normal curve.

Transformation to a standard normal distribution:
Transformation: y ~ N(μ, σ) → Z ~ N(0, 1)
y = μ + z * σ
z (z-score of y) = (y − μ) / σ

Example: y ~ N(175, 7.5)
P(y ≤ 165): z = (165 − 175) / 7.5 = −1.33
P(Z ≤ −1.33) = 0.0918

Example 2: y ~ N(35, 7)
P(y > 40): z = (40 − 35) / 7 = 0.71
P(Z ≤ 0.71) = 0.7611, but we need the right tail: P(Z > 0.71) = 1 − 0.7611 = 0.2389
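
The two table look-ups above can also be reproduced numerically. The following is a minimal sketch using Python's standard library; the helper name z_score is chosen here for illustration and is not from the lecture.

```python
from statistics import NormalDist


def z_score(y, mu, sigma):
    """Standardize y: number of standard deviations between y and mu."""
    return (y - mu) / sigma


standard_normal = NormalDist(mu=0.0, sigma=1.0)

# Example 1: y ~ N(175, 7.5), left-tail probability P(y <= 165)
z1 = z_score(165, 175, 7.5)
p_left = standard_normal.cdf(z1)

# Example 2: y ~ N(35, 7), right-tail probability P(y > 40)
z2 = z_score(40, 35, 7)
p_right = 1 - standard_normal.cdf(z2)

print(f"z1 = {z1:.2f}, P(y <= 165) = {p_left:.4f}")
print(f"z2 = {z2:.2f}, P(y > 40)  = {p_right:.4f}")
```

The probabilities differ slightly in the third decimal from the table values, because the table look-up rounds z to two decimals first and the code does not.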

Q-Q Plot:
Are the observations normally distributed?
Normal quantile-quantile plot: check whether the observations (dots) are close to the straight line.
If the dots are roughly close to the straight line, it is plausible that the observations come from a normal
distribution. Not close to the straight line: not normally distributed.
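
As an illustration of what a Q-Q plot compares, the sketch below pairs each sorted observation with its theoretical standard normal quantile. The lecture judges the plot by eye; the correlation used here as a numerical summary of "close to the straight line" is an added convenience, not the lecture's method. The data are simulated and the plotting position (i − 0.5)/n is one common choice among several.

```python
import math
import random
from statistics import NormalDist, fmean


def qq_points(observations):
    """Pair each sorted observation with its theoretical standard normal quantile."""
    ordered = sorted(observations)
    n = len(ordered)
    theoretical = [NormalDist().inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]
    return list(zip(theoretical, ordered))


def qq_correlation(observations):
    """Correlation of the Q-Q points; values near 1 mean the dots follow a line."""
    points = qq_points(observations)
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    mx, my = fmean(xs), fmean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in points)
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


rng = random.Random(3)  # fixed seed so the run is repeatable
normal_data = [rng.gauss(175, 7.5) for _ in range(200)]   # simulated, normal
skewed_data = [rng.expovariate(1.0) for _ in range(200)]  # simulated, skewed

r_normal = qq_correlation(normal_data)
r_skewed = qq_correlation(skewed_data)
print(f"normal data: r = {r_normal:.3f}, skewed data: r = {r_skewed:.3f}")
```

Plotting the pairs returned by qq_points gives the Q-Q plot itself; the skewed sample bends away from the line, which shows up as a lower correlation.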

Sampling distribution of sample mean:
1. Central Limit Theorem:
If you take n drawings from a normal distribution with a certain population mean and standard deviation, you
can calculate the sample mean and the sample standard deviation from them. The sample mean is itself normally
distributed around the population mean. The bigger the sample size n, the narrower the distribution of the
sample mean becomes.

2. Central Limit Theorem
If you have random drawings from any other distribution with a population mean of mu and a population
standard deviation of sigma, the distribution of the sample mean can be approximated by a normal
distribution with mean mu and standard deviation sigma / √n, irrespective of the type of the ‘old’ distribution.

Under two conditions:
1) Sample size is large enough
2) New distribution will be normally distributed

The same holds for the sum of a series of observations. The sum of y will approximate a
normal distribution with 1) an expected value of n * the expected value of a single drawing and 2) a standard
deviation of √n * the population standard deviation of a single drawing. This holds under the same 2 conditions.
In the end, this lets you draw conclusions about the (sample) mean content.
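
The central limit theorem can be checked with a small simulation. The sketch below is an illustration under assumed settings: it draws from an exponential distribution with μ = 1 and σ = 1 (a clearly non-normal choice made here, not taken from the lecture) and compares the spread of the sample means with σ / √n.

```python
import math
import random
import statistics


def simulate_sample_means(n, repeats=5000, seed=1):
    """Draw `repeats` samples of size n from Exp(1) and return their sample means."""
    rng = random.Random(seed)
    return [
        statistics.fmean([rng.expovariate(1.0) for _ in range(n)])
        for _ in range(repeats)
    ]


n = 40
means = simulate_sample_means(n)

observed_mean = statistics.fmean(means)  # should be close to mu = 1
observed_sd = statistics.stdev(means)    # should be close to sigma / sqrt(n)
predicted_sd = 1.0 / math.sqrt(n)

print(f"mean of sample means: {observed_mean:.3f}")
print(f"sd of sample means:   {observed_sd:.3f} (predicted {predicted_sd:.3f})")
```

A histogram of means would look roughly bell-shaped even though the single drawings are strongly skewed.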

Tutorial 2 – confidence interval
Is our variable actually normally distributed? → Make a Q-Q plot. The mid-region follows the line well, with slight
deviations at the edges: therefore we treat the variable as (approximately) normally distributed.

European norm = 50 mg/l (average)
Outcome of 41 observations:
Sample mean = 55.7
Sample standard deviation = 30.3
The SD is large, meaning: there is quite some variation in the measurements.

Confidence interval: method that shows the uncertainty of the estimated mean.

Estimator: like a measuring instrument, it needs both accuracy and precision.
Low accuracy means the measurements may be biased: e.g. a weighing scale that always shows 1 kg less than the
true weight. An unbiased estimator is better: on average it reflects the true value.
Low precision means the measurements are widely spread. High precision means little spread.
High accuracy and high precision is best: it reflects the true value with only little spread.

Example: expected value based on the sample mean:
Random sample of independent observations with population mean μy and standard deviation σy.
The aim is to estimate the expected value μy from the sample mean, using the sample mean as estimator.

y̅ is an unbiased estimator for μy, because: μy̅ = μy

We also want the unbiased estimator to have high precision, i.e. little spread. The spread of y̅ is its
standard deviation: the standard deviation of a single observation divided by the square root of the number of
observations, σy̅ = σy / √n.

Rule of thumb: if the sample size increases, the spread becomes smaller. We then reach our target:
an unbiased estimator with high precision (little spread) around the true value.
In sum: y̅ is a consistent estimator for μy. This follows from the law of large numbers: the larger the sample,
the closer we tend to get to the unknown true value of μy.

Implementation: doing the measurements → calculating the sample mean → an accurate, high-precision estimate of the
true value of μy

Confidence interval for population mean (expected value)
The outcome of y̅ is only a point estimate of the true value of μy. How precise is that one point? A confidence
interval (CI) puts a margin around the estimate and tells you how close you are likely to be to the true value.

Confidence intervals are always given as: estimate ± a certain error margin.

α = level of significance. Alpha reflects the probability of making an error → giving a wrong result (degree of mistrust).
Confidence coefficient: 1 – α reflects the degree of trust. A confidence interval of e.g. 95% means: the procedure of
building a confidence interval leads to a correct statement in 95% of the cases, meaning the true value is within
the confidence interval. Higher confidence means that the true value is within the interval more often.

Or: 95% of the time the CI contains the true value that we want to estimate. There’s always a probability (here 5%)
that the true value is not enclosed by the interval.
Or: the probability that the CI contains the unknown parameter μy = 0.95
→ All equivalent statements.

As an example, a confidence coefficient of 0.9 means that if 100 samples are drawn and a CI is built from each,
approx. 10 of those 100 intervals will not contain the true value.
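
The long-run interpretation of the confidence coefficient can be illustrated by simulation. This is a sketch under assumptions made here for illustration: the population is normal with a known σ, and the values μ = 50, σ = 30 and n = 41 are hypothetical inputs, not results from the lecture.

```python
import math
import random
from statistics import NormalDist, fmean


def coverage(mu, sigma, n, confidence, repeats=2000, seed=7):
    """Share of simulated confidence intervals that contain the true mean mu."""
    rng = random.Random(seed)
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # two-sided critical value
    margin = z * sigma / math.sqrt(n)                   # error margin, sigma known
    hits = 0
    for _ in range(repeats):
        sample_mean = fmean([rng.gauss(mu, sigma) for _ in range(n)])
        if sample_mean - margin <= mu <= sample_mean + margin:
            hits += 1
    return hits / repeats


share = coverage(mu=50.0, sigma=30.0, n=41, confidence=0.90)
print(f"share of intervals containing mu: {share:.3f}")
```

The share should land near 0.90: roughly 1 in 10 intervals misses the true value, which is a statement about intervals, not about individual observations.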

Find confidence interval from the z-table WHEN normally distributed:
Rule of thumb: (μ − 2σ, μ + 2σ) → approx. 95% of the observations lie within this interval.
1 – α = 0.95 → right-tail p = 0.025. With 95% confidence, 2.5% of the distribution is left on each side.

This can be looked up more precisely in table 2 in the back of your book: bottom line (df = inf.) → z-score given as 1.960,
which is approx. 2! So the right side is 1.960 and the left side is -1.960.
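
The table look-up can be cross-checked numerically. A minimal sketch with Python's standard library follows; the function name two_sided_z is chosen here for illustration.

```python
from statistics import NormalDist

std_normal = NormalDist()  # Z ~ N(0, 1)


def two_sided_z(confidence):
    """Critical z-value that leaves alpha/2 in each tail."""
    alpha = 1 - confidence
    return std_normal.inv_cdf(1 - alpha / 2)


z95 = two_sided_z(0.95)

# Area between -2 and +2 standard deviations (the rule of thumb)
within_2_sigma = std_normal.cdf(2) - std_normal.cdf(-2)

for level in (0.90, 0.95, 0.99):
    print(f"{level:.0%} confidence: z = {two_sided_z(level):.3f}")
print(f"area within 2 sigma: {within_2_sigma:.4f}")
```

The rule of thumb of 2 standard deviations covers slightly more than 95%, which is why the exact critical value is a little below 2.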

Central limit theorem to find the distribution of the sample mean:
By the central limit theorem the sample mean y̅ is (approximately) normally distributed, so a z-score can be used
to obtain 95% confidence.

Z = (y̅ − μy) / σy̅, i.e. the sample mean minus the population mean (of a single drawing), divided by
the standard deviation of the sample mean, σy̅ = σy / √n.
P(-1.96 ≤ Z ≤ 1.96) = 0.95 → substitute the formula for Z and solve for the true value μy:
P(y̅ − 1.96 * σy̅ ≤ μy ≤ y̅ + 1.96 * σy̅) = 0.95

This formula gives the error margin at the lower end and at the upper end.
Given y̅ and σy̅, we obtain an interval that contains the true value with 95% confidence.
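
The interval can be computed for the numbers from this tutorial (41 observations, sample mean 55.7, sample standard deviation 30.3). This is a sketch that assumes the sample standard deviation may stand in for σy and that the z-value applies, following the df = inf. line used above; a t-based interval for n = 41 would be slightly wider. The function name is chosen here for illustration.

```python
import math
from statistics import NormalDist


def z_confidence_interval(sample_mean, sd, n, confidence=0.95):
    """Two-sided z-interval: sample_mean +/- z * sd / sqrt(n)."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    margin = z * sd / math.sqrt(n)  # error margin
    return sample_mean - margin, sample_mean + margin


# Values from the tutorial; sd is the sample sd used in place of sigma (assumption)
lower, upper = z_confidence_interval(sample_mean=55.7, sd=30.3, n=41)
print(f"95% CI: ({lower:.1f}, {upper:.1f})")
```

Under these assumptions the interval runs from about 46.4 to about 65.0, so it includes the European norm of 50.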
