Lecture summary of the course Statistics 2 (MAT) at Wageningen University (WUR). Slides included as examples to give an extensive overview. Combination of Dutch and English.
Tutorial 1 – recap of most important elements
Acrylamide example: step by step how to identify your research
RQ: What is the acrylamide content (micrograms/gram) of baked potatoes, and what is the relationship
between acrylamide and other quality features?
What is the target population? All households that bake potatoes in the Netherlands.
Units? Households (that bake potatoes)
Sample: a selection of units from the population, here a simple random sample (SRS). These are the units
you actually do measurements on.
Variable: property of a unit from the sample that we’re actually going to measure
- Quantitative: discrete (counts, e.g. number of households) or continuous (value with decimals)
- Qualitative: nominal (no natural order) or ordinal (natural order)
Visualization of quantitative variables:
Histogram for quantitative variables (continuous, and discrete with a large number of outcomes). A histogram
helps to judge whether the data resemble a normal distribution. With very many observations the class width
can be made smaller and smaller, and the histogram turns into a smooth curve: the probability density function (pdf).
Standard normal distribution:
A standard normal distribution always has μ = 0 and σ = 1. Notation: Z ~ N(0, 1), where Z denotes a
standard normally distributed variable. Table 1 (O&L) gives the areas under the standard normal curve.
Transformation to a standard normal distribution:
Transformation: y ~ N(μ, σ) → Z ~ N(0, 1)
y = μ + z · σ
z (z-score of y) = (y − μ) / σ
Example: y ~ N(175, 7.5)
P(y ≤ 165): z = (165 − 175) / 7.5 = −1.33
Table: P(Z ≤ −1.33) = 0.0918, so P(y ≤ 165) = 0.0918
Example 2: y ~ N(35, 7)
P(y > 40): z = (40 − 35) / 7 = 0.71
Table: P(Z ≤ 0.71) = 0.7611, but we need the right tail: P(y > 40) = 1 − 0.7611 = 0.2389
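The two table look-ups can be checked with a few lines of Python. This is a minimal sketch using NormalDist from the standard library; the variable names are only illustrative. The z-score is not rounded to two decimals here, so the probabilities differ slightly from the table values.

```python
from statistics import NormalDist

# Example 1: y ~ N(175, 7.5), left-tail probability P(y <= 165)
z1 = (165 - 175) / 7.5
p_left = NormalDist(mu=175, sigma=7.5).cdf(165)

# Example 2: y ~ N(35, 7), right-tail probability P(y > 40)
z2 = (40 - 35) / 7
p_right = 1 - NormalDist(mu=35, sigma=7).cdf(40)

print(f"z1 = {z1:.2f}, P(y <= 165) = {p_left:.4f}")
print(f"z2 = {z2:.2f}, P(y > 40)  = {p_right:.4f}")
```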
Q-Q Plot:
Are observations normally distributed?
Normal quantile – quantile plot: check whether the observations (dots) are close to the straight line.
If the dots lie roughly on the straight line, it is plausible that the observations come from a normal
distribution. If they clearly deviate from the straight line: not normally distributed.
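What a Q-Q plot puts on its two axes can be sketched without a plotting library. The code below is only an illustration on simulated data: the helper names qq_points and straightness are made up for this example, and the plotting position (i − 0.5)/n is one common convention among several. The correlation is used as a rough numerical stand-in for "how close are the dots to a straight line".

```python
import random
from statistics import NormalDist, mean

def qq_points(observations):
    """Return (theoretical normal quantile, observed value) pairs."""
    ordered = sorted(observations)
    n = len(ordered)
    std_normal = NormalDist()
    # Plotting position (i - 0.5) / n: an assumed convention, others exist.
    theoretical = [std_normal.inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]
    return list(zip(theoretical, ordered))

def straightness(points):
    """Correlation of the two axes: close to 1 means dots near a straight line."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in points)
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5

rng = random.Random(1)  # fixed seed so the simulated data are reproducible
normal_data = [rng.gauss(175, 7.5) for _ in range(200)]   # simulated, normal
skewed_data = [rng.expovariate(1.0) for _ in range(200)]  # simulated, right-skewed

r_normal = straightness(qq_points(normal_data))
r_skewed = straightness(qq_points(skewed_data))
print(f"normal data: r = {r_normal:.3f}")
print(f"skewed data: r = {r_skewed:.3f}")
```

In practice you would still look at the plot itself; a single number hides where the deviation occurs (middle or edges).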
Sampling distribution of sample mean:
1. Central Limit Theorem:
If you take n random drawings from a normal distribution with a certain population mean μ and standard
deviation σ, you can calculate the sample mean and the sample standard deviation from them. The sample mean
is itself normally distributed, with mean μ and standard deviation σ/√n. The bigger the sample size n, the
narrower the distribution of the sample mean becomes.
2. Central Limit Theorem
If you have random drawings from any other distribution with a population mean μ and a population
standard deviation σ, the distribution of the sample mean can be approximated by the same normal
distribution (mean μ, standard deviation σ/√n), irrespective of the type of the ‘old’ distribution.
Under two conditions:
1) The drawings are random
2) The sample size is large enough
Then the new distribution (of the sample mean) is approximately normal.
The same holds for the sum of a series of observations. The sum of y is approximately normally distributed
with 1) an expected value of n · μ (n times the expected value of a single drawing) and 2) a standard deviation
of √n · σ (√n times the population standard deviation of a single drawing), under the same two conditions.
In the end, you can conclude by saying something about the (sample) mean content.
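The Central Limit Theorem can be made visible with a small simulation. This sketch is an illustration, not part of the course material: it draws from an exponential distribution with rate 1 (clearly not normal, with μ = 1 and σ = 1), and the sample size and number of repeats are arbitrary choices.

```python
import random
from math import sqrt
from statistics import mean, stdev

rng = random.Random(42)   # fixed seed for reproducibility
n = 50                    # sample size, assumed "large enough"
repeats = 5000            # number of simulated samples
mu, sigma = 1.0, 1.0      # mean and sd of an exponential distribution with rate 1

sample_means = []
sample_sums = []
for _ in range(repeats):
    sample = [rng.expovariate(1.0) for _ in range(n)]
    sample_means.append(mean(sample))
    sample_sums.append(sum(sample))

# Sample mean: expected value mu, standard deviation sigma / sqrt(n)
print(f"mean of sample means: {mean(sample_means):.3f} (theory {mu:.3f})")
print(f"sd of sample means:   {stdev(sample_means):.3f} (theory {sigma / sqrt(n):.3f})")

# Sum: expected value n * mu, standard deviation sqrt(n) * sigma
print(f"mean of sample sums:  {mean(sample_sums):.2f} (theory {n * mu:.2f})")
print(f"sd of sample sums:    {stdev(sample_sums):.2f} (theory {sqrt(n) * sigma:.2f})")
```

A histogram of sample_means would look roughly bell-shaped even though the single drawings are strongly right-skewed.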
Tutorial 2 – confidence interval
Is our variable actually normally distributed? Make a Q-Q plot. The mid-region is quite normal, with slight deviations at the
edges: therefore we treat it as approximately normally distributed.
European norm = 50 mg/l (average)
Outcome 41 observations:
Sample mean = 55.7
Sample standard deviation = 30.3. The SD is large, meaning: there is quite some variation in the measurements.
Confidence interval: method that shows the uncertainty of the estimated mean.
Estimator: judged on accuracy and precision, just like a measurement instrument.
Low accuracy means the measurements may be biased: e.g. a weighing scale that always reads 1 kg below the true
weight. An unbiased estimator is better: on average it gives the true value.
Low precision means the measurements are spread widely. High precision means little spread.
High accuracy combined with high precision is best: it reflects the truth with only little spread.
Example expected value based on sample mean:
Random sample of n independent observations from a population with mean μy and standard deviation σy.
Aim is to find out the expected value μy based on the sample mean, through the estimator formula y̅ = (y1 + ... + yn) / n.
y̅ is an unbiased estimator for μy, because: μy̅ = μy
We also want this unbiased estimator to have high precision, i.e. little spread. The standard deviation of y̅ is the
standard deviation of a single observation divided by the square root of the number of observations: σy̅ = σy / √n
Rule of thumb: if the sample size increases, the spread becomes smaller. We reach our target:
high precision (little spread) and unbiased (on average the true value).
In sum: μy̅ = μy and σy̅ = σy / √n
So, y̅ is a consistent estimator for μy. This follows from the law of large numbers: the larger the sample, the closer the
sample mean tends to be to the unknown true value of μy.
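How quickly the spread of the sample mean shrinks can be tabulated directly. This is a small sketch; the value 30.3 is borrowed from the sample standard deviation above and used here, as an assumption, as if it were the standard deviation of a single observation. The sample sizes are arbitrary.

```python
from math import sqrt

sigma_y = 30.3  # assumption: illustrative sd of a single observation

def sd_of_sample_mean(sigma, n):
    """Standard deviation of the sample mean: sigma / sqrt(n)."""
    return sigma / sqrt(n)

sample_sizes = [10, 41, 100, 1000]
spreads = [sd_of_sample_mean(sigma_y, n) for n in sample_sizes]
for n, s in zip(sample_sizes, spreads):
    print(f"n = {n:4d}: sd of sample mean = {s:.2f}")
```

Because of the square root, halving the spread requires four times as many observations.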
Implementation: doing the measurements → calculating the sample mean → accurate, high-precision estimate for the
true value of μy
Confidence interval for population mean (expected value)
The outcome of y̅ is only a point estimate for the true value of μy. How precise is that one point? A confidence interval (CI)
gives a range around the estimate and tells you how close you are likely to be to the true value.
Confidence intervals are always given as: estimate ± a certain error margin.
α = level of significance: the probability that the procedure gives a wrong result (degree of mistrust).
Confidence coefficient: 1 − α reflects the degree of trust. A confidence level of e.g. 95% means: the procedure of
building a confidence interval leads to a correct statement in 95% of the cases, meaning the true value lies within
the interval. Higher confidence means that the true value lies within the interval more often.
Or: 95% of the time the CI contains the true value that we want to estimate. There’s always a probability (here 5%)
that the true value is not enclosed by the interval.
Or: the probability that the CI contains the unknown parameter μy is 0.95.
All equivalent statements.
As an example, a confidence level of 0.9 means that if you repeat the sampling 100 times and build a CI each time,
approx. 10 of those intervals will not contain the true value.
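The repeated-sampling meaning of a confidence level can be checked by simulation: build many intervals and count how often they contain the true mean. This is a sketch under stated assumptions: the population values (μ = 50, σ = 30), the sample size and the number of repeats are hypothetical, and σ is treated as known.

```python
import random
from math import sqrt
from statistics import NormalDist, mean

rng = random.Random(7)     # fixed seed for reproducibility
mu, sigma = 50.0, 30.0     # hypothetical population values, known only to the simulation
n = 41                     # observations per sample
confidence = 0.90

# Two-sided critical value, about 1.645 for 90% confidence
z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
margin = z * sigma / sqrt(n)

repeats = 2000
hits = 0
for _ in range(repeats):
    y_bar = mean(rng.gauss(mu, sigma) for _ in range(n))
    # Does this sample's interval contain the true mean?
    if y_bar - margin <= mu <= y_bar + margin:
        hits += 1

coverage = hits / repeats
print(f"fraction of intervals containing mu: {coverage:.3f}")
```

The fraction should come out close to 0.90: roughly 1 in 10 intervals misses the true value.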
Find confidence interval from the z-table WHEN normally distributed:
Rule of thumb: approx. 95% of the observations lie between μ − 2σ and μ + 2σ.
For 1 − α = 0.95 the right-tail probability is p = 0.025: with 95% confidence, 2.5% is left in each tail of the distribution.
The exact value can be looked up in table 2 in the back of your book: the bottom line (df = inf.) gives the z-score 1.960,
which is approx. 2! So the right side is 1.960 and the left side is −1.960.
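Instead of the table, the critical z-value can also be computed. A minimal sketch with the standard library; the function name z_critical is made up for this example.

```python
from statistics import NormalDist

def z_critical(confidence):
    """Two-sided critical value: leaves alpha / 2 in each tail of N(0, 1)."""
    alpha = 1 - confidence
    return NormalDist().inv_cdf(1 - alpha / 2)

for level in (0.90, 0.95, 0.99):
    print(f"{level:.0%} confidence: z = {z_critical(level):.3f}")
```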
Central limit theorem to find out the distribution of the sample mean:
So the z-score ±1.96 gives 95% confidence.
Z-score = (sample mean − population mean of a single drawing), divided by
the spread of the sample mean: Z = (y̅ − μy) / (σy / √n)
P(−1.96 ≤ Z ≤ 1.96) = 0.95 → substitute the formula for Z and solve for the true value μy:
P(y̅ − 1.96 · σy/√n ≤ μy ≤ y̅ + 1.96 · σy/√n) = 0.95
This formula gives the error margin at the lower end and the error margin at the upper end.
Given our y̅ and σy we can give an interval in which the truth lies with 95% confidence.
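Applied to the numbers of this tutorial (41 observations, sample mean 55.7, standard deviation 30.3), the interval can be computed as follows. This is a sketch under one loud assumption: the sample standard deviation is used as if it were the known σy. With an unknown σy a t-based interval would normally be used, which is slightly wider.

```python
from math import sqrt
from statistics import NormalDist

y_bar = 55.7     # sample mean from the tutorial
sigma_y = 30.3   # assumption: sample sd used in place of the unknown population sd
n = 41           # number of observations

z = NormalDist().inv_cdf(0.975)   # about 1.960 for 95% confidence
margin = z * sigma_y / sqrt(n)    # error margin
lower, upper = y_bar - margin, y_bar + margin

print(f"error margin: {margin:.2f}")
print(f"95% CI: {lower:.1f} to {upper:.1f}")
```

Under these assumptions the interval runs from roughly 46.4 to 65.0, so it includes the European norm of 50.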