An Introduction to Statistical Methods and Data Analysis
This document contains notes from all the lectures (1 to 12), some interesting notes and tips from the computer practicals, as well as notes from the pen and paper practicals (PPP). Information from some knowledge clips is already included in the notes.
R output is included, so it is easier to kn...
It contains everything you need to know in short but detailed paragraphs.
Seller: louise_s
Lecture 1a – advanced statistics
Main aim: Inference (= drawing conclusions about a population or about a general phenomenon based on a
limited number of observations, which are the sample data)
3 different situations for t-procedures (confidence intervals and t-tests):
- one sample, one mean (e.g. the mean body weight of all 6-year-old boys in the Netherlands)
- paired observations, mean difference (e.g. data of twins or a before-and-after study)
- two independent samples, difference in means (e.g. two populations: difference in exam scores between
males and females, which is a typical observational study)
1. Inference (1 sample)
Take a random sample (sample data that are representative of the whole population). Each random sample comes
from the same population, but some noise makes the samples a bit different from each other.
→ Conclusions of inference are partly based on ‘noise’, introducing a level of uncertainty in the conclusions.
That is why we do tests with ‘significance level α’ and use 0.95 confidence intervals (to account for the
uncertainty that random sampling introduces)
2. Confidence intervals
1) Explain what a confidence interval for a parameter means
2) Specify the general pattern of a confidence interval (the 4 elements of t-procedures)
a. Parameter of interest = what you want to know, what you want to draw a conclusion about
= something that describes the population
b. Estimator (= method of estimation): how to estimate the parameter from the data (it’s a
method, a formula)
c. Standard error of the estimator (= how certain we can be about the estimate)
d. Degrees of freedom for the t-distribution (= from estimating the spread)
3) Apply this pattern to a specific problem (calculate the limits of the interval) → know “which
situation” applies
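As a sketch of how the four elements fit together, the general pattern of a t-based confidence interval can be written as below. The subscript notation for the t-quantile is an assumption made here, not necessarily the notation used in the lectures. For the one-sample situation the elements are: parameter μ, estimator ȳ, standard error s/√n, and n − 1 degrees of freedom.

```latex
\text{estimate} \;\pm\; t_{df,\,1-\alpha/2} \times SE(\text{estimator})
\qquad\text{one sample:}\qquad
\bar{y} \;\pm\; t_{n-1,\,1-\alpha/2}\,\frac{s}{\sqrt{n}}
```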
Situation 1 – 1 sample situation
E.g. What is the mean body height of Wageningen students?
→ answered by constructing a confidence interval
Step 1: take a random sample of 25 male students
→ draw conclusions about a large population based on the 25 observations
Sampling terminology
• We are interested in the mean of one trait (body height) in one population (e.g. all male WUR
students)
• The students are the sampling units
• The response is body height, measured per student (so the student is also the observed or
measurement unit)
• The scientist draws conclusions about the population mean (of body height) based on one random
sample = ‘one-sample situation’ = one population, one mean
• The population is a physical population
• The type of research is observational
Parameter of interest: mean body height of all male WUR students = mu or μy with y being the height
Step 2: to determine the confidence interval, we need the summary statistics of the data set
Sample size: n=25
Sample mean: ȳ = 184
Sample standard deviation: s=9 (= how variable the values are)
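With these summary statistics the interval can be worked out by hand. The sketch below does that arithmetic in Python, assuming a two-sided 0.95 confidence level. The critical value 2.064 is taken from a standard t-table for 24 degrees of freedom rather than computed, and the variable names are illustrative only.

```python
import math

# Summary statistics from the notes
n = 25          # sample size
y_bar = 184.0   # sample mean
s = 9.0         # sample standard deviation

# Assumption: two-sided 0.95 confidence level, so we need the 0.975 quantile
# of the t-distribution with n - 1 = 24 degrees of freedom.
# The value 2.064 is read from a standard t-table, not computed here.
T_CRIT_DF24 = 2.064

df = n - 1
standard_error = s / math.sqrt(n)        # 9 / 5 = 1.8
margin = T_CRIT_DF24 * standard_error    # error margin of the interval
lower, upper = y_bar - margin, y_bar + margin

print(f"df = {df}, SE = {standard_error:.2f}, margin = {margin:.2f}")
print(f"0.95 CI for mu: ({lower:.2f}, {upper:.2f})")
# prints: df = 24, SE = 1.80, margin = 3.72
# prints: 0.95 CI for mu: (180.28, 187.72)
```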
• A confidence interval is a range of values for a parameter that we have “confidence” in
• The confidence level (1- α) is often 0.95 (α is 0.05 = 5%)
• The width of a confidence interval reflects the precision of the estimate: precise estimate = narrow
interval
• Bounds or limits of the interval are random: they depend on the units that are drawn in the sample.
• The 0.95 (1- α): the interval is constructed such that the probability that the interval will contain the
true parameter value is 0.95. Imagine many repeats of the experiment. In each repeat we have new
data and a new interval. Of all these intervals, 95% will then contain the true parameter value. In
practice we only have one sample. The 0.95 is about the method, not about the outcome of one particular
confidence interval
• A CI is typically of the form: best guess (estimate) ± error margin
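The "many repeats" interpretation can be checked with a small simulation. The sketch below assumes a hypothetical normal population with mean 184 and standard deviation 9 (values chosen here for illustration), draws many samples of 25, and counts how often the interval contains the true mean. The function names are made up for this example, and 2.064 is the tabulated 0.975 t-quantile for 24 degrees of freedom.

```python
import math
import random
import statistics


def t_interval(sample, t_crit):
    """Return (lower, upper) for: mean +/- t_crit * s / sqrt(n)."""
    n = len(sample)
    mean = statistics.mean(sample)
    se = statistics.stdev(sample) / math.sqrt(n)
    return mean - t_crit * se, mean + t_crit * se


def coverage(true_mean, true_sd, n, t_crit, repeats, seed=1):
    """Fraction of repeated-sample intervals that contain true_mean."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(repeats):
        # New data in every repeat, so a new interval in every repeat
        sample = [rng.gauss(true_mean, true_sd) for _ in range(n)]
        lower, upper = t_interval(sample, t_crit)
        if lower <= true_mean <= upper:
            hits += 1
    return hits / repeats


# Hypothetical population (assumed values, not data from the lecture)
fraction = coverage(true_mean=184.0, true_sd=9.0, n=25, t_crit=2.064, repeats=5000)
print(f"Fraction of intervals containing the true mean: {fraction:.3f}")
```

The printed fraction should be close to 0.95. Any single interval either contains the true mean or it does not, which is why the 0.95 describes the method.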
E.g. Is there a difference in mean body height of male students compared to 1980 (when it was 180 cm)?
→ answered by doing a t-test
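A minimal sketch of that t-test, assuming the summary statistics from Step 2 (n = 25, ȳ = 184, s = 9) and a two-sided test at α = 0.05. The helper name `one_sample_t` is made up for this example, and the critical value 2.064 comes from a standard t-table for 24 degrees of freedom.

```python
import math


def one_sample_t(y_bar, s, n, mu_0):
    """Return the t statistic and degrees of freedom for H0: mu = mu_0."""
    standard_error = s / math.sqrt(n)
    return (y_bar - mu_0) / standard_error, n - 1


# Summary statistics from the notes; mu_0 = 180 is the 1980 mean height.
t_stat, df = one_sample_t(y_bar=184.0, s=9.0, n=25, mu_0=180.0)

# Assumption: two-sided test at alpha = 0.05; 2.064 is the tabulated
# 0.975 quantile of the t-distribution with 24 degrees of freedom.
T_CRIT_DF24 = 2.064
reject_h0 = abs(t_stat) > T_CRIT_DF24

print(f"t = {t_stat:.3f} with df = {df}; reject H0: {reject_h0}")
# prints: t = 2.222 with df = 24; reject H0: True
```

The same conclusion follows from the 0.95 confidence interval for these statistics: 180 lies just outside it.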
Situation 2 – paired data
Blood pressure change: a physician records the blood pressure before (x) and after 2 weeks (y) of medication
use for 16 patients: d = x-y (regarded as a random sample)
Q1: What is (in general, or ‘in the population’) the change in mean blood pressure after medication use (μx – μy),
or what is the mean change in blood pressure (μd) after medication use?
→ μx – μy is the change in mean and μd is the mean change → the two are the same
→ we make a two-sided confidence interval for μd
→ the parameter of interest is μd, the mean difference in blood pressure before and after medication use
Q2: Does mean blood pressure in the population go down after medication use? = μx – μy > 0? or μd > 0?
→ we need to do a one-sample t-test on the d-values
NB1: for paired data, the observations (x and y) within the pair are not independent; they belong to the same
unit and will be correlated. This ‘problem’ is solved by using the d-values (values of the differences)
NB2: If the sample were random, the patients would be independent units. In this case the sample was not
random, which is why it is important that it is regarded as a random sample
Paired data design = 1 sample situation for d
• Patients were not randomly selected. We should check gender, age, weight... to see if the sample
may well represent the population.
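The reduction of paired data to a one-sample problem can be sketched as follows. The blood pressure values are made-up numbers for 16 hypothetical patients, not data from the lecture. The t-quantiles for 15 degrees of freedom are taken from a standard t-table, and α = 0.05 is assumed.

```python
import math
import statistics

# Hypothetical blood pressure readings for 16 patients (made-up numbers,
# NOT data from the lecture): x = before, y = after 2 weeks of medication.
x = [150, 142, 160, 155, 148, 165, 158, 151, 147, 162, 156, 149, 153, 159, 144, 161]
y = [144, 140, 151, 150, 147, 158, 150, 149, 145, 153, 152, 148, 146, 151, 143, 154]

# Reduce each pair to one difference: back in the one-sample situation.
d = [before - after for before, after in zip(x, y)]
n = len(d)
d_bar = statistics.mean(d)
se = statistics.stdev(d) / math.sqrt(n)

# The change in mean equals the mean change.
assert math.isclose(statistics.mean(x) - statistics.mean(y), d_bar)

# Tabulated t-quantiles for n - 1 = 15 degrees of freedom (alpha = 0.05 assumed)
T_975_DF15 = 2.131   # two-sided 0.95 confidence interval (Q1)
T_95_DF15 = 1.753    # one-sided test of H0: mu_d = 0 against Ha: mu_d > 0 (Q2)

ci = (d_bar - T_975_DF15 * se, d_bar + T_975_DF15 * se)
t_stat = d_bar / se
reject_h0 = t_stat > T_95_DF15

print(f"mean difference = {d_bar}, SE = {se:.3f}")
print(f"0.95 CI for mu_d: ({ci[0]:.2f}, {ci[1]:.2f})")
print(f"t = {t_stat:.2f}; reject H0 (one-sided): {reject_h0}")
```

Only the d-values enter the standard error, which is how the correlation within pairs is handled.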