Statistics 3
Lecture 1
Descriptive statistics
- “Statistics is the science of collecting, organizing and interpreting numerical facts, which we call
data.”
- A&F: Statistics consists of a body of methods for obtaining and analyzing data, to:
1. Design [research studies that]
2. Describe [the data to]
3. Make inferences based on these data.
- Descriptive Statistics:
- Descriptive statistics summarize sample or population data with numbers, tables, and
graphs
- Inferential Statistics:
- Inferential statistics make predictions about population parameters, based on a (random)
sample of data.
Data, population, sample, reliability and validity
- Doing research using data:
- Population: the total set of participants, relevant for the research question
- E.g. Population parameter: average hours of self-study per week of all students.
- Sample: a subset of the population about whom the data are collected
- E.g. Sample statistic: average hours of self-study per week of a randomly selected
sample of 800 students
- Good quality data is necessary to answer the research question:
- Reliability (Precision)
- Validity (Bias)
Descriptive statistics
- Variable: measures characteristics that can differ between subjects
- Types: behavioral, stimulus, subject, and physiological variables
- Measuring scales (NOIR):
- Categorical/qualitative
- Nominal: unordered categories (Eye color, Gender)
- Ordinal: ordered categories (Disagree/Neutral/Agree)
- Quantitative/numerical
- Interval: equal distance between consecutive values (°C)
- Ratio: equal distance and true zero point (K)
- Range:
- Discrete: measurement unit that is indivisible (# Brothers/sisters)
- Continuous: infinitely divisible measurement unit (Body height)
- Three dimensions are of importance:
- Central tendency
Gives information about the typical observation: mean, mode, median …
- Dispersion
Gives information about the extent to which a distribution is stretched or squeezed:
standard deviation, variance, interquartile range
- Relative position measures
- Gives information about relative positions of observations: percentile, quartile, …
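The three dimensions above can be computed with Python's standard `statistics` module. This is a minimal sketch: the `hours` list is made-up illustration data (weekly self-study hours of ten students), not data from the course.

```python
import statistics

# Hypothetical data: weekly self-study hours of ten students (made up for illustration).
hours = [4, 6, 6, 7, 8, 9, 10, 12, 15, 23]

# Central tendency: the typical observation
mean = statistics.mean(hours)
median = statistics.median(hours)
mode = statistics.mode(hours)

# Dispersion: how stretched or squeezed the distribution is
# (sample versions, which divide by n - 1)
variance = statistics.variance(hours)
std_dev = statistics.stdev(hours)

# Relative position: the three quartiles cut the ordered data into four parts
q1, q2, q3 = statistics.quantiles(hours, n=4)
iqr = q3 - q1  # interquartile range

print(f"mean={mean}, median={median}, mode={mode}")
print(f"variance={variance:.2f}, standard deviation={std_dev:.2f}")
print(f"Q1={q1}, Q2={q2}, Q3={q3}, IQR={iqr}")
```

In this made-up data set the mean lies above the median because one student reports a very high number of hours, which is a typical sign of a right-skewed distribution.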
Describe these two distributions:
- the central tendency is nearly the same
- e.g. same mean
- both look like normal distributions -> no skewness
- the variance of B is lower, so B is more precise
Inferential statistics
- Goal: reliable and valid statements about the population based on a sample
- Ideally, the sample statistic differs as little as possible from the population parameter
- Problems:
- Sampling error - “Random sample differences”
- Sampling bias - “Bias due to selective sample”
- Response bias - “Bias due to incorrect answer”
- Non-Response bias - “Bias due to non-response (missings)”
- Solution:
- “A random (or other probability) sampling approach of sufficient size that generates data
for everyone approached, with correct responses on all items for all subjects.”
- by measuring more often you reduce individual errors caused by circumstances
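The difference between sampling error and sampling bias can be illustrated with a small simulation. This is only a sketch under stated assumptions: the population model (normally distributed study hours with mean 10 and standard deviation 4) and the selection mechanism (only the students who study the most are reachable) are invented for illustration.

```python
import random
import statistics

random.seed(7)  # fixed seed so the run is repeatable

# Hypothetical population: weekly self-study hours of 10,000 students (made-up model).
population = [max(0.0, random.gauss(10, 4)) for _ in range(10_000)]
mu = statistics.mean(population)  # population parameter

# Sampling error: random samples of 800 scatter around mu in no fixed direction.
random_means = [statistics.mean(random.sample(population, 800)) for _ in range(200)]

# Sampling bias: a selective sample, here drawn only from the half of the
# population that studies the most (hypothetical selection mechanism).
heavy_studiers = sorted(population)[len(population) // 2:]
biased_means = [statistics.mean(random.sample(heavy_studiers, 800)) for _ in range(200)]

avg_random = statistics.mean(random_means)
avg_biased = statistics.mean(biased_means)

print(f"population mean:            {mu:.2f}")
print(f"average of random samples:  {avg_random:.2f}")
print(f"average of biased samples:  {avg_biased:.2f}")
```

Random samples differ from each other, but on average they land on the population mean. The selective samples are systematically too high, and taking more or larger samples of the same kind does not remove that bias.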
- 3 distributions:
- Population distribution
- Proportions of all students indicating that they do (not) need extra math support
- Sample data distribution
- Proportions of students in the sample (here n = 1000) indicating that they do (not) need
extra math support
- Sampling distribution
- The probability distribution of the sample statistic. It can be interpreted as the result of
repeatedly drawing samples of size n (here 1000).
- Standard deviation of the sampling distribution of the proportion:
$\sqrt{\pi(1-\pi)/n} = \sqrt{0.38(1-0.38)/1000} \approx 0.015$
- Standard error (σM) estimated by SEM
- Central Limit Theorem for sampling distribution
- Empirical rule for normal distribution
- 68% within ± 1σ of the mean
- 95% within ± 2σ of the mean
- almost 100% within ± 3σ of the mean
- with a sample size of about 30 or more, the sampling distribution of the mean is “always”
(approximately) normal, regardless of the shape of the population distribution
- Relationship between the population distribution, the sample data distribution, and the sampling distribution
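The idea of a sampling distribution as "repeatedly drawing a sample of size n" can be checked by simulation. The sketch below uses π = 0.38 and n = 1000 from the example in these notes; the number of repetitions (`n_samples`) and the random seed are arbitrary choices for this illustration.

```python
import math
import random
import statistics

random.seed(1)  # fixed seed so the run is repeatable

pi_pop = 0.38     # population proportion (from the notes' example)
n = 1000          # sample size (from the notes' example)
n_samples = 2000  # number of repeated samples (arbitrary choice)

# Theoretical standard error of a sample proportion
se_formula = math.sqrt(pi_pop * (1 - pi_pop) / n)

# Draw many samples and store the sample proportion of each one
proportions = []
for _ in range(n_samples):
    successes = sum(1 for _ in range(n) if random.random() < pi_pop)
    proportions.append(successes / n)

# Standard deviation of the simulated sampling distribution
se_simulated = statistics.pstdev(proportions)

# Empirical rule: share of sample proportions within 1 and 2 standard errors of pi
within_1 = sum(abs(p - pi_pop) <= se_formula for p in proportions) / n_samples
within_2 = sum(abs(p - pi_pop) <= 2 * se_formula for p in proportions) / n_samples

print(f"standard error (formula):    {se_formula:.4f}")
print(f"standard error (simulation): {se_simulated:.4f}")
print(f"share within 1 SE: {within_1:.3f}, within 2 SE: {within_2:.3f}")
```

The simulated standard deviation should land close to the value from the formula, and the shares within one and two standard errors should land near the 68% and 95% of the empirical rule, even though each individual answer is a yes/no value and far from normally distributed.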
Types of distributions
- (Standard-) normal distribution (Z-distribution)
- Sampling distribution of a proportion if H0 applies.
- (Sampling distribution of a mean if H0 applies and if the population standard deviation is
known)
- Student’s t-distribution
- Sampling distribution of a mean if H0 applies and if the population standard
deviation is unknown.
- Sampling distribution of a regression coefficient if H0 applies.
- Chi-square distribution
- Sampling distribution for deviations of frequencies of a categorical variable if H0
applies.
Skewness of distributions
Hypothesis Testing
- Significance-test or hypothesis-test:
- Based on a sample, this test determines how strong the evidence is against a certain
hypothesis, upon which a decision is made (not) to reject this hypothesis.
- 5 steps of a hypothesis test:
- Formulate expectations
- Set up hypotheses
- Calculate test-statistic (e.g. t-value)
- Determine p-value
- Draw conclusion
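The five steps can be walked through for a test on a proportion, which uses the standard normal (Z) distribution. This is a sketch under assumptions: the H0 value `pi_0 = 0.35` and the significance level are invented for illustration, and the value 0.38 with n = 1000 from the earlier example is treated here as an observed sample proportion.

```python
import math
from statistics import NormalDist

# Step 1: expectation (hypothetical): the share of students needing
#         extra math support differs from 35%.
# Step 2: hypotheses. H0: pi = 0.35, Ha: pi != 0.35 (pi_0 is a made-up value).
pi_0 = 0.35
alpha = 0.05  # chosen significance level

# Sample result (assumption: 0.38 is treated as the observed proportion)
p_hat = 0.38
n = 1000

# Step 3: test statistic, using the standard error that holds if H0 applies
se_0 = math.sqrt(pi_0 * (1 - pi_0) / n)
z = (p_hat - pi_0) / se_0

# Step 4: two-sided p-value from the standard normal distribution
p_value = 2 * (1 - NormalDist().cdf(abs(z)))

# Step 5: conclusion
reject_h0 = p_value < alpha

print(f"z = {z:.2f}, p = {p_value:.4f}, reject H0: {reject_h0}")
```

For a test on a mean with unknown population standard deviation, step 3 would produce a t-value and step 4 would use Student's t-distribution instead.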
Type I and Type II errors
- Probability of a Type I error (false positive) is determined by:
- The chosen significance level (α).
- Probability of a Type II error (false negative) is determined by:
- Effect size
- Sample size
- Variance (dispersion) in sample
- The smaller the chosen probability of a Type I error (α), the larger the resulting probability of a Type II error, given a certain sample.
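- you have to weigh the risks of a Type I and a Type II error at the same time; a single test can commit at most one of the two, since a Type I error requires H0 to be true and a Type II error requires H0 to be false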
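The trade-off between the two error probabilities can be made concrete for the simplest case, a two-sided z-test on a mean with known population standard deviation. The function name `type_ii_error` and all numbers passed to it (effect 0.5, standard deviation 2.0, n = 50) are hypothetical choices for this sketch.

```python
import math
from statistics import NormalDist

def type_ii_error(alpha, effect, sigma, n):
    """Probability of a Type II error for a two-sided one-sample z-test.

    alpha:  chosen significance level (probability of a Type I error)
    effect: true difference between the population mean and the H0 value
    sigma:  population standard deviation (assumed known)
    n:      sample size
    """
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1 - alpha / 2)
    # True difference expressed in standard-error units
    shift = effect / (sigma / math.sqrt(n))
    # H0 is (wrongly) not rejected when the test statistic stays between
    # -z_crit and +z_crit while its true center lies at `shift`.
    return std_normal.cdf(z_crit - shift) - std_normal.cdf(-z_crit - shift)

for alpha in (0.10, 0.05, 0.01):
    beta = type_ii_error(alpha, effect=0.5, sigma=2.0, n=50)
    print(f"alpha = {alpha:.2f} -> beta = {beta:.3f}")
```

With the sample held fixed, a stricter α gives a larger probability of a Type II error. Calling the function with a larger `effect`, a larger `n`, or a smaller `sigma` lowers that probability, which matches the three factors listed above.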