Statistics 1
Tutorial 1
Research question: What we want to answer.
Population: Every member of group for which we want to collect info.
Sample: Part of population that we will study and collect info from.
Population vs sample:
o Population: we would like to know about these
o Sample: we have to work with these
Sample because too expensive or time consuming to use population.
Units: elements of sample from which we collect the info.
Variable: measured property of element of sample
Quantitative variable (continuous/discrete):
Height, weight at birth, yield (could be any number) (c)
Number of children in household (only whole numbers 0, 1, 2, 3 etc.) (d)
Qualitative variable (nominal/ordinal):
Hair colour, province (no ranking possible) (n)
Grade of eggs (AA/A/B) (ranking is possible) (o)
Simple Random Sampling (SRS):
Units drawn at random from population.
Every possible sample of the same size has an equal chance of being selected.
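A minimal sketch of simple random sampling in Python, assuming the population is available as a list of unit labels. The population of 100 numbered units, the sample size of 10 and the helper name simple_random_sample are made up for illustration. random.sample draws without replacement, so every subset of the requested size is equally likely.

```python
import random

def simple_random_sample(population, n, seed=None):
    """Draw n distinct units; every subset of size n is equally likely."""
    rng = random.Random(seed)  # the seed only makes the draw reproducible
    return rng.sample(population, n)

# Hypothetical population: units numbered 1..100
population = list(range(1, 101))
sample = simple_random_sample(population, 10, seed=1)
print(sample)
```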
Undercoverage (undersampling): certain groups are excluded from the sample.
Non-response: not participating, or not successfully contacted
Voluntary participation (survey): might result in particularly positive or negative answers.
Response bias: social desirability bias
Observational vs experimental:
Obs: observe without influencing it (e.g. study about smoking during pregnancy)
Ex: apply treatment to unit in order to observe reaction.
→ A cause-effect relationship can only be concluded from an experimental study.
Qualitative variables table: also applicable to discrete variables with limited number of outcomes
Tutorial 2
Mean: $\bar{y} = \frac{y_1 + y_2 + \dots + y_n}{n}$
Median: midpoint; the value with 50% of observations above it and 50% below it. 3 4 6 7 9 median = 6 (NOT SENSITIVE TO OUTLIERS)
Mean: 3 4 6 7 9 mean = 29/5 = 5.8 (SENSITIVE TO OUTLIERS)
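The difference in outlier sensitivity can be checked on the example data 3 4 6 7 9. This is a small sketch using Python's statistics module; the outlier value 90 is made up for illustration.

```python
from statistics import mean, median

data = [3, 4, 6, 7, 9]
print(mean(data), median(data))                  # mean 5.8, median 6

# Replace the largest value by an outlier (90 is a made-up value)
with_outlier = [3, 4, 6, 7, 90]
print(mean(with_outlier), median(with_outlier))  # mean 22, median 6
```

The mean jumps from 5.8 to 22, while the median stays at 6.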
Variance: $s^2 = \frac{\sum_{i=1}^{n} (y_i - \bar{y})^2}{n - 1}$
Standard deviation: $s = \sqrt{s^2}$, so variance = (standard deviation)$^2$
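A small check of the variance and standard deviation on the example data 3 4 6 7 9, assuming the usual sample variance with divisor n − 1 (which is what statistics.variance computes). The variable names are illustrative.

```python
from math import sqrt
from statistics import stdev, variance

data = [3, 4, 6, 7, 9]
n = len(data)
y_bar = sum(data) / n

# Sum of squared deviations from the mean, divided by n - 1
s2_manual = sum((y - y_bar) ** 2 for y in data) / (n - 1)

s2 = variance(data)  # sample variance (divisor n - 1)
s = stdev(data)      # square root of the sample variance
print(s2_manual, s2, s)
```

For these data the sum of squared deviations is 22.8, so the variance is 22.8 / 4 = 5.7 and the standard deviation is about 2.39.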
Interquartile range (IQR) = Q3 – Q1
Q1: 1st quartile = 25th percentile = lower quartile
Q3: 3rd quartile = 75th percentile = upper quartile
NOT SENSITIVE TO OUTLIERS
Percentiles:
P-th percentile: P% of observations are smaller and (100 − P)% of observations are larger
Five number summary:
Sample minimum
Lower quartile
Median
Upper quartile
Sample maximum
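A sketch of the five number summary in Python. Note that several conventions exist for computing quartiles on small samples; the default method of statistics.quantiles is used here as an assumption and may differ slightly from the rule used in the course. The helper name five_number_summary is illustrative.

```python
from statistics import median, quantiles

def five_number_summary(data):
    """Return (minimum, Q1, median, Q3, maximum).

    Quartiles use the default method of statistics.quantiles,
    which is one convention among several.
    """
    q1, _, q3 = quantiles(data, n=4)
    return min(data), q1, median(data), q3, max(data)

data = [3, 4, 6, 7, 9]
lo, q1, med, q3, hi = five_number_summary(data)
print(lo, q1, med, q3, hi)
print("IQR =", q3 - q1)
```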
Law of large numbers: the larger the sample size, the closer the sample mean tends to be to the true (population) mean.
Notation:
n = sample size
y = number of persons that consume more than 6 g salt per day (for example)
p = probability (e.g. that a person consumes more than 6 g salt per day); unknown
Estimator for p: sample proportion y/n
y/n = consistent estimator: the larger the sample size, the closer y/n gets to the unknown value of p
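Consistency of y/n can be illustrated with a small simulation. This is a sketch under stated assumptions: the true value p = 0.3 is made up (in practice it is unknown), and each simulated person independently has the property with probability p.

```python
import random

def sample_proportion(p, n, rng):
    """Simulate n persons; y counts those with the property; return y/n."""
    y = sum(1 for _ in range(n) if rng.random() < p)
    return y / n

rng = random.Random(0)  # fixed seed so the run is reproducible
p_true = 0.3            # made-up value; unknown in a real study
for n in (10, 100, 1000, 100000):
    print(n, sample_proportion(p_true, n, rng))
```

For small n the estimates can be far from 0.3; for large n they settle close to it.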
Random phenomena: phenomena that are (partially) determined by chance
Random variable: variable whose numeric result originates from random phenomenon
(discrete/continuous)
P(1) + P(2) + … + P(n) = P(S) = 1!!! (S = set of all possible outcomes)
P(event A) = (number of outcomes in A) / (total number of outcomes), if all outcomes are equally likely
Statistical events:
Complement of event A: consists of all outcomes that are not in A.
Event "A or B": consists of the outcomes that occur only in A, only in B, or in A and B simultaneously.
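The probability rules and events above can be sketched with Python sets. The fair six-sided die and the events A (even number) and B (5 or 6) are made-up examples, and the counting formula assumes all outcomes are equally likely.

```python
from fractions import Fraction

S = {1, 2, 3, 4, 5, 6}  # set of all possible outcomes of one fair die
A = {2, 4, 6}           # event A: even number
B = {5, 6}              # event B: 5 or 6

def prob(event):
    """Number of outcomes in the event divided by total number of outcomes."""
    return Fraction(len(event & S), len(S))

complement_A = S - A    # all outcomes that are not in A
A_or_B = A | B          # outcomes in A, in B, or in both

print(prob(A))                    # 1/2
print(prob(complement_A))         # 1/2
print(prob(A_or_B))               # 2/3
print(sum(prob({o}) for o in S))  # 1
```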