Measurement levels: Nominal data (exhaustive & mutually exclusive), Ordinal data (logical, concrete order),
Interval, Ratio (relative comparisons)
You can go from ratio to ordinal, but not the other way around
Measurement level is a property of the measurement values; it is not an intrinsic property of the thing you
measure
Graph types: Bar charts (nominal & ordinal), Histogram (scale) & Scatterplots (scale)
Tables: Frequency table (1 variable) & Crosstable (2 variables)
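A minimal sketch of both table types, using only the Python standard library. The variables `transport` and `gender` and their values are made up for illustration; they are not from the notes.

```python
from collections import Counter

# Hypothetical nominal data for six respondents (illustration only).
transport = ["bike", "car", "bike", "train", "bike", "car"]
gender = ["f", "m", "m", "f", "f", "m"]

# Frequency table: counts per category of ONE variable.
frequency_table = Counter(transport)

# Crosstable: counts per combination of categories of TWO variables.
crosstable = Counter(zip(gender, transport))

for category, count in frequency_table.items():
    print(f"{category:6s} {count}")

for (g, t), count in sorted(crosstable.items()):
    print(f"{g} x {t:6s} {count}")
```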
The mean is not very robust against outliers
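A small illustration of that sensitivity, with made-up scores. The median is included only as a comparison point; it is an addition here, not something taken from the notes.

```python
from statistics import mean, median

# Hypothetical scores (illustration only).
scores = [4, 5, 5, 6, 7]
with_outlier = scores + [60]  # add one extreme score

mean_before = mean(scores)              # 27 / 5 = 5.4
mean_after = mean(with_outlier)         # 87 / 6 = 14.5
median_before = median(scores)          # middle value = 5
median_after = median(with_outlier)     # (5 + 6) / 2 = 5.5

print(mean_before, mean_after)      # the mean shifts a lot
print(median_before, median_after)  # the median barely moves
```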
The normal distribution: Symmetrical distribution, total area under curve is equal to 1
Central tendencies: Left skew, normal distribution, Right skew
Deviation score (D = X – M): The difference between each score X and the M(ean) score. It is expressed in the original
unit (e.g., grams), so do not confuse it with the Z-score. The total and the average of the deviation scores are always 0.
Variance (S²): Smaller values indicate less variation, people score closer to the mean (in squared units, so not yet
a measure of average variation)
Standard deviation (S): Smaller values indicate less variation, people score closer to the mean
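A sketch of deviation scores, variance and standard deviation on made-up weights in grams. Assumption: S² is computed with the N - 1 denominator (what `statistics.variance` does); the notes do not say which denominator the course uses.

```python
from statistics import mean, stdev, variance

# Hypothetical weights in grams (illustration only).
weights = [48, 50, 52, 55, 45]

m = mean(weights)                      # M = 250 / 5 = 50
deviations = [x - m for x in weights]  # D = X - M, still in grams

total_deviation = sum(deviations)      # always 0 (up to rounding)
s2 = variance(weights)                 # sum of squared D / (N - 1), in grams squared
s = stdev(weights)                     # square root of S2, back in grams

print(deviations, total_deviation)
print(s2, s)
```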
Z-score (Z = (X – M) / S): the deviation score, but standardized by the variability of the scores. So it is not affected
by the unit of the scale. Outcome: The number of standard deviations someone differs from the mean
You can calculate the percentage of people with a particular score, and the probability of observing
particular scores in your sample.
Step 1: Compute the Z-score of X (e.g., a height of 212 cm)
Step 2: Find the probability that a Z-score is larger than the computed value (e.g., Z = 1.6)
P(Z < –z or Z > z): outside the interval & P(–z < Z < z): inside the interval
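The two steps can be sketched with `statistics.NormalDist` from the standard library. The mean of 180 cm and SD of 20 cm are assumed values, chosen only so that a height of 212 cm gives Z = 1.6; the notes do not state them.

```python
from statistics import NormalDist

# Assumed population values (NOT from the notes), picked so that Z = 1.6.
mu = 180.0     # hypothetical mean height in cm
sigma = 20.0   # hypothetical standard deviation in cm
x = 212.0      # observed height in cm

# Step 1: Z-score of X.
z = (x - mu) / sigma

# Step 2: probabilities under the standard normal distribution.
std_normal = NormalDist(mu=0.0, sigma=1.0)
p_above = 1.0 - std_normal.cdf(z)                   # P(Z > z)
p_inside = std_normal.cdf(z) - std_normal.cdf(-z)   # P(-z < Z < z)
p_outside = 1.0 - p_inside                          # P(Z < -z or Z > z)

print(f"z = {z:.2f}")
print(f"P(Z > z) = {p_above:.4f}")
print(f"inside = {p_inside:.4f}, outside = {p_outside:.4f}")
```

Under these assumptions roughly 5.5% of people would be taller than 212 cm.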
Standard normal distribution (Z-distribution): Mean (µ) = 0. The average deviation of the scores = 0. Standard
deviation (σ) = 1.
Empirical rules for normal distribution:
68% of the cases can be found within ± one SD from the mean. 95% of the cases can be found within ± two SDs
from the mean. 99.7% of the cases can be found within ± three SDs from the mean
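These percentages can be checked against the standard normal distribution. A short sketch; the helper name `share_within` is made up for this example.

```python
from statistics import NormalDist

std_normal = NormalDist()  # mean 0, standard deviation 1


def share_within(k: float) -> float:
    """Share of cases within +/- k standard deviations from the mean."""
    return std_normal.cdf(k) - std_normal.cdf(-k)


shares = {k: share_within(k) for k in (1, 2, 3)}

for k, share in shares.items():
    print(f"within +/- {k} SD: {share:.4f}")
```

The rule is a rounding: ± two SDs covers slightly more than 95% of the cases, and exactly 95% corresponds to about ± 1.96 SDs.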
With the sampling distribution of the mean (population), use the standard error (and not the SD). For example, about 95%
of the sample means can be found within 2 standard errors from the population mean (CI)
Parameters: Characteristics of populations
Statistics: Sample estimates of the population characteristics
Descriptive statistics are limited to summarizing what’s directly observed. You don’t draw conclusions yet.
Inferential statistics are used to infer something about X in the whole population.
Two issues: Sampling bias (the sample is not representative of the population). Sampling fluctuations
(statistics have different estimates in different samples)
Sampling error = the difference between a sample value (e.g., the mean) and the population value
The sampling distribution of the mean = the distribution of the mean values across a large number of samples
of the same size from the same population.
(Population) Standard error (σM) = the standard deviation of the sampling distribution of the mean (σM = σ / √N)
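A worked sketch of the standard error, using the usual formula σM = σ / √N. The population values (µ = 100, σ = 15) and the sample size (N = 25) are made up for illustration.

```python
import math

# Hypothetical population values and sample size (illustration only).
mu = 100.0     # population mean
sigma = 15.0   # population standard deviation
n = 25         # sample size

# Standard error of the mean: SD of the sampling distribution of the mean.
standard_error = sigma / math.sqrt(n)

# Range that holds roughly 95% of the sample means (mean +/- 2 standard errors).
lower = mu - 2 * standard_error
upper = mu + 2 * standard_error

print(f"standard error = {standard_error}")
print(f"about 95% of sample means fall between {lower} and {upper}")
```

Note that the range for sample means (94 to 106) is much narrower than the range for individual scores (70 to 130), because the standard error shrinks as N grows.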
Even if X is not normally distributed, the sampling distribution of the mean still approximates a normal
distribution if the sample size is large enough (say N > 30). This is the central limit theorem.
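A small simulation can make this visible. The sketch below draws many samples from a clearly skewed population (an exponential distribution with mean 1 and SD 1, an arbitrary choice for this example) and looks at the distribution of the sample means. The sample size, number of samples and seed are also arbitrary.

```python
import math
import random
from statistics import fmean, stdev

rng = random.Random(42)   # fixed seed so the run is repeatable

N = 40                    # size of each sample (above the N > 30 rule of thumb)
NUM_SAMPLES = 5000        # number of samples drawn
POP_MEAN = 1.0            # mean of an exponential distribution with rate 1
POP_SD = 1.0              # SD of an exponential distribution with rate 1

# Draw NUM_SAMPLES samples of size N and keep the mean of each one.
sample_means = [
    fmean([rng.expovariate(1.0) for _ in range(N)])
    for _ in range(NUM_SAMPLES)
]

mean_of_means = fmean(sample_means)
sd_of_means = stdev(sample_means)          # empirical standard error
expected_se = POP_SD / math.sqrt(N)        # sigma / sqrt(N)

# Share of sample means within 2 standard errors of the population mean.
within_2se = sum(
    abs(m - POP_MEAN) <= 2 * expected_se for m in sample_means
) / NUM_SAMPLES

print(f"mean of sample means: {mean_of_means:.3f}")
print(f"SD of sample means:   {sd_of_means:.3f} (expected {expected_se:.3f})")
print(f"share within 2 SE:    {within_2se:.3f}")
```

The sample means should center on the population mean, have an SD close to σ / √N, and fall within 2 standard errors roughly 95% of the time, even though the individual scores are strongly right-skewed.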
T-distribution: You can only use the formula of standard error of the mean if you know the population standard
deviation, but you rarely know this in practice. So, our best guess at σ is the sample estimate S. But, errors can
arise when we estimate σ using S.
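To get a feel for that estimation error, the sketch below repeatedly draws small samples from a normal population with a known σ and computes S each time. All values (µ = 100, σ = 15, N = 5, 2000 repetitions, the seed) are made up for illustration, and S uses the N - 1 denominator as an assumption.

```python
import math
import random
from statistics import fmean, stdev

rng = random.Random(7)    # fixed seed so the run is repeatable

MU = 100.0                # hypothetical population mean
SIGMA = 15.0              # hypothetical (normally unknown) population SD
N = 5                     # small sample size
REPETITIONS = 2000

# Sample estimate S of sigma, computed once per sample.
estimates = [
    stdev([rng.gauss(MU, SIGMA) for _ in range(N)])
    for _ in range(REPETITIONS)
]

# Estimated standard error per sample: S / sqrt(N) instead of sigma / sqrt(N).
estimated_se = [s / math.sqrt(N) for s in estimates]
true_se = SIGMA / math.sqrt(N)

print(f"true sigma: {SIGMA}, S ranges from {min(estimates):.1f} to {max(estimates):.1f}")
print(f"true SE: {true_se:.2f}, average estimated SE: {fmean(estimated_se):.2f}")
```

With small samples S can land well below or well above σ, so a standard error based on S is itself uncertain.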