Inferential statistics -> Refers to methods used to draw conclusions about a population based
on data coming from a sample.
WEEK 1
Module 0: Introduction
Cases, variables and levels of measurement. Basic statistics. Exploring data.
- Variables -> Characteristics of something or someone.
- Cases -> Something or someone.
Levels of measurement:
1. Categorical variables:
- Nominal -> A nominal variable is made up of categories that differ from each other but
have no order, so it is not possible to argue that one category is better or worse than
another. An example is nationality. The distances between the categories have no meaning.
- Ordinal -> The categories differ and they have an order. However, the order tells you
nothing about the size of the difference between the categories. An example is education
level. The intervals between the categories are not equal.
2. Quantitative variables:
- Interval -> We have different categories and we have order. In addition, there are
equal intervals between the categories, but no meaningful zero point. An example is
temperature in degrees Celsius.
- Ratio -> Similar to the interval level, but has a meaningful zero point. An example would
be height.
- Quantitative variables can also be divided into discrete and continuous variables. A
variable is discrete if its possible values form a set of separate numbers. For
instance, the number of goals scored by a football player. A player can score 1 or 2
goals, but not 1.22 goals. A variable is continuous if the possible values of the variable
form an interval, so there are infinitely many possible values. An example would be height.
Module 1: Descriptive Statistics
1.1. Describing data
- Frequency table -> Shows how the values are distributed over the cases.
- Nominal/ordinal variables -> Pie chart or a bar graph.
- Interval/ratio variable -> Histogram.
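
As an illustration of a frequency table, here is a minimal Python sketch. The data and the helper name frequency_table are made up for this example; they are not part of the course material.

```python
from collections import Counter

# Hypothetical sample: nationality (a nominal variable) recorded for eight cases.
nationalities = ["Dutch", "German", "Dutch", "Belgian",
                 "Dutch", "German", "Belgian", "Dutch"]


def frequency_table(values):
    """Return {value: (count, percentage)} for a single variable."""
    counts = Counter(values)
    n = len(values)
    return {value: (count, 100 * count / n)
            for value, count in counts.most_common()}


table = frequency_table(nationalities)
for value, (count, pct) in table.items():
    print(f"{value:<10}{count:>3}{pct:>8.1f}%")
```

The same counts are what a bar graph or pie chart of this variable would display.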
1.2. Measures of central tendency
- Mode -> The value that occurs most frequently (the most common outcome). Often used
if a variable is measured on a nominal or ordinal level.
- Median -> The middle value of your observations when arranged from the smallest to
the largest.
- Mean -> The sum of all the values divided by the number of observations.
- Generally if the distribution of data is skewed to the left, the mean is less than the
median, which is often less than the mode. If the distribution of data is skewed to the
right, the mode is often less than the median, which is less than the mean.
- If your variable is categorical -> Mode.
- If your variable is quantitative -> Median, mean. Go for the median if you have outliers or
if the distribution is skewed and if that’s not the case go for the mean.
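
A small sketch of the three measures, using Python's standard statistics module. The income figures are invented for illustration; they include one outlier so that the distribution is skewed to the right.

```python
import statistics

# Hypothetical data: monthly incomes with one outlier (9000).
incomes = [1800, 2000, 2000, 2200, 2500, 9000]

mode = statistics.mode(incomes)      # value that occurs most frequently
median = statistics.median(incomes)  # middle value of the sorted observations
mean = statistics.mean(incomes)      # sum of all values / number of observations

print(mode, median, mean)
```

Here the mode (2000) is less than the median (2100), which is less than the mean (3250). The outlier pulls the mean upward, which is why the median is the safer choice for this variable.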
1.3. Measures of variability
- To describe a distribution we need more than the measures of central tendency (mode,
median, mean).
- Measures of variability -> The range, the interquartile range, the variance and the
standard deviation.
- Range -> The difference between the highest and the lowest value. It is easy to
understand and simple to compute, but it doesn’t give a good impression of the
variability because it only takes into account the extreme values.
- Interquartile range -> The quartiles divide the ordered distribution into four equal
parts (25% each). The IQR is the distance between the first and the third quartile, so it
leaves out the extreme values. IQR = Q3 - Q1. Q2 = Median. The main advantage of the IQR
is that it’s not affected by outliers.
- Box plot -> Graph to describe center, variability and outliers. It shows you the maximum
value that’s not an outlier, Q3, Q2, Q1, the minimum value that’s not an outlier and any
outliers. The length of the box represents the IQR. The line inside the box is the
median (Q2).
- IQR = 75th percentile - 25th percentile.
- A huge advantage of the variance and the SD is that they take into account all the
values of a variable.
- Variance -> The larger the variance, the larger the variability. This means that the values
are more widely spread out around the mean. A disadvantage is that the metric of the
variance is the metric of the variable under analysis squared. To avoid this problem we
take the square root of the variance. This is called the Standard Deviation.
- Standard deviation -> It can be seen as the average distance of an observation from
the mean.
- Z-score -> Number of standard deviations a score is removed from the mean. If we recode
original scores into z-scores it’s called standardization. This means that we replace the
original scores by their distance from the mean, expressed in standard deviations. The
advantage is that we can see whether a specific score is relatively common or exceptional.
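
The measures in this section can be sketched in a few lines of Python. The scores are made up, and two choices below are assumptions rather than something these notes specify: the variance divides by n - 1 (the sample formula), and the quartiles use the "inclusive" method of statistics.quantiles. Other quartile conventions can give slightly different values for Q1 and Q3.

```python
import math
import statistics

scores = [4, 4, 6, 8, 8]  # hypothetical data
n = len(scores)
mean = sum(scores) / n

# Range: highest value minus lowest value.
value_range = max(scores) - min(scores)

# Quartiles and interquartile range (assumed convention: "inclusive").
q1, q2, q3 = statistics.quantiles(scores, n=4, method="inclusive")
iqr = q3 - q1

# Variance: sum of squared deviations from the mean, divided by n - 1.
variance = sum((x - mean) ** 2 for x in scores) / (n - 1)

# Standard deviation: square root of the variance, back in the original metric.
sd = math.sqrt(variance)

# Z-scores: distance from the mean expressed in standard deviations.
z_scores = [(x - mean) / sd for x in scores]

print(value_range, iqr, variance, sd)
print(z_scores)
```

For these scores the mean is 6, the variance is 4 and the standard deviation is 2, so a score of 8 has a z-score of 1: it lies one standard deviation above the mean.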
Module 2: Associations between variables
2.1. Associations between categorical variables
- A contingency table helps you to investigate the relationship between two ordinal or
nominal variables. Always use percentages!
- The difference between a contingency table and a frequency table is that a frequency
table always concerns only one variable.
- For quantitative variables -> scatterplot.
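
A minimal sketch of a contingency table with percentages. The observations are invented, and the choice to compute percentages within each category of the first variable (here gender) is an assumption made for this example.

```python
from collections import Counter

# Hypothetical observations: (gender, favourite sport) for eight cases.
observations = (
    [("male", "football")] * 3 + [("male", "tennis")] * 1
    + [("female", "football")] * 1 + [("female", "tennis")] * 3
)

counts = Counter(observations)
genders = sorted({gender for gender, _ in observations})
sports = sorted({sport for _, sport in observations})

# Percentages are computed within each gender, so each gender sums to 100%.
percentages = {}
for gender in genders:
    total = sum(counts[(gender, sport)] for sport in sports)
    percentages[gender] = {
        sport: 100 * counts[(gender, sport)] / total for sport in sports
    }

for gender in genders:
    cells = ", ".join(f"{sport}: {pct:.0f}%"
                      for sport, pct in percentages[gender].items())
    print(f"{gender:<8}{cells}")
```

Comparing the percentages per group (75% versus 25% for football) shows the association directly, which raw counts would hide if the groups had different sizes.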
2.2. Associations between continuous variables
- Pearson’s R -> Tells us the direction and strength of the linear relationship between two
quantitative variables. The size of R expresses how tightly the observations are
clustered around the imaginary best-fitting straight line through the cloud of data
points. The number is always between -1 (perfect negative) and +1 (perfect positive). 0
means that there is no linear relationship at all. One way to calculate it is to
standardize both variables, multiply the paired z-scores, add up the products and divide
by n - 1.
- No linear relation -> Pearson’s R is not an appropriate measure. Check the scatterplot
first.
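
As a sketch of the calculation through standardized scores: the function names and data below are hypothetical, and the standard deviation is assumed to use n - 1 in the denominator.

```python
import math


def z_scores(values):
    """Standardize values using the sample standard deviation (n - 1)."""
    n = len(values)
    mean = sum(values) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in values) / (n - 1))
    return [(v - mean) / sd for v in values]


def pearson_r(x, y):
    """Sum of the products of paired z-scores, divided by n - 1."""
    zx, zy = z_scores(x), z_scores(y)
    return sum(a * b for a, b in zip(zx, zy)) / (len(x) - 1)


# Hypothetical data: hours studied and grade obtained for five students.
hours = [1, 2, 3, 4, 5]
grades = [2, 4, 5, 4, 5]

r = pearson_r(hours, grades)
print(round(r, 3))
```

For these data R is about 0.775, a fairly strong positive linear relationship. Points lying exactly on a rising straight line would give +1, and on a falling line -1.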
2.3. The regression line
- The difference between the observed value of a variable Y and the predicted value of
this variable with a regression is the residual.
- For the intercept, look at where the line crosses the Y-axis, that is the value of Y when X=0.
- For slope, look at how much the line changes for each step higher on X. If X increases
by 1, what happens with Y?
- One of the characteristics of a regression line is that the sum of the squared residuals is
as small as possible.
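
The points above can be sketched with the usual least-squares formulas, which these notes do not state explicitly: slope = sum of (x - mean of x)(y - mean of y) divided by sum of (x - mean of x) squared, and intercept = mean of y - slope * mean of x. Data and function names are hypothetical.

```python
def fit_line(x, y):
    """Return (intercept, slope) of the least-squares regression line."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def sum_squared_residuals(x, y, intercept, slope):
    """Residual = observed Y minus predicted Y; return the sum of squares."""
    return sum((b - (intercept + slope * a)) ** 2 for a, b in zip(x, y))


# Hypothetical data: hours studied (X) and grade obtained (Y).
hours = [1, 2, 3, 4, 5]
grades = [2, 4, 5, 4, 5]

intercept, slope = fit_line(hours, grades)
ssr_best = sum_squared_residuals(hours, grades, intercept, slope)

# Any other line, for example intercept 2.5 and slope 0.5, fits worse.
ssr_other = sum_squared_residuals(hours, grades, 2.5, 0.5)

print(round(intercept, 2), round(slope, 2))
print(round(ssr_best, 2), round(ssr_other, 2))
```

Here the intercept is about 2.2 and the slope about 0.6: each extra hour of study goes with a predicted grade that is 0.6 higher. The sum of squared residuals is about 2.4 for the fitted line against 2.5 for the alternative line.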
2.4. Applying correlation and regression
WEEK 2
Module 3: Reliability Analysis
3.1. Introduction
- Reliability -> Consistency of the measurement. If we repeat the measurement several
times the outcome should be the same.