Lecture 1
What is the difference between probability and statistics?
Probability - This usually applies deduction. You know the distribution of a certain variable
and work out how likely a certain outcome is (i.e., general → specific).
Statistics - This usually applies induction. You take a sample and ask what that outcome
tells you about the population it came from (i.e., specific → general).
What is the hypergeometric distribution?
A discrete distribution that models the number of successes in a fixed-size sample drawn
without replacement from a finite population, when you know the population size and the
number of successes in it. Each item has two possible outcomes (success or failure).
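As a minimal sketch of this distribution, the probability of a given number of successes can be computed from binomial coefficients. The function name hypergeom_pmf and the card example are illustrative assumptions, not part of the lecture.

```python
from math import comb

def hypergeom_pmf(k, N, K, n):
    """P(X = k): k successes in a sample of n items drawn without replacement
    from a population of N items, of which K are successes."""
    # Outcomes outside the feasible range have probability zero.
    if k < max(0, n - (N - K)) or k > min(n, K):
        return 0.0
    return comb(K, k) * comb(N - K, n - k) / comb(N, n)

# Hypothetical example: draw 5 cards from a 52-card deck that contains 13 hearts.
p_two_hearts = hypergeom_pmf(2, N=52, K=13, n=5)

# The probabilities over all possible counts (0 to 5 hearts) should sum to 1.
total = sum(hypergeom_pmf(k, N=52, K=13, n=5) for k in range(6))

print(round(p_two_hearts, 4), round(total, 4))
```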
What are similarities between probability and statistics?
- The random nature
- You can apply statistical techniques to populations (as in probability) (e.g., compute the
mean, SD)
- Some statistical techniques first make assumptions about the population, before
determining how (un)likely the assumptions are based on the sample (e.g., inferential
statistics)
What do you do with inferential statistics? What happens when this is very unlikely?
You assume a state of the world (e.g., the variables, the distribution between groups, etc.).
When you draw a sample, you ask how likely it is to observe a difference / correlation at
least as big as the one you found if that assumption were true. If this is very unlikely, the
decision rules tell you to reject (i.e., falsify) the null hypothesis.
What is descriptive statistics?
The summary of your data.
What is inferential statistics? Do you use deduction or induction?
You draw inferences about the population based on the sample alone. You use induction.
With which kind of statistics is there more certainty?
Descriptive statistics, compared to inferential statistics.
What can help when using inferential statistics for the uncertainty?
Working with margins of error.
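A rough sketch of what a margin of error looks like in practice, assuming a 95% interval based on the normal approximation (multiplier 1.96). The self-study hours are made-up data, and for a sample this small a t-based multiplier would normally be used instead.

```python
from math import sqrt
from statistics import mean, stdev

# Made-up sample: hours of self-study per week for 10 students.
hours = [6, 8, 5, 10, 7, 9, 4, 8, 6, 7]

n = len(hours)
sample_mean = mean(hours)
standard_error = stdev(hours) / sqrt(n)  # estimated SD of the sample mean

# Assumption: normal approximation, 95% level.
margin = 1.96 * standard_error
lower, upper = sample_mean - margin, sample_mean + margin

print(f"{sample_mean:.2f} +/- {margin:.2f}")
```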
What is a parameter and a statistic?
Parameter - A characteristic of a population (e.g., the average time of self-study per week
of all students).
Statistic - A characteristic of a sample (e.g., the average time of self-study per week of
the students in your sample).
What is reliability in statistics?
Consistent results over time.
What is validity in statistics?
Whether the sample is representative of the population.
What can you calculate when your variable is on a qualitative scale?
The mode (the category that occurs most frequently).
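A small sketch of finding the mode of a qualitative variable by counting categories. The eye-colour observations are made up for illustration.

```python
from collections import Counter

# Made-up nominal data: eye colour of six people.
eye_colours = ["brown", "blue", "brown", "green", "blue", "brown"]

counts = Counter(eye_colours)

# most_common(1) returns a list with the single most frequent (category, count) pair.
mode_category, mode_count = counts.most_common(1)[0]

print(mode_category, mode_count)
```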
What are the scales for categorical/qualitative? What are examples?
- Nominal (order does not matter) (e.g., eye colour)
- Ordinal (order does matter) (e.g., education level)
What are the scales for numerical/quantitative? What are examples?
- Interval (equal distances, no true zero) (e.g., Celsius, Fahrenheit)
- Ratio (equal distances + true zero) (e.g., Kelvin, body weight)
What are the two types of range?
Discrete data - Measurement unit is indivisible (e.g., number of siblings)
Continuous data - Measurement unit is infinitely divisible (e.g., body height)
What kind of data is preferred in statistics?
Continuous data.
What is the sampling error? What can we do with this error?
When you take random samples, especially small ones, the results vary from sample to sample.
You can calculate how large this variation is, which tells you to what extent your results
are reliable.
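The variation between samples can be illustrated with a small simulation. This is only a sketch: the population (normally distributed, mean 10, SD 3) and the helper name sample_means are assumptions chosen for the example.

```python
import random
from statistics import mean, stdev

random.seed(1)  # fixed seed so the simulation is reproducible

# Hypothetical population of 10,000 values.
population = [random.gauss(10, 3) for _ in range(10_000)]
population_mean = mean(population)  # the parameter

def sample_means(n, reps=500):
    """Draw `reps` random samples of size n and return each sample's mean (a statistic)."""
    return [mean(random.sample(population, n)) for _ in range(reps)]

# The spread of the sample means is the sampling error; it shrinks as n grows.
spread_small = stdev(sample_means(10))
spread_large = stdev(sample_means(100))

print(round(spread_small, 2), round(spread_large, 2))
```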
What can be problems for certainty with inferential statistics?
- Small sample size
- Non-representative samples
What is the family bias (sampling bias)?
Because of the sampling method, the sample does not represent the population you are
interested in.
What is the response bias?
For example, people give socially desirable answers. The answers observed in the sample are
then not representative of the population.
What is non-response bias?
People refuse to answer some of the questions (e.g., about income).
What is the solution to all the errors and biases?
Use a random (or other probability) sampling approach of sufficient size that generates data
for everyone approached, with correct responses on all items for all subjects.
Tutorial
How much will your reliability increase when you make the sample 4 times larger?
By a factor of 2, because the standard error shrinks with the square root of the sample size.
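The factor of 2 follows from the standard error of the mean, SD / sqrt(n). A minimal sketch, with an assumed standard deviation of 3 and sample sizes of 25 and 100 picked only for illustration:

```python
from math import sqrt

def standard_error(sd, n):
    """Standard error of the sample mean for standard deviation sd and sample size n."""
    return sd / sqrt(n)

se_n = standard_error(3.0, 25)    # original sample size
se_4n = standard_error(3.0, 100)  # sample 4 times as large

# Quadrupling n halves the standard error, so precision improves by a factor of 2.
ratio = se_n / se_4n

print(se_n, se_4n, ratio)
```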
What word are you not supposed to say if you are talking about confidence intervals?
If you are talking about confidence intervals, you are not supposed to say the word ‘chance’.
We only use the nominal, ordinal and scale measurements in SPSS (scale = ratio & interval).
What is the empirical rule?
A rule that says that almost all observations within a distribution (about 99.7% for a normal
distribution) fall within the interval from the mean minus three times the standard deviation
to the mean plus three times the standard deviation.
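For a normal distribution, the share of observations within k standard deviations of the mean can be computed from the error function. This sketch assumes normality, which the rule depends on. The function name within_k_sd is illustrative.

```python
from math import erf, sqrt

def within_k_sd(k):
    """Proportion of a normal distribution lying within k standard deviations of the mean."""
    return erf(k / sqrt(2))

# Proportions for 1, 2 and 3 standard deviations.
proportions = {k: within_k_sd(k) for k in (1, 2, 3)}

for k, p in proportions.items():
    print(f"within {k} SD: {p:.4f}")
```

This gives roughly 68%, 95% and 99.7% for one, two and three standard deviations.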
How to describe the sample?