Statistics
LECTURE 1: BACKGROUND RECAP
Quantitative methods: testing theories using numbers; describing and analysing data using statistics
Qualitative methods: testing theories using language (newspapers, conversations, interviews)
We use statistics so that we can generalise our findings from the sample to the population.
How big is the chance that the results you found for your sample also apply to your population?
Representative samples are important
Statistics is only part of the research process (it will not help if your study is badly designed).
Set up your study properly:
1. Find something that needs explaining
2. Based on your theory you have certain expectations
3. Formulate testable hypotheses
4. Test the hypotheses
5. Evaluate the results
Research process: observation → check theory → set up hypotheses and predictions → collect data →
analyse the data (do they fit the hypotheses?)
Theories: A hypothesized general principle or set of principles that explain known findings about a
topic and from which new hypotheses can be generated (don’t have to know this by heart)
Setting up a hypothesis: a prediction from the literature, theory or observations. Generally about the
expected relationship between variables.
A hypothesis needs to be falsifiable
Falsification: the act of disproving a theory or hypothesis, i.e. showing that a hypothesis can be
rejected. One falsification is stronger than an infinite number of confirmations.
In order to generate hypotheses and analyze data you need to define the variables that you’re
interested in.
Variable: anything that can be measured and that can vary across entities or over time. Research
describes and tries to understand/explain variability.
You are often interested in the relationship between theoretical constructs.
Operationalizing a theoretical construct (making it concrete, so it can be measured or manipulated) turns it into a variable.
Types of variables:
- Independent variable (IV)
The proposed cause; a predictor variable that influences the result. It is the variable that you
manipulate in your experiment.
- Dependent variable (DV)
The variable that shows the result or outcome (the proposed effect), depending on the
intervention. It is measured, not manipulated, in an experiment.
Will different amounts of water (IV) have an effect on the size of the plant (DV) and the number
of leaves (DV)?
- Control variable: a variable that does not vary or is not explicitly manipulated.
- Moderator or mediating variable: a variable that influences the effect of the independent
variable on the dependent variable.
EXAMPLE
A study using Dutch-accented and German-accented speakers of English to study the effect of type of
English accent on speaker appreciation.
IV: type of accent (Dutch-accented or German-accented English)
DV: speaker appreciation
Control variable: age, if for example only participants who are 20 years old take part in the study
(because I think age could affect my result, and I don't want it to)
Moderator: gender, if for example the type of accent has an effect on speaker appreciation for
women but not for men (so gender plays a role here)
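To make the example concrete, here is a minimal sketch of what such a dataset could look like, assuming a hypothetical simulated study (all names and numbers are made up for illustration):

```python
# Hypothetical, simulated version of the accent example (numbers are made up).
import numpy as np
import pandas as pd

rng = np.random.default_rng(1)
n = 40  # participants per accent condition

data = pd.DataFrame({
    "accent": ["Dutch"] * n + ["German"] * n,              # IV: type of accent
    "gender": rng.choice(["female", "male"], size=2 * n),  # possible moderator
    "age": 20,                                             # control variable, held constant
})

# DV: speaker appreciation on a 1-7 scale. The moderation idea is simulated by
# letting the accent difference appear for women but not for men.
base = rng.normal(4.5, 1.0, size=2 * n)
boost = np.where((data["accent"] == "Dutch") & (data["gender"] == "female"), 1.0, 0.0)
data["appreciation"] = (base + boost).clip(1, 7)

# Mean appreciation per accent and gender shows the moderated pattern.
print(data.groupby(["accent", "gender"])["appreciation"].mean().round(2))
```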
Variables can represent different types of information:
- Categories/labels
- Categories that can be ordered
- Numbers with meaningful distances
- Numbers with a true/meaningful zero point
Levels of measurement (NOIR):
- Categorical: entities are divided into distinct categories
- Binary variable: there are only 2 categories, like dead or alive
- Nominal variable: there are more than 2 categories, like someone is an omnivore,
vegetarian, vegan or fruitarian. This is the weakest level of measurement; you cannot compute a mean.
- Ordinal variable: same as the nominal variable but the categories have a logical order, like
whether people failed, passed or have a distinction in their exam.
- Continuous: entities get a distinct score
- Interval variable: equal intervals on the variable represent equal
differences in the property being measured, e.g. the difference
between 6 and 8 is equivalent to the difference between 13 and 15.
A Likert scale is sometimes treated as interval.
- Ratio variable: the same as interval, but the ratios of scores on the
scale must make sense. So, there must be an absolute zero point.
Like someone earning 16 euros earns twice as much as someone
earning 8 euros, someone earning 0 euros earns nothing.
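As an illustration (not part of the lecture), the four NOIR levels could be represented roughly like this in Python; the example values are made up:

```python
# Rough sketch of the NOIR levels of measurement using pandas dtypes.
import pandas as pd

df = pd.DataFrame({
    # nominal: unordered categories, no meaningful mean
    "diet": pd.Categorical(["omnivore", "vegan", "vegetarian", "omnivore"]),
    # ordinal: categories with a logical order
    "exam": pd.Categorical(["fail", "pass", "distinction", "pass"],
                           categories=["fail", "pass", "distinction"], ordered=True),
    # interval: equal differences are meaningful, but there is no true zero (e.g. Likert scores)
    "likert": [2, 5, 4, 7],
    # ratio: true zero point, so ratios of scores make sense (income in euros)
    "income": [0, 8, 16, 24],
})

print(df.dtypes)
# Ordered categories support order comparisons; nominal ones do not.
print(df["exam"] >= "pass")
# Ratios are only meaningful for the ratio variable: 16 euros is twice 8 euros.
print(df["income"][2] / df["income"][1])
```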
Measurement error: the discrepancy between the actual value we’re trying to measure and the
number we use to represent that value (you weigh 80 kg, but the scale says 83 kg, so the error is 3 kg). We
can keep the measurement error low by using valid and reliable measures.
Validity: whether an instrument measures what it’s supposed to measure.
- Criterion validity: the extent to which a test can predict actual behavior
- Predictive validity (e.g. predicting the later level of someone's high school diploma)
- Concurrent validity (correlation with a previously validated instrument): refers to the degree
to which the scores on a measurement are related to scores on other measurements
that have already been established as valid.
- Content validity: evidence that the content of a test/measure corresponds to the content of
the construct it was designed to cover. Do all the items fully cover the subject?
- Ecological validity: evidence that the results of a study can be applied and allow inferences
to real-world conditions
Reliability: ability of the measure to produce the same results under the same condition.
- Internal/inter-item reliability
- Intercoder/inter-annotator reliability
- Test-retest reliability
Reliability can be measured (validity can't).
Not the focus of this course.
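One common way to put a number on reliability, for example test-retest reliability, is to correlate the scores from two measurement occasions. A minimal sketch with made-up scores (an illustration, not course material):

```python
# Test-retest reliability as the correlation between two measurement occasions
# (made-up scores, purely illustrative).
import numpy as np

time1 = np.array([12, 15, 9, 20, 14, 11, 18, 16])  # scores at the first measurement
time2 = np.array([13, 14, 10, 19, 15, 12, 17, 15])  # the same people, measured again

# A Pearson correlation close to 1 means the measure produces similar results
# under the same conditions, i.e. high test-retest reliability.
r = np.corrcoef(time1, time2)[0, 1]
print(round(r, 3))
```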
Research design
- Correlational research: observing what naturally goes on in the world without directly
interfering with it, like surveys on attitudes and opinions.
- Cross-sectional research: this term implies that data from people with different
characteristics (like different age) are collected at one point in time
- Experimental research: one or more variables are systematically manipulated to see their
effect on an outcome/dependent variable. Different possible levels of control and
randomization. Statements can be made about cause and effect.
Type of research depends on the research question
Correlation does not equal causation.
Cause and effect:
1. Cause and effect must occur close together in time (contiguity/correlation)
2. The cause must occur before an effect does
3. The effect should never occur without the presence of the cause
There should be no confounding variables (the tertium quid):
a variable (that we may or may not have measured) other than the predictor variables that
potentially affects the outcome variable. For example, the relationship between low self-esteem and dating
anxiety is confounded by poor social skills.
Ruling out confounds
An effect should be present when the cause is present, and when the cause is absent the
effect should also be absent.
Control conditions: conditions in which the cause is absent.
Experimental research design: methods of data collection
- Between-subjects/between-groups/independent: different entities in experimental conditions
- Within-subject/repeated measures: the same entities take part in all experimental conditions
Advantages (of the within-subject design): economical, more sensitive to detecting an effect. Disadvantages: practice effects, fatigue
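A small sketch, with made-up numbers, of how the data are typically laid out under each design: one row per participant in a between-subjects study, and one entry per condition for every participant in a repeated-measures study.

```python
# Hypothetical layout of the data under the two designs (made-up scores).
import pandas as pd

# Between-subjects: different people in each condition, one row per person.
between = pd.DataFrame({
    "participant": ["P1", "P2", "P3", "P4"],
    "condition": ["Dutch", "Dutch", "German", "German"],
    "appreciation": [5.1, 4.8, 4.2, 4.5],
})

# Within-subject (repeated measures): the same people take part in every
# condition, so each participant has a score for each condition.
within = pd.DataFrame({
    "participant": ["P1", "P2"],
    "Dutch": [5.1, 4.8],
    "German": [4.3, 4.6],
})

print(between)
print(within)
```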
Experimental research design: types of variation
- Systematic variation: differences in performance created by a specific experimental
manipulation.
- Unsystematic variation: differences in performance created by unknown factors (age, gender)
Randomization of the order of conditions and random assignment of participants minimize
unsystematic variation, especially in a between-subjects design.
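A minimal sketch of these two forms of randomization, using hypothetical participant labels:

```python
# Hypothetical example of random assignment and random condition order.
import random

random.seed(42)
participants = [f"P{i:02d}" for i in range(1, 13)]
conditions = ["Dutch accent", "German accent"]

# Between-subjects: shuffle the participants, then split them over the conditions
# so that unknown factors (age, gender, ...) spread roughly evenly across groups.
random.shuffle(participants)
half = len(participants) // 2
groups = {"Dutch accent": participants[:half], "German accent": participants[half:]}
print(groups)

# Within-subject: every participant gets all conditions, in a random order,
# so that practice and fatigue effects do not systematically favour one condition.
orders = {p: random.sample(conditions, k=len(conditions)) for p in participants}
print(orders)
```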