Unit 12 – Causality and bivariate causal hypotheses
Causality: a relationship between the independent (X) and dependent (Y) variable. One variable explains or
causes the other.
Correlation is an association between variables; causation is the effect of X on Y.
(Causal) hypothesis: be clear about the units of analysis, the independent and dependent variable, the type of
these variables (dichotomous, nominal, ordinal, interval or ratio) and the direction of the sign of the
relationship (positive/negative).
Exogenous concept    | Endogenous concept
Cause                | Effect/consequence
X-variable           | Y-variable
Independent variable | Dependent variable
Treatment            | Observation
Three aspects of causality (if we make a causal statement about why things happen, we have to test three
things):
→ Time order: X comes before Y; the independent variable precedes the dependent variable in time. Measuring
both variables at the same time can cause a problem called reverse causation. Example: do you like him because
you are dancing together, or are you dancing together because you like him? To check time order, collect data
at different points in time: pre-test (dep) → pre-test → treatment (indep) → post-test → post-test.
→ Association/correlation: X and Y are correlated (a change in one variable goes together with a change in the
other). If there is no association, there is no causality.
→ No influence of a third variable (non-spuriousness): there is no third variable accounting for the association
(theorizing, hypothesis, testing). If there is such a third variable and you take it into account, the
original relationship may disappear.
Causal hypothesis/mechanism = a hypothesis about the causal relationship between two variables,
independent and dependent → explanatory.
Variates
→ Univariate: one variable
→ Bivariate: two variables, independent and dependent
→ Trivariate: three variables, independent, dependent and a third variable.
A hypothesis is TESTED, never PROVED. Based on the test, we may or may not reject a hypothesis. Hypotheses
can ONLY be tested if they are precise (= being specific about the meaning of words). The more precise a
hypothesis is, the more easily it can be rejected.
Deterministic relationship: exact mathematical relationship or dependence between variables.
The only way to test/study hypotheses about causal relationships is by finding variation (= studying causality).
A relationship can be either deterministic or probabilistic:
→ Deterministic: if “this happens”, then “that always happens”. ONLY if the expected relationship is
deterministic can we reject the expectation with a single observation. In a graph it is shown as a straight line.
→ Probabilistic: if “this happens”, then “that happens” relatively more/less often. ALL expected relationships in
the social sciences are probabilistic, NEVER deterministic. A probabilistic relationship is shown in a graph not
as a line but as a correlation in a scatterplot (only for interval and ratio measurement).
The dots in the scatterplot do not all fall on the line, for two reasons:
o There could be measurement error; things are a bit different than we might expect.
o We use parsimonious (simple) models because we want a simple picture, so we leave out
variables that we know affect the consequence. That is why the dots are not always on
the line.
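A minimal Python sketch of the difference, using made-up data. The variable names, the line Y = 2 + 0.15 * X and the noise level are assumptions chosen for illustration only. In the deterministic case every point lies on the line and the correlation is 1; adding random noise (standing in for measurement error and left-out variables) scatters the dots around the line and lowers the correlation.

```python
import random


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


rng = random.Random(12)  # fixed seed so the sketch is reproducible

# Hypothetical data: hours studied (X) for 200 students.
hours = [rng.uniform(0, 40) for _ in range(200)]

# Deterministic: every point lies exactly on the line Y = 2 + 0.15 * X.
grade_deterministic = [2 + 0.15 * h for h in hours]

# Probabilistic: the same line plus random noise, standing in for
# measurement error and for variables left out of the simple model.
grade_probabilistic = [2 + 0.15 * h + rng.gauss(0, 1.0) for h in hours]

r_det = pearson_r(hours, grade_deterministic)
r_prob = pearson_r(hours, grade_probabilistic)

print(f"deterministic r = {r_det:.3f}")
print(f"probabilistic r = {r_prob:.3f}")
```

The second correlation is still clearly positive, but below 1: the relationship holds "relatively more often", not always.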
A probabilistic relationship can also be shown in a contingency table, for two dichotomous, nominal or ordinal
variables.
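As a small illustration, the sketch below builds a contingency table for two dichotomous variables in Python. The counts are invented and the variable names (studied hard, passed) are assumptions for the example. Comparing the pass rate within each category of the independent variable shows whether "that happens" relatively more often when "this happens".

```python
from collections import Counter

# Hypothetical observations: one (studied_hard, passed) pair per student.
observations = (
    [("yes", "yes")] * 30 + [("yes", "no")] * 10
    + [("no", "yes")] * 15 + [("no", "no")] * 25
)

# Count how often each combination of X and Y occurs.
counts = Counter(observations)


def pass_rate(studied_hard):
    """Share of students who passed within one category of X."""
    passed = counts[(studied_hard, "yes")]
    total = passed + counts[(studied_hard, "no")]
    return passed / total


rate_hard = pass_rate("yes")  # 30 out of 40
rate_not = pass_rate("no")    # 15 out of 40

print("studied hard | passed | failed")
for x in ("yes", "no"):
    print(f"{x:>12} | {counts[(x, 'yes')]:>6} | {counts[(x, 'no')]:>6}")
print(f"pass rate when studying hard:     {rate_hard:.3f}")
print(f"pass rate when not studying hard: {rate_not:.3f}")
```

In these invented data, students who studied hard pass more often, but not always: a probabilistic, positive relationship.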
If the causal statement is correct, the cause (= independent variable) precedes the consequence (= dependent
variable), the cause is associated with the consequence, and there is no third variable producing the observed
relationship (= non-spurious).
In a hypothesis, you need to be clear about:
1. Units of analysis;
2. The dependent and independent variables;
3. The type of these variables (dich, nom, ord, int, ratio);
4. The direction or sign of the relationship (the type of relationship).
What do we expect to find if a hypothesis is true?
→ Implication 1: if true, we expect the variables to be associated/correlated.
Analyze the data, simplifying from scale to dichotomous.
→ Implication 2: correct time order.
Two questions asked:
- Did you pass your last exam? (>5.5 or not)
- Did you study hard or not?
A student who has already received the grade may answer the second question differently than a student who
has not received the grade yet.
→ Implication 3: non-spuriousness (third variable).
If the relationship is spurious, the original relationship between the two variables disappears the moment
you take the third variable into account. If the hypothesis is true, the relationship remains.
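The check for a third variable can be sketched with a small Python simulation. Everything here is hypothetical: the probabilities, the sample size and the function names are assumptions for illustration. A third variable Z drives both X and Y, while X has no effect on Y at all. Overall, X and Y look associated; within each category of Z the association (nearly) disappears.

```python
import random

rng = random.Random(66)  # fixed seed so the sketch is reproducible


def simulate_unit():
    """One hypothetical unit: Z drives both X and Y; X does not affect Y."""
    z = rng.random() < 0.5
    x = rng.random() < (0.8 if z else 0.2)
    y = rng.random() < (0.8 if z else 0.2)
    return z, x, y


units = [simulate_unit() for _ in range(20000)]


def association_xy(rows):
    """Difference in the share of Y between units with X and without X."""
    with_x = [y for _, x, y in rows if x]
    without_x = [y for _, x, y in rows if not x]
    return sum(with_x) / len(with_x) - sum(without_x) / len(without_x)


# Bivariate: ignore Z.
overall = association_xy(units)

# Trivariate: look at the X-Y relationship separately for each value of Z.
within_z_true = association_xy([u for u in units if u[0]])
within_z_false = association_xy([u for u in units if not u[0]])

print(f"overall difference:       {overall:.2f}")
print(f"difference when Z is yes: {within_z_true:.2f}")
print(f"difference when Z is no:  {within_z_false:.2f}")
```

The overall difference is large, while both within-group differences are close to zero, which is the pattern of a spurious relationship.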
Nominal/ordinal Interval/ratio
Pie chart X
Bar chart/graph X
Dot plot X
Histogram X
Boxplot X
Scatterplot X
Unit 15 – Research design for testing causal hypotheses
Research design = the way of answering an explanatory (causal) research question in a convincing way. It is
about logic, not logistics: more about thinking than organizing.
O = observation/outcome/dependent variable
X = treatment/independent variable
X (struck through) = removal of the treatment
R = group created by random assignment
N = comparison group NOT created by random assignment
                 | Cross-sectional | Interrupted time series | (Classical) experiment
Association      | Yes             | Yes                     | Yes
Time order       | No              | Yes                     | Yes
Non-spuriousness | No              | No                      | Yes
Three groups of research designs for testing causal hypotheses:
→ Cross-sectional research: a research design in which all variables of a set of units are measured at the same
time and none of the variables is manipulated differently for a subset of units.
O
Association/time order/non-spuriousness:
• Association: easily checked.
• Time order: reverse causation cannot be ruled out because data is collected at one moment in time.
Dependent and independent variables can be mixed up.
• Non-spuriousness: third variables may affect the relationship; confounding is a problem. We cannot easily
exclude the effect of third variables. We can measure and take into account the possible effect of third
variables, which may reduce the problem of spuriousness, but we can never rule it out completely.
Weak internal validity (time order: reverse causation; third variables)
Potentially strong external validity (sampling)
→ Interrupted time series: a research design in which a (dependent) variable of one group of units is studied
over time and in which at one point in time the group receives a treatment (a change in the independent
variable).
OOXOO
Type of longitudinal research (time is included).
Association/time order/non-spuriousness:
• Association: easily checked.
• Time order: because of the pre-test, treatment and post-test, time is included, so you can easily check the
time order.
• Non-spuriousness: this type of research does not allow you to test the influence of a third variable, which
might explain the relationship.
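A minimal Python sketch of how an OOXOO design could be analyzed, with invented measurements (the numbers and variable names are assumptions for illustration): compare the mean of the dependent variable before and after the treatment.

```python
# Hypothetical measurements of the dependent variable (O) for one group,
# with the treatment (X) introduced between the 2nd and 3rd observation.
pre_treatment = [12.0, 12.4]   # O O (pre-tests)
post_treatment = [15.1, 15.3]  # O O (post-tests)


def mean(values):
    """Arithmetic mean of a list of numbers."""
    return sum(values) / len(values)


# Change in the dependent variable after the treatment was introduced.
change = mean(post_treatment) - mean(pre_treatment)

print(f"mean before X: {mean(pre_treatment):.2f}")
print(f"mean after X:  {mean(post_treatment):.2f}")
print(f"change:        {change:.2f}")
```

The change shows association, and it follows the treatment in time, but the comparison alone cannot show that the treatment caused it: something else that changed at the same moment would produce the same pattern.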