Advanced Research Methods (2021)
Lecture 1
Part 1: Epidemiology and quantitative methods
Causal inference
I)
What is statistical adjustment in general?
It is meant to correct for improprieties or limitations in observed data, to remove the influence of nuisance
variables or to turn observed correlations into causal inferences.
What methods do I know?
II)
Causal language: A leads to B
Example of problems in establishing causality:
Small sample: only a problem when the outcome is heterogeneous and hard to
define
Funding: does not have to be fatal, as long as the independence of the researchers is protected
No control group: essential omission.
o Potential regression towards the mean
Causation: “in an individual, a treatment has a causal effect if the outcome under treatment 1
would be different from the outcome under treatment 2.”
Individual causal effect: difference in outcome between treatment 1 and 2
Not possible to observe both treatments in the same individual
Causal inference as a missing data problem
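A minimal sketch of why this is a missing data problem. The four people and their potential outcomes are invented for illustration; in a real study only the outcome under the treatment actually received is recorded.

```python
# Hypothetical data: each person has two potential outcomes (1 = recovered),
# y1 under treatment 1 and y2 under treatment 2. All values are made up.
people = [
    {"name": "A", "y1": 1, "y2": 0, "received": 1},
    {"name": "B", "y1": 0, "y2": 0, "received": 2},
    {"name": "C", "y1": 1, "y2": 1, "received": 1},
    {"name": "D", "y1": 0, "y2": 1, "received": 2},
]

def individual_effect(person):
    """Individual causal effect: outcome under treatment 1 minus treatment 2."""
    return person["y1"] - person["y2"]

def observed_row(person):
    """What a study records: the potential outcome not received is missing."""
    if person["received"] == 1:
        return {"name": person["name"], "y1": person["y1"], "y2": None}
    return {"name": person["name"], "y1": None, "y2": person["y2"]}

effects = [individual_effect(p) for p in people]
observed = [observed_row(p) for p in people]

print(effects)  # [1, 0, 0, -1]
for row in observed:
    print(row)
```

The individual effects can only be computed here because both potential outcomes were invented; from the observed rows alone they cannot be recovered.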
If three identifiability conditions apply, then it becomes possible to estimate average causal
effects in a sample. This always answers the same question: do we know what would have
happened if the exposure was different?
Positivity
Consistency
Exchangeability
Association found in data is an unbiased estimate of causal effect if these three are met.
Positivity: positive probability of being assigned to each of the treatment levels
Units are assigned to all relevant treatment levels
Within levels of adjustment factors (example: smoking status)
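Positivity within levels of an adjustment factor can be checked roughly in the data. The sketch below uses a hypothetical helper, `positivity_violations`, and invented records; it only shows whether every treatment level occurs in every stratum of the sample, which is weaker than the underlying probability statement.

```python
from collections import defaultdict

def positivity_violations(records, treatment_levels):
    """Return {stratum: missing treatment levels} for strata lacking a level."""
    seen = defaultdict(set)
    for stratum, treatment in records:
        seen[stratum].add(treatment)
    return {
        stratum: sorted(set(treatment_levels) - levels)
        for stratum, levels in seen.items()
        if set(treatment_levels) - levels
    }

# Hypothetical (smoking status, treatment) pairs.
records = [
    ("smoker", 1), ("smoker", 0), ("smoker", 1),
    ("non-smoker", 0), ("non-smoker", 0),
]
violations = positivity_violations(records, treatment_levels=[0, 1])
print(violations)  # {'non-smoker': [1]}
```

In this made-up sample no non-smoker received treatment 1, so nothing can be learned about that treatment among non-smokers.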
Consistency: clear definition of treatments
Example: very specific definition of exposure is necessary (Does water kill?)
Whether the concept is consistent depends on the question you are trying to answer
Exchangeability: treatment groups are exchangeable
It does not matter who gets treatment A and who gets treatment B
Potential outcomes are independent of the treatment that was actually received
Are they the same without the treatment conditions?
Stratification is when you divide the sample up in different groups according to the value of
one variable.
Example:
o Association: people with cigarette lighters are less likely to be healthy
o No exchangeability
Adjust for smoking status: groups are exchangeable with regards to
smoking status
o Believe that all three conditions are met; unbiased estimate.
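The lighter example can be sketched with a small stratified table. The counts are invented so that lighters have no effect within either smoking stratum; the function name `risk_difference` is only for illustration.

```python
# Hypothetical counts: (smoker, carries_lighter) -> (n_people, n_healthy)
counts = {
    (True, True): (80, 40),
    (True, False): (20, 10),
    (False, True): (20, 18),
    (False, False): (80, 72),
}

def risk_difference(cells):
    """P(healthy | lighter) - P(healthy | no lighter) over the given cells."""
    def proportion(lighter):
        n = sum(c[0] for (_, has_lighter), c in cells.items()
                if has_lighter == lighter)
        healthy = sum(c[1] for (_, has_lighter), c in cells.items()
                      if has_lighter == lighter)
        return healthy / n
    return proportion(True) - proportion(False)

crude = risk_difference(counts)
by_stratum = {
    smoker: risk_difference({k: v for k, v in counts.items() if k[0] == smoker})
    for smoker in (True, False)
}
print(round(crude, 2))  # -0.24
print(by_stratum)       # {True: 0.0, False: 0.0}
```

The crude association (lighter carriers look less healthy) disappears once the sample is split by smoking status, which is what adjusting by stratification is meant to show.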
RCT
Automatically meets the three identifiability conditions, but leaves some problems:
Limited generalizability – treatment protocol and patient selection
Practical and ethical considerations
Observational studies (all studies without randomization)
Real world outcomes but:
Availability of data
Internal validity threatened by lack of exchangeability
Explicit attention for positivity and consistency needed
Exercise:
Question
- Does true match minerals reduce most visible imperfections on oily skin?
- Does usage of the true match minerals improve quality of skin?
Actually estimated:
- Does using true match minerals reduce the perception of imperfections in women
after one month of use?
- No, it is not.
- The estimate is?
- Conclusion justified?
Explain potential outcomes approach
Define causal effects
Apply consistency, positivity and exchangeability
Association does not equal causation
Too easy: if the identifiability conditions apply, then we can draw causal conclusions. We cannot
be certain that they do, so we have to be transparent about our assumptions.
Association: statistical relationship
Causation: difference between potential outcomes
Association = causation (difference between potential outcomes) if the identifiability conditions hold
To treat an association as a valid and unbiased estimate of a causal effect, we need theory
and an idea of the causal structure.
DAG: helps to see causal structure
Adjustment is used to improve exchangeability. For example using stratification, matching,
weighting, regression analysis. Complete and correct adjustment leads to exchangeability.
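As one illustration of adjustment by weighting, here is a minimal sketch of inverse probability weighting on an invented table with a single binary confounder L. The counts and the names `crude_risk` and `ipw_risk` are assumptions for this example only.

```python
# Hypothetical counts: (confounder L, treatment A) -> (n_people, n_events)
counts = {
    (0, 1): (20, 4),
    (0, 0): (80, 8),
    (1, 1): (80, 48),
    (1, 0): (20, 10),
}

def crude_risk(a):
    """Unadjusted risk among people who received treatment a."""
    n = sum(c[0] for (_, trt), c in counts.items() if trt == a)
    events = sum(c[1] for (_, trt), c in counts.items() if trt == a)
    return events / n

def ipw_risk(a):
    """Risk under treatment a, weighting each person by 1 / P(A = a | L)."""
    weighted_n = weighted_events = 0.0
    for (level, trt), (n, events) in counts.items():
        if trt != a:
            continue
        n_in_stratum = sum(c[0] for (lv, _), c in counts.items() if lv == level)
        weight = n_in_stratum / n  # inverse of the observed P(A = a | L = level)
        weighted_n += weight * n
        weighted_events += weight * events
    return weighted_events / weighted_n

crude_rd = crude_risk(1) - crude_risk(0)
ipw_rd = ipw_risk(1) - ipw_risk(0)
print(round(crude_rd, 2), round(ipw_rd, 2))  # 0.34 0.1
```

In this table the risk difference is 0.1 within both levels of L, and the weighted estimate recovers that value, while the crude difference is inflated because treated people are mostly in the high-risk stratum. This only works if L is the complete and correct adjustment set.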
Selection strategies for adjustment (but don’t use them)
- Correlation matrix
- Stepwise backward selection
- Adjust for confounders
o Confounders: associated with the exposure, conditionally associated with the
outcome given the exposure, not in the causal pathway between exposure and
outcome
Problem with these strategies:
- Rely on available observed data rather than any theory/subject knowledge, so
important variables may be missed
- Strategy may increase bias rather than reduce it
- Step-wise methods lead to underestimation of statistical uncertainty
Design the analysis based on an assumed causal structure: Directed Acyclic Graphs (DAGs)
Graphical representation of underlying causal structures, a priori causal knowledge
Directed: each connection is an arrow and each arrow represents a potential causal effect;
leaving an arrow out means we are certain there is no causal effect
Acyclic: a path cannot come back to itself
Path: route between exposure X and outcome Y. Path does not have to follow the direction of
the arrows.
Causal path: follows the direction of the arrows
Backdoor path: does not follow the direction of the arrows.
Closed/blocked path: path where arrows collide somewhere along the path.
Blocking open paths: open path is blocked when we adjust for a variable along the path (part
of the association is removed).
Confounding: bias caused by common cause of exposure and outcome (open backdoor path)
Confounder: variable that can be used to remove confounding
- In a DAG, confounding is removed by adjusting for a variable along the open backdoor path, which blocks that path
Collider: a variable where two arrows collide. It blocks the path it lies on; such a path is never a causal path.
- Don’t adjust for colliders: adjusting would open the path that the collider blocks.
- Colliders do not necessarily happen after the exposure and the outcome (they can also
occur before)
- Collider bias
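The path rules above can be sketched in a few lines of plain Python. The DAG below is hypothetical (C is a common cause of X and Y, K is a collider), and the helper names are made up for this example; it is an illustration of the rules, not a replacement for dedicated DAG software.

```python
# Hypothetical DAG as (cause, effect) arrows: C confounds X and Y,
# and K is a collider on the path X -> K <- Y.
edges = [("C", "X"), ("C", "Y"), ("X", "Y"), ("X", "K"), ("Y", "K")]

def descendants(node, edges):
    """All variables reachable from node by following the arrows."""
    found, stack = set(), [node]
    while stack:
        current = stack.pop()
        for cause, effect in edges:
            if cause == current and effect not in found:
                found.add(effect)
                stack.append(effect)
    return found

def all_paths(start, end, edges, path=None):
    """All routes from start to end, ignoring the direction of the arrows."""
    path = path or [start]
    if path[-1] == end:
        return [path]
    paths = []
    for a, b in edges:
        for here, nxt in ((a, b), (b, a)):
            if here == path[-1] and nxt not in path:
                paths += all_paths(start, end, edges, path + [nxt])
    return paths

def is_causal(path, edges):
    """A causal path follows the direction of every arrow."""
    return all((a, b) in edges for a, b in zip(path, path[1:]))

def is_open(path, edges, adjusted=()):
    """Check whether a single path is open given the adjusted variables."""
    adjusted = set(adjusted)
    for prev, node, nxt in zip(path, path[1:], path[2:]):
        collider = (prev, node) in edges and (nxt, node) in edges
        if collider:
            # A collider blocks unless it (or a descendant) is adjusted for.
            if node not in adjusted and not (descendants(node, edges) & adjusted):
                return False
        elif node in adjusted:
            return False  # adjusting for a non-collider blocks the path
    return True

paths = all_paths("X", "Y", edges)
for p in paths:
    kind = "causal" if is_causal(p, edges) else "non-causal"
    print(p, kind, "open" if is_open(p, edges) else "blocked")
```

Without adjustment the path through C is open (confounding) and the path through K is blocked. Calling `is_open(["X", "C", "Y"], edges, adjusted={"C"})` returns `False`, while `is_open(["X", "K", "Y"], edges, adjusted={"K"})` returns `True`, which is the collider bias the notes warn about.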
Hernan: Chapter 1: A definition of causal effect
Individual causal effects are defined as a contrast of the values of counterfactual outcomes,
but only one of those outcomes is observed for each individual: the one corresponding to the
treatment value actually experienced by the individual.
Aggregated causal effect is the average causal effect in a population of individuals. To define
it, we need three pieces of information: an outcome of interest, the actions $a = 1$ and $a = 0$ to
be compared, and a well-defined population of individuals whose outcomes $Y^{a=0}$ and $Y^{a=1}$
(read: outcome $Y$ under treatment $a = 0$ and $a = 1$) are to be compared.
Our definition of a counterfactual outcome implicitly assumes that an individual’s
counterfactual outcome under treatment value $a$ does not depend on other individuals’
treatment values.
Formal definition of the average causal effect in the population: An average causal effect of
treatment $A$ on outcome $Y$ is present if $\Pr[Y^{a=1} = 1] \neq \Pr[Y^{a=0} = 1]$ in the population of
interest.
- When the average causal effect in the population is null (the two probabilities are equal), we say that the
null hypothesis of no average causal effect is true.
- When there is no causal effect for any individual in the population, we say that the
sharp causal null hypothesis is true. The sharp causal null hypothesis implies the null
hypothesis of no average effect.
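The implication only runs one way. A small sketch with invented potential outcomes shows an average null that holds while the sharp null does not:

```python
# Hypothetical potential outcomes (outcome under a=1, outcome under a=0)
# for four individuals. All values are made up.
potential_outcomes = [(1, 0), (0, 1), (1, 1), (0, 0)]

risk_a1 = sum(y1 for y1, _ in potential_outcomes) / len(potential_outcomes)
risk_a0 = sum(y0 for _, y0 in potential_outcomes) / len(potential_outcomes)

average_null = risk_a1 == risk_a0
sharp_null = all(y1 == y0 for y1, y0 in potential_outcomes)
print(risk_a1, risk_a0, average_null, sharp_null)  # 0.5 0.5 True False
```

Two individuals have opposite individual effects that cancel out, so the average effect is null even though treatment matters for them.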
Effect measures: measure the causal effect. The causal risk difference, risk ratio, and odds
ratio (and other summaries) are causal parameters that quantify the strength of the same
causal effect on different scales.
- The causal risk ratio (multiplicative scale) is used to compute how many times
treatment, relative to no treatment, increases the disease risk.
- The causal risk difference (additive scale) is used to compute the absolute number of
cases of the disease attributable to the treatment.
- The use of either the multiplicative or additive scale will depend on the goal of the
inference.
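Written out, assuming the chapter's notation with $Y^{a}$ for the outcome under treatment value $a$ (the symbols are partly lost elsewhere in these notes), the three effect measures for a binary outcome are:

```latex
\begin{align*}
\text{Causal risk difference:}\quad & \Pr[Y^{a=1}=1] - \Pr[Y^{a=0}=1] \\
\text{Causal risk ratio:}\quad & \frac{\Pr[Y^{a=1}=1]}{\Pr[Y^{a=0}=1]} \\
\text{Causal odds ratio:}\quad & \frac{\Pr[Y^{a=1}=1] \,/\, \Pr[Y^{a=1}=0]}{\Pr[Y^{a=0}=1] \,/\, \Pr[Y^{a=0}=0]}
\end{align*}
```

Under the null hypothesis of no average causal effect, the risk difference is 0 and the risk ratio and odds ratio are both 1.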
Consistent estimator: An estimator $\hat{\theta}$ of $\theta$ is consistent if, with probability approaching 1, the
difference $\hat{\theta} - \theta$ approaches zero as the sample size increases towards infinity. The hat
indicates a sample proportion as estimator of the corresponding population quantity.
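A rough simulation of what consistency means for a sample proportion. The population risk of 0.3, the seed, and the sample sizes are arbitrary choices for this sketch.

```python
import random

def sample_proportion(n, true_risk, rng):
    """Sample proportion of events among n independent draws."""
    return sum(rng.random() < true_risk for _ in range(n)) / n

rng = random.Random(0)  # fixed seed so the run is repeatable
true_risk = 0.3         # hypothetical population risk

# Absolute error of the estimate at increasing sample sizes.
errors = {
    n: abs(sample_proportion(n, true_risk, rng) - true_risk)
    for n in (10, 1000, 100000)
}
for n, err in errors.items():
    print(n, round(err, 4))
```

The error at the largest sample size should be very small. Any single run can still be off by chance, which is the sampling variability listed below as a source of random error.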
Sources of random error:
Sampling variability
Nondeterministic counterfactuals
$Y \perp\!\!\!\perp A$ denotes independence of treatment $A$ and outcome $Y$. This is the case when $\Pr[Y = 1 \mid A = 1] = \Pr[Y = 1 \mid A = 0]$.
When $\Pr[Y = 1 \mid A = 1] \neq \Pr[Y = 1 \mid A = 0]$, treatment and outcome are dependent or
associated. The associational risk difference, risk ratio, and odds ratio (and other measures)
quantify the strength of the association when it exists. They measure the association on
different scales, and we refer to them as association measures.
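The three association measures can be computed directly from observed counts. The function name and the counts below are hypothetical.

```python
def association_measures(events_treated, n_treated, events_untreated, n_untreated):
    """Associational risk difference, risk ratio and odds ratio from counts."""
    risk1 = events_treated / n_treated      # Pr[Y = 1 | A = 1]
    risk0 = events_untreated / n_untreated  # Pr[Y = 1 | A = 0]
    odds1 = risk1 / (1 - risk1)
    odds0 = risk0 / (1 - risk0)
    return {
        "risk_difference": risk1 - risk0,
        "risk_ratio": risk1 / risk0,
        "odds_ratio": odds1 / odds0,
    }

# Hypothetical study: 40 of 100 treated and 20 of 100 untreated have the outcome.
measures = association_measures(40, 100, 20, 100)
print({name: round(value, 2) for name, value in measures.items()})
# {'risk_difference': 0.2, 'risk_ratio': 2.0, 'odds_ratio': 2.67}
```

These numbers describe the association in the observed groups only; they estimate the corresponding causal measures only if the identifiability conditions hold.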