SUTVA (stable unit treatment value assumption): the treatment given to any individual does not affect the outcome of other individuals ($ICE$ to $ACE$), and, for all units, there are no versions of a treatment level that lead to different potential outcomes (violated in multilevel settings)
SUTVA = no interference + consistency
Consistency: the observed outcome $Y_i$ equals the potential outcome under the treatment level that was actually observed; $Y_i = Y_i^{x}$ when $X_i = x$
Requires a well-defined treatment (and typically one version/type of treatment)

Simple difference in means (naïve approach): prima facie effect
$PFE = E(Y_i^1 \mid X_i = 1) - E(Y_i^0 \mid X_i = 0)$
Equal to the $ACE$ in a randomised experiment
1) Relation between confounders $Z_i$ and outcome $Y_i$
Regression and ANCOVA: effect read from the coefficients
$\theta = E(Y_i \mid X_i = 1, Z_i) - E(Y_i \mid X_i = 0, Z_i)$
Appropriate if $Y_i$ and the covariates in $Z_i$ are linearly related, and if the difference in mean response between $X = 0$ and $X = 1$ does not vary with the covariates

All methods: identifiability assumptions
Method-specific assumptions:
a) Simple difference in means: no confounding at all
b) ANCOVA and regression estimation: require correct specification of the outcome model
c) Matching, IPW, and stratification: require that use of $\hat{\pi}_i$ balances the confounder distribution; correct specification of the propensity score model
d) Dual modelling: requires correct specification of the outcome model or the propensity score model
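A minimal sketch of why the naïve difference in means needs "no confounding at all", while the covariate-conditional difference $\theta$ does not. The toy data are hypothetical and constructed so that $Y = 2Z + X$ (true effect 1) and treatment is more common when $Z = 1$; the function names are illustrative only.

```python
from statistics import mean

# Hypothetical toy data: (z, x, y) with y = 2*z + 1*x, so the true effect is 1.
# Treatment is more common when z = 1, which makes z a confounder.
data = [
    (0, 1, 1.0), (0, 0, 0.0), (0, 0, 0.0), (0, 0, 0.0),
    (1, 1, 3.0), (1, 1, 3.0), (1, 1, 3.0), (1, 0, 2.0),
]

def prima_facie_effect(rows):
    """PFE = mean(Y | X = 1) - mean(Y | X = 0), ignoring covariates."""
    treated = [y for _, x, y in rows if x == 1]
    control = [y for _, x, y in rows if x == 0]
    return mean(treated) - mean(control)

def conditional_effects(rows):
    """Difference in means within each level of the confounder z."""
    effects = {}
    for z in sorted({z for z, _, _ in rows}):
        stratum = [r for r in rows if r[0] == z]
        effects[z] = prima_facie_effect(stratum)
    return effects

pfe = prima_facie_effect(data)      # biased upwards by confounding
theta = conditional_effects(data)   # recovers the effect within each z
print(pfe, theta)
```

In this constructed example the naïve estimate is twice the true effect, while conditioning on $Z$ recovers it in both strata.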
Common causal: single treatment, not another
Normality assumption: the distribution of (independent) sample means is normal
Unconfoundedness (exchangeability): the treatment is independent of the potential outcomes, $Y_i^0, Y_i^1 \perp\!\!\!\perp X_i$ (violated when, e.g., people took aspirin because of a severe headache)
Lifted by RCT: individuals receiving treatment ($X = 1$) are exchangeable (with respect to the potential outcomes) with those who do not receive treatment ($X = 0$)
No unobserved confounding (conditional exchangeability): $Y_i^0, Y_i^1 \perp\!\!\!\perp X_i \mid Z_i$; MAR, not MCAR
Sufficiency: no unobserved common cause

Extrapolation problem: if covariates do not overlap sufficiently → propensity score matching
Regression estimation: splitting datasets
$\widehat{ACE}_1 = \frac{1}{N_1} \sum_{i:\, X_i = 1} (\hat{Y}_i^1 - \hat{Y}_i^0)$
Comparable: ANCOVA with interactions; $Y_i^1$ and $Y_i^0$, $Baseline \times Treatment$, nonlinear models
Standard errors from regression errors do not depend on the variances of the potential outcomes
2) Relation between confounders $Z_i$ and treatment $X_i$
Propensity scores: replace the covariates with $\pi_i$, so that $Y_i^0, Y_i^1 \perp\!\!\!\perp X_i \mid \pi_i$
$\pi_i = P[X_i = 1 \mid Z_i] = \frac{\exp(Z_i^\top \gamma)}{1 + \exp(Z_i^\top \gamma)}$

1) Conditional (in)dependence: find the Markov equivalence set, the DAG skeleton, and the colliders whose causes are unrelated (immorality; unmarried + child)
Assumptions: causal Markov; faithfulness; no latent confounder (sufficiency); no selection bias
PC: (A) original true causal graph (B) fully connected undirected graph (C) remove the $X - Y$ edge because $X \perp\!\!\!\perp Y$ (D) remove the $X - W$ and $Y - W$ edges because $X \perp\!\!\!\perp W \mid Z$ and $Y \perp\!\!\!\perp W \mid Z$ (E) find v-structures (F) orientation propagation
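The PC steps (B) to (F) can be sketched on the four-variable example used above. This is a simplified illustration, not a full PC implementation: instead of testing independence on data, it assumes a hypothetical oracle `SEPSETS` that already lists which pairs are (conditionally) independent, and it applies only the first orientation rule. The true graph is assumed to be $X \to Z \leftarrow Y$, $Z \to W$.

```python
from itertools import combinations

# Hypothetical independence oracle for the graph X -> Z <- Y, Z -> W.
# Keys are unordered pairs; values are the sets that separate them.
SEPSETS = {
    frozenset({"X", "Y"}): set(),    # X independent of Y (marginally)
    frozenset({"X", "W"}): {"Z"},    # X independent of W given Z
    frozenset({"Y", "W"}): {"Z"},    # Y independent of W given Z
}

def pc_sketch(nodes, sepsets):
    # (B) start from the fully connected undirected graph
    adjacent = {frozenset(pair) for pair in combinations(nodes, 2)}
    # (C)-(D) remove every edge whose endpoints are (conditionally) independent
    adjacent -= set(sepsets)
    directed = set()  # (cause, effect) pairs

    # (E) v-structures: a - c - b with a, b non-adjacent and c not in sepset(a, b)
    for a, b in combinations(nodes, 2):
        if frozenset({a, b}) in adjacent:
            continue
        for c in nodes:
            if c in (a, b):
                continue
            if (frozenset({a, c}) in adjacent
                    and frozenset({b, c}) in adjacent
                    and c not in sepsets[frozenset({a, b})]):
                directed |= {(a, c), (b, c)}

    # (F) propagation: a -> c, c - d undirected, a and d non-adjacent => c -> d
    changed = True
    while changed:
        changed = False
        for a, c in list(directed):
            for d in nodes:
                edge = frozenset({c, d})
                if d in (a, c) or edge not in adjacent:
                    continue
                if (c, d) in directed or (d, c) in directed:
                    continue
                if frozenset({a, d}) not in adjacent:
                    directed.add((c, d))
                    changed = True
    return adjacent, directed

skeleton, arrows = pc_sketch(["X", "Y", "Z", "W"], SEPSETS)
print(sorted(arrows))
```

With this oracle the skeleton keeps only the edges $X - Z$, $Y - Z$, and $Z - W$, and all three end up oriented; in general some edges stay undirected, which is why the result is a Markov equivalence class.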
FCI (fast causal inference): a variant of PC that tolerates, and sometimes discovers, unknown confounding variables (some directions remain unclear)
GES (greedy equivalence search): starts with an empty graph, adds currently needed edges, and then eliminates unnecessary edges in some pattern

Positivity (experimental treatment assignment): exposed and unexposed participants are present at all combinations of covariate values of the observed confounders in the population (in an RCT present by design)
Strong ignorability = positivity + exchangeability

Propensity scores: validate $Z_i \perp\!\!\!\perp X_i \mid \pi_i$ to mimic an RCT
1) Reduce selection bias arising from non-random treatment assignment
2) Useful for causal inference because they balance the distributions ($\log(\pi_i)$) of covariates → non-overlap violates positivity → extrapolation
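A minimal sketch of a positivity (overlap) check for discrete covariates: every covariate pattern should contain both treated and untreated units. The records and the function name are hypothetical; with continuous covariates one would instead inspect the overlap of the propensity score distributions.

```python
from collections import defaultdict

# Hypothetical records: (covariate pattern z, treatment x)
records = [
    (("young", "male"), 1), (("young", "male"), 0),
    (("young", "female"), 1), (("young", "female"), 0),
    (("old", "male"), 0), (("old", "male"), 0),
    (("old", "female"), 1), (("old", "female"), 0),
]

def positivity_violations(rows):
    """Return covariate patterns in which only one treatment arm occurs."""
    arms_seen = defaultdict(set)
    for z, x in rows:
        arms_seen[z].add(x)
    return sorted(z for z, arms in arms_seen.items() if arms != {0, 1})

violations = positivity_violations(records)
print(violations)
```

Here the pattern without treated units is flagged; any effect estimate for that pattern would rest on extrapolation.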
Identifiability: exchangeability, conditional positivity, consistency (/SUTVA)
Randomised experiment (RCT): (no backdoor) 1) exchangeability 2) conditional exchangeability 3) covariates $X_i$ are measured after treatment and influenced by treatment
Quasi experiment: randomly assigned and self-chosen treatment → compare research designs
Modularity, localised intervention: an intervention (do-operator) on a variable, $p(X)$, does not change its relation to other variables, $p(Y \mid X)$ → forced treatment has the same effect as incidental treatment

2) Restricting causal models: assume the type of relation (non-linear) or noise (non-Gaussian)
FCM (functional causal models): continuous variables: effect $Y$ as a function of direct causes $X$ and an unmeasurable factor/noise: $Y = f(X, \varepsilon, \theta_1)$
Assumptions: sufficiency, and the transformation $(X, \varepsilon)$ to $(X, Y)$ is invertible, so the noise can be recovered uniquely from the observed variables $X, Y$ (causal asymmetry)
LiNGAM (linear, non-Gaussian model): $\varepsilon \perp\!\!\!\perp X$ in the assumed causal direction $Y = bX + \varepsilon$; swapping coordinates, the error term "flips 45°"; sufficiency
3) Invariance: data from different environments

3) Include insignificant predictors in propensity modelling (to reduce Type 1 errors):
3a) Include variables (scientifically) predicting $X_i$ and $Y_i$, and any $p > 0.10 \lor .015$
3b) The number of predictors and the sample size must balance (dimensionality); too many predictors lead to non-overlap
Matching (of propensity scores): works best when one of the two groups (typically the control) is substantially larger; typically 1:1 matching, then compare with an unpaired t-test
Balancing property: mimic RCT
$P(Z_i \mid \pi_i = c_i, X_i = 1) = P(Z_i \mid \pi_i = c_i, X_i = 0)$
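A minimal sketch of 1:1 propensity score matching, assuming the propensity scores have already been estimated. It uses greedy nearest-neighbour matching without replacement and a caliper; the caliper value, the data, and the function name are hypothetical choices for illustration, not part of the notes above.

```python
def match_one_to_one(treated, controls, caliper=0.05):
    """Greedy 1:1 nearest-neighbour matching on the propensity score.

    Units are (propensity, outcome) pairs; each control is used at most once.
    Treated units with no control within the caliper stay unmatched.
    """
    available = list(controls)
    pairs, unmatched = [], []
    for t in treated:
        if not available:
            unmatched.append(t)
            continue
        best = min(available, key=lambda c: abs(c[0] - t[0]))
        if abs(best[0] - t[0]) <= caliper:
            pairs.append((t, best))
            available.remove(best)
        else:
            unmatched.append(t)
    return pairs, unmatched

# Hypothetical data: the control group is the larger one.
treated = [(0.80, 10.0), (0.60, 8.0), (0.95, 12.0)]
controls = [(0.79, 7.0), (0.58, 6.0), (0.30, 3.0), (0.10, 1.0)]

pairs, unmatched = match_one_to_one(treated, controls)
# Mean outcome difference over matched pairs: an estimate of ACE_1,
# but only for the treated units that found a match.
ace1 = sum(t[1] - c[1] for t, c in pairs) / len(pairs)
print(len(pairs), unmatched, ace1)
```

Because one treated unit finds no control within the caliper, the estimate describes the matched subpopulation rather than all treated units.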
Invariant Causal Prediction (invariance across environments): search for invariant models across normal (observational) and do-intervention settings to identify direct causes (means and conditional dependencies that are invariant do not change)
Assumption: needs different environments, modularity and localised interventions, and: e.g., $X_1, X_2 \to Y$ is invariant under interventions
Posttreatment selection bias: covariates measured after the treatment should not be regarded as confounders because of time order

Matching: if the smaller group is the treated persons, the estimate represents $ACE_1$; vice versa $ACE_0$
If no matches are found for some treated persons: causal effect of a subpopulation
Inverse propensity weighting (IPW):
$\widehat{ACE} = \frac{\sum_i X_i Y_i / \hat{\pi}_i}{\sum_i X_i / \hat{\pi}_i} - \frac{\sum_i (1 - X_i) Y_i / (1 - \hat{\pi}_i)}{\sum_i (1 - X_i) / (1 - \hat{\pi}_i)}$
Pseudo population: corrects for over/under-representation by weighting treated units (probability $\pi_i$) with $1/\pi_i$ and untreated units (probability $1 - \pi_i$) with $1/(1 - \pi_i)$
In the pseudo population $\pi_i = 0.5$ for each observation and thus $Z_i \perp\!\!\!\perp X_i$; the new sample is double the size; outlier sensitive
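The normalised IPW estimator above can be sketched directly from its formula. The four units and their propensity scores are hypothetical and are taken as given rather than estimated; the function name is illustrative.

```python
def ipw_ace(units):
    """Normalised IPW estimate of the ACE; units are (x, y, pi_hat) triples."""
    num1 = sum(x * y / p for x, y, p in units)
    den1 = sum(x / p for x, y, p in units)
    num0 = sum((1 - x) * y / (1 - p) for x, y, p in units)
    den0 = sum((1 - x) / (1 - p) for x, y, p in units)
    return num1 / den1 - num0 / den0

# Hypothetical units: (treatment x, outcome y, propensity score pi_hat)
units = [
    (1, 6.0, 0.75),  # treated, likely to be treated   -> weight 1/0.75
    (1, 2.0, 0.25),  # treated, unlikely to be treated -> weight 1/0.25
    (0, 1.0, 0.75),  # untreated, likely treated       -> weight 1/0.25
    (0, 3.0, 0.25),  # untreated, unlikely treated     -> weight 1/0.75
]

naive = (6.0 + 2.0) / 2 - (1.0 + 3.0) / 2   # unweighted difference in means
ace_hat = ipw_ace(units)
print(naive, ace_hat)
```

Units that are under-represented in their own arm (a treated unit with a low propensity, an untreated unit with a high one) receive the large weights, which is also why the estimator is sensitive to outliers with extreme scores.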
(Common cause) confounding bias: failure to condition on a common cause (fork) of treatment and outcome
Unconfoundedness bias: potential outcomes should be independent of the treatment (if necessary, conditionally), because otherwise a third variable might be confounding the effect
Endogenous selection bias from sampling: collider bias (spurious associations) resulting from the sampling procedure, and not from, e.g., the inclusion of inadequate covariates; two types:
1) Nonresponse bias: only completed questionnaires are analysed, and the variables of interest are associated with survey completion (MNAR)
2) Attrition bias: over time (longitudinal), respondents inevitably drop out of the study, and this attrition is likely selective; those remaining differ from those who dropped out
Non-representative samples are problematic
Endogenous selection (collider) bias: results from conditioning on a collider (or its descendant) that is on a noncausal path linking treatment and outcome (more general than sampling bias)
Overcontrol bias: results from conditioning on a variable on a causal path between treatment and outcome

Markov condition: every variable $X$ in a directed acyclic graph is independent of its non-descendants conditional on its parents (the variables with edges directed into $X$)
Causal Markov: when the Markov condition is assumed to hold for a causal graph and its associated population distribution
Global Markov condition: if two variables are d-separated given $S$, then $X \perp\!\!\!\perp Y \mid S$ (DAG → statistics)
Faithfulness: if two variables are statistically independent (through conditioning), $X \perp\!\!\!\perp Y \mid S$, then they are d-separated given $S$ (statistics → DAG)
Violated if a confounder cancels out the direct effect ($-0.25$ and $0.50 \times 0.50$) → an unobserved mediator becomes a collider in causal discovery
Causal Faithfulness: when no clear d-separation but "faithful" to the graph
Common trend, equality condition (time-invariant confounding): $\alpha$ assumed non-zero; the confounding bias cancels out if the unobserved $A$ affects the pretest $P$ and the posttest $Y$ to the same extent, $\beta_1 = \beta_2$

Subclassification / blocking / stratification: estimate the $ACE$ from multiple (5-10) strata and take the weighted average
$\widehat{ACE} = \sum_{s} \frac{N_s}{N} \hat{\theta}_s$, where $s$ indexes the strata
Strata should be narrow, so that covariates make no difference within a stratum (mimics an RCT)
3) Dual modelling (doubly robust)
Regression estimation with propensity-related covariates: dummy treatment variable made from propensity scores (post hoc correction)
Standardised mean difference:
$\Delta Z = \frac{(\bar{Z} \mid X = 1) - (\bar{Z} \mid X = 0)}{\sqrt{\frac{1}{2}\left[(S^2 \mid X = 1) + (S^2 \mid X = 0)\right]}}$
If $\Delta Z > 0.10$: meaningful imbalance in covariates; if still not $< 0.10$ after a fix, delete matches/large weights
If $\Delta Z > 0.30$: linear extrapolation is problematic: use interactions, transformations of $Z$, kernel regression, local linear regression, propensity scores
Note: a wrong covariate is harmful (collider)
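A minimal sketch of the standardised mean difference as a balance check for one covariate. The covariate values are hypothetical, and comparing the absolute value against the 0.10 threshold is an assumption made here so that imbalance in either direction is flagged.

```python
from math import sqrt
from statistics import mean, variance

def standardised_mean_difference(z_treated, z_control):
    """(mean_1 - mean_0) / sqrt((s2_1 + s2_0) / 2), using sample variances."""
    pooled_sd = sqrt((variance(z_treated) + variance(z_control)) / 2)
    return (mean(z_treated) - mean(z_control)) / pooled_sd

# Hypothetical covariate values in the treated and control groups
z_treated = [2.0, 4.0, 6.0]
z_control = [1.0, 3.0, 5.0]

smd = standardised_mean_difference(z_treated, z_control)
imbalanced = abs(smd) > 0.10   # threshold from the notes; abs() is an assumption
print(smd, imbalanced)
```

In practice the check is repeated for every covariate, before and after matching or weighting, to see whether the adjustment brought the differences below the threshold.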