Hernán et al. (2019) – A Second Chance to Get Causal
Inference Right: A Classification of Data Science Tasks
Statement: statistics may be applied to make causal inferences when using data from
randomized experiments, but not when using nonexperimental (observational) data
Simpson’s paradox: illustrates the failure to recognize that the correct choice of data analysis depends on the
causal structure of the problem (the stratified and the aggregated analysis can point in opposite directions)
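A minimal numerical sketch of this point (the numbers follow the classic kidney-stone illustration and are purely illustrative): within each stratum the treatment looks better, but in the aggregated data it looks worse, so only knowledge of the causal structure tells us which analysis answers the causal question.

```python
# Simpson's paradox on hypothetical counts:
# (group, stratum) -> (successes, total)
data = {
    ("treated", "mild"):   (81, 87),
    ("treated", "severe"): (192, 263),
    ("control", "mild"):   (234, 270),
    ("control", "severe"): (55, 80),
}

def rate(successes, total):
    return successes / total

# Stratified analysis: treated does better in BOTH strata
for stratum in ("mild", "severe"):
    t = rate(*data[("treated", stratum)])
    c = rate(*data[("control", stratum)])
    print(f"{stratum}: treated {t:.2f} vs control {c:.2f}")

# Aggregated analysis: the ordering flips
t_s = sum(data[("treated", s)][0] for s in ("mild", "severe"))
t_n = sum(data[("treated", s)][1] for s in ("mild", "severe"))
c_s = sum(data[("control", s)][0] for s in ("mild", "severe"))
c_n = sum(data[("control", s)][1] for s in ("mild", "severe"))
print(f"overall: treated {rate(t_s, t_n):.2f} vs control {rate(c_s, c_n):.2f}")
```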
1. A Classification of Data Science Tasks
The scientific contributions of data science can be organized into three classes of tasks:
- Description: using data to provide a quantitative summary of features of the world
o Elementary calculations (mean / proportion)
o Unsupervised learning algorithms (cluster analysis)
o “Clever” data visualisations (storytelling)
- Prediction: using data to map some features of the world (the inputs) to other
features of the world (the outputs); inputs can range from a couple to hundreds of variables
o Elementary calculations: e.g., correlation coefficient, risk difference
o Supervised learning algorithms: random forests, neural networks
- Counterfactual prediction: using data to predict certain features of the world as if
the world had been different, as is required in causal inference applications
o Elementary calculations in randomised experiments with perfect adherence
o Complex implementations like g-methods
Statistical inference (explanation; confirmatory) is often required for all three tasks
Sciences are primarily defined by their questions rather than by their tools
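As a rough illustration of the three task classes, here is a minimal Python sketch on simulated data (all variable names and the data-generating process are invented): description via a simple mean, prediction via a fitted input-output mapping, and counterfactual prediction via a difference in means under a randomized action.

```python
# Minimal sketch of the three task classes on simulated data (hypothetical setup).
import numpy as np

rng = np.random.default_rng(0)
n = 10_000
x = rng.normal(size=n)                      # a measured input
a = rng.integers(0, 2, size=n)              # a randomized action (0/1)
y = 2.0 * a + 1.5 * x + rng.normal(size=n)  # outcome

# 1. Description: a quantitative summary of features of the world
print("mean outcome:", y.mean())

# 2. Prediction: map inputs to outputs (here an ordinary least-squares fit)
X = np.column_stack([np.ones(n), x, a])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
print("fitted mapping coefficients:", beta)

# 3. Counterfactual prediction: what would the mean outcome be if everyone
#    were treated vs. untreated?  Because a was randomized, the elementary
#    difference in means identifies this effect.
print("estimated effect of a:", y[a == 1].mean() - y[a == 0].mean())
```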
2. Prediction vs. Causal Inference
Predictive (non-causal) applications of data science: map inputs to outputs
- but do not consider how the world would look like under different courses of action
Mapping observed inputs to outputs lends itself to automated data analysis because it only requires:
- Large data set with inputs and outputs
- Algorithm that establishes a mapping between inputs and outputs
- Metric to assess the performance of the mapping, often based on a gold standard
Prediction tasks require expert knowledge to specify the scientific question:
- What inputs and what outputs
- Identify / generate relevant data sources
However, no expert knowledge is required for prediction after the inputs and outputs are
specified and measured in a particular dataset (machine learning can take over here)
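A hedged sketch of that point, using simulated data and scikit-learn (the dataset and all parameter choices are hypothetical): once the inputs and output are fixed, the remaining work is a dataset, a mapping algorithm, and a performance metric against a gold standard.

```python
# Minimal sketch: once inputs (X) and output (y) are specified, prediction
# only needs a dataset, a mapping algorithm, and a metric.  Data are simulated.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

rng = np.random.default_rng(1)
X = rng.normal(size=(2000, 5))            # inputs
y = (X[:, 0] + X[:, 1] > 0).astype(int)   # output ("gold standard")

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=1)
model = RandomForestClassifier(n_estimators=100, random_state=1).fit(X_tr, y_tr)

# Metric assesses the performance of the mapping
print("test accuracy:", accuracy_score(y_te, model.predict(X_te)))
```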
Causal inference requires expert knowledge to give meaning to predictions (to specify the causal structure)
Confounding factor: a common cause of both the feature of interest and the outcome, an underlying mechanism behind the observed association
Model paradox: a variable can take over the effect of another variable, depending on whether that other variable is included in or left out of the model
Counterfactual: the potential outcome that is not observed
Naïve conclusion: inferring causal relations directly from predictive associations
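A small simulation (hypothetical data-generating process) of the naïve conclusion: a feature can predict the outcome well purely because of a confounder, so the predictive association says nothing about what intervening on the feature would do.

```python
# Minimal sketch: a good predictor need not be a cause.
# Hypothetical structure: U -> X and U -> Y, but X has no effect on Y.
import numpy as np

rng = np.random.default_rng(2)
n = 100_000
u = rng.normal(size=n)            # unmeasured common cause (confounder)
x = u + rng.normal(size=n)        # feature driven by U
y = 2.0 * u + rng.normal(size=n)  # outcome driven by U only

# Naïve conclusion: X "predicts" Y ...
print("corr(X, Y):", np.corrcoef(x, y)[0, 1])

# ... but adjusting for the confounder removes the association, matching
# the fact that intervening on X would not change Y.
beta = np.linalg.lstsq(np.column_stack([np.ones(n), x, u]), y, rcond=None)[0]
print("coefficient of X after adjusting for U:", beta[1])  # ~ 0
```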
3. Implications for Decision-Making
Predictive algorithms can tell us that decisions have to be made, but they cannot help us make those
decisions, because predictive algorithms do not capture actual causality
- Causal analysis is needed to answer “what if” questions; predictive features are agnostic about the causal structure
The distinction between prediction and causal inference (counterfactual prediction) is negligible for
decision-making when the relevant expert knowledge can be codified into algorithms
- This does not hold for complex systems (too chaotic for long-term prediction):
o Unknown and nondeterministic governing laws (“rules of the game”)
o Uncertainty about whether the necessary data are available
o Learning by trial and error (experimenting) is impossible
Understanding causality in a complex system requires qualitative expert (model) knowledge
- Extremely complex systems require narrow research questions and modest analyses
o The aim is not to explain the causal structure of the entire system or to find globally optimal decisions
4. Process and Implications for Teaching
Accuracy of causal answers cannot be quantified using observational data
Data scientists without subject-matter knowledge cannot conduct causal analyses in isolation:
- They don’t know how to articulate the questions (what the target experiment is)
- They don’t know how to answer them (how to emulate the target experiment)
5. Conclusion
Data science that embraces causal inference must
- Develop methods for the integration of sophisticated analytics with expert causal knowledge
- Acknowledge that (unlike for prediction) the assessment of the validity of causal inferences cannot
be exclusively data-driven, because that validity also depends on the adequacy of the expert causal knowledge
Causal directed acyclic graphs (DAGs): represent the different causal structures compatible
with existing causal knowledge and can be used to explore the impact of causal uncertainty on effect estimates
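As a hedged illustration of that use (simulated data, hypothetical structures): the same adjustment can be right under one causal DAG and wrong under another, which is why DAGs are used to explore how causal uncertainty propagates into effect estimates.

```python
# Minimal sketch: "adjust for L" is correct under one hypothetical DAG and
# misleading under another, so effect estimates depend on the assumed structure.
import numpy as np

rng = np.random.default_rng(3)
n = 200_000

def adjusted_effect(a, y, l):
    # coefficient of a in a linear regression of y on (1, a, l)
    X = np.column_stack([np.ones_like(a), a, l])
    return np.linalg.lstsq(X, y, rcond=None)[0][1]

# DAG 1 (confounding): L -> A, L -> Y, A -> Y with effect 1.0; adjusting for L is correct
l = rng.normal(size=n); a = l + rng.normal(size=n); y = a + 2 * l + rng.normal(size=n)
print("confounding DAG, adjusted:", adjusted_effect(a, y, l))   # ~ 1.0 (correct)

# DAG 2 (mediation): A -> L -> Y and A -> Y; the total effect of A is 1 + 1*2 = 3,
# but adjusting for the mediator L recovers only the direct part
a = rng.normal(size=n); l = a + rng.normal(size=n); y = a + 2 * l + rng.normal(size=n)
print("mediation DAG, adjusted:", adjusted_effect(a, y, l))     # ~ 1.0, not the total 3.0
```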
Intelligence is the ability to predict counterfactually how the world would change under
different actions by integrating expert knowledge and mapping algorithms
- No AI will be worthy of the name without causal inference
Holland (1986) – Statistics and Causal Inference
1. Introduction
Randomised experiments: a statistical procedure with the ability to identify causal effects
Purpose of the paper is to show, firstly, how statistics is useful for causal inference, and
secondly, the difference between causal and associational inference
2. Model for Associational Inference
The joint distribution of 𝑌 and 𝐴 over 𝑈 is specified by 𝑃𝑟( 𝑌 = 𝑦, 𝐴 = 𝑎) = proportion of 𝑢
in 𝑈 for which 𝑌(𝑢) = 𝑦 and 𝐴(𝑢) = 𝑎
- Associational parameters are determined by this joint distribution
The conditional distribution of 𝑌 given 𝐴 is specified by 𝑃𝑟(𝑌 = 𝑦 | 𝐴 = 𝑎) = 𝑃𝑟(𝑌 = 𝑦, 𝐴 = 𝑎) / 𝑃𝑟(𝐴 = 𝑎)
- Conditional distribution describes how distribution 𝑌 values change over 𝑈 as 𝐴 varies
- Associational parameter: regression of 𝑌 on 𝐴, conditional expectation 𝐸(𝑌|𝐴 = 𝑎)
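A small worked sketch of the associational model on a hypothetical finite population 𝑈 (the values are invented): the joint distribution is just a table of proportions over the units, and the conditional distribution and the regression 𝐸(𝑌|𝐴 = 𝑎) are computed from it.

```python
# Minimal sketch of the associational model on a small hypothetical population U:
# Pr(Y=y, A=a) is the proportion of units u with A(u)=a and Y(u)=y.
from collections import Counter

# finite population: each entry is (A(u), Y(u))
U = [(0, 0), (0, 1), (0, 1), (1, 0), (1, 1), (1, 1), (1, 1), (1, 0)]
N = len(U)

joint = {ay: c / N for ay, c in Counter(U).items()}                    # Pr(Y=y, A=a)
pr_A = {a: sum(p for (aa, _), p in joint.items() if aa == a) for a in {0, 1}}

# conditional distribution Pr(Y=y | A=a) = Pr(Y=y, A=a) / Pr(A=a)
cond = {(a, y): joint.get((a, y), 0.0) / pr_A[a] for a in {0, 1} for y in {0, 1}}

# regression of Y on A: E(Y | A=a), an associational parameter (not a causal effect)
E_Y_given_A = {a: sum(y * cond[(a, y)] for y in {0, 1}) for a in {0, 1}}
print(E_Y_given_A)
```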
3. Rubin’s Model for Causal Inference
o "A causes B" almost always means that A causes B relative to some other cause
that includes the condition "not A"
- Treatment 𝑆 = 𝑡 → 𝑌𝑡(𝑢) (one cause) versus control 𝑆 = 𝑐 → 𝑌𝑐(𝑢) (another cause)
o Controlled study: 𝑆 is constructed by the experimenter
o Uncontrolled study: 𝑆 is determined by factors beyond experimenter control
- In either case, the critical feature of the notion of cause in this model is that the value of
𝑆(𝑢) for each unit could have been different
The role of time now becomes important because a unit is exposed to a cause at some specific time
or within a specific time period; variables now divide into two classes:
- Pre-exposure: those whose values are determined prior to exposure to the cause
- Post-exposure: those whose values are determined after exposure to the cause
Treatment 𝑡 causes the effect 𝑌𝑡(𝑢) − 𝑌𝑐(𝑢) on unit 𝑢, relative to treatment 𝑐
Fundamental problem of causal inference: it is impossible to observe the value of 𝑌𝑡(𝑢)
and 𝑌𝑐(𝑢) on the same unit and, therefore, it is impossible to observe the effect of 𝑡 on 𝑢
- Scientific solution: exploit various homogeneity or invariance assumptions
- Statistical solution: average causal effect 𝑇 of 𝑡 (relative to 𝑐) over 𝑈 is the expected
value of the difference 𝑌𝑡(𝑢) − 𝑌𝑐(𝑢) over the 𝑢’s in 𝑈: 𝑇 = 𝐸(𝑌𝑡 − 𝑌𝑐) = 𝐸(𝑌𝑡) − 𝐸(𝑌𝑐)
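A minimal simulation of the statistical solution (all potential-outcome values are invented): each unit’s 𝑌𝑡(𝑢) and 𝑌𝑐(𝑢) are generated, only one of them is observed per unit (the fundamental problem), yet under randomization the difference in observed means estimates 𝑇 = 𝐸(𝑌𝑡) − 𝐸(𝑌𝑐).

```python
# Minimal sketch of the statistical solution: potential outcomes are simulated,
# only one is observed per unit, yet randomization makes the observed difference
# in means estimate T = E(Y_t) - E(Y_c).
import numpy as np

rng = np.random.default_rng(4)
n = 100_000
y_c = rng.normal(size=n)           # potential outcome under control, Y_c(u)
y_t = y_c + 2.0                    # potential outcome under treatment, Y_t(u)

true_T = (y_t - y_c).mean()        # average causal effect over U (known only in simulation)

s = rng.integers(0, 2, size=n)     # randomized assignment S(u)
y_obs = np.where(s == 1, y_t, y_c)  # fundamental problem: only one outcome is seen per unit

est_T = y_obs[s == 1].mean() - y_obs[s == 0].mean()
print("true T:", true_T, "estimated T:", est_T)
```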