Full English summary of all lectures of the course Advanced Research Methods, complemented by the most important points from the working groups and the literature. I studied from this summary myself and scored a 7.9 on the exam.
Lecture 1 – Causal inference: drawing the lines between causes and effects (quantitative methods)
This course is about the function of theory in quantitative and qualitative research.

Quantitative research
A reductionistic perspective on the object is used; specific relationships between variables are studied. The researcher is detached. Research questions are more often closed questions (prevalence of a phenomenon, associations etc.). The aim is to test a hypothesis, to prove an assumption or causality. The research process is more or less fixed; it is done in a controlled experiment with a fixed design. Data collection is about numbers: structured observations, surveys, measurements etc. Data analysis is calculated and presented in tables.

Causal inference
Drawing conclusions about causation or estimating causal relationships. Statistical inference uses data to address important questions; it tells what is likely and what is unlikely. Statistical inference is the process by which the data speak to us, enabling us to draw meaningful conclusions.

The terms used are from epidemiology. Epidemiology does not represent a body of knowledge. It is a philosophy and methodology that can be applied to a very broad range of health problems. The art of epidemiology is knowing when and how to apply the various strategies creatively to answer specific health questions (how can we study what makes you ill); it may be applied to problems outside the healthcare sector as well.

Common problems with causal effects (A leads to B)
- Small sample size – whether this is a problem depends on the effect you are trying to show; when the effect is very strong a small sample is enough (all study participants die after treatment), when the effect is small a larger sample is desirable to have some certainty about what is going on. The data sample needs to be representative of a larger group or population to give accurate estimates. The easiest way to achieve this is to randomly select a subset of the population (every individual should have the same chance of being in the sample).
- Study performed or financed by a commercial company – not a problem when agreements exist on what is published and how.
- No control group – for causal estimates you need to know what happened with the treatment and what would have happened without the treatment, in order to predict what will happen (regression to the mean problem; when having severe pain on day 1, it will probably be better on day 10). You need the outcomes of a treatment group and a control group that are broadly similar (they differ only in treatment) and compare the two, to try to isolate the impact of one specific intervention. The interest is not in the outcome per se, but in the role of the treatment in achieving that outcome. When two outcomes differ and the only difference between the two groups is the treatment, the treatment has a causal effect (causative or preventive) on the outcome.

Causation in an individual: a treatment has a causal effect if the outcome under treatment 1 would be different from the outcome under treatment 2 ($Y_i^{a=1} \neq Y_i^{a=2}$).
The individual treatment effect is the difference between the two potential outcomes of one individual, comparing having the treatment with not having it; it requires both potential outcomes. The average treatment effect is calculated from the individual treatment effects in the population.
For individuals, a causal effect cannot be observed, as we only observe one potential outcome; the counterfactual (what would have happened) cannot be observed. This is a missing data problem, since only half of the potential outcomes are observed, and it is the fundamental problem of causal inference. The definition is therefore not practically useful for individual causal effects.

Average causal effects can be determined under three identifiability conditions (needed because the counterfactual itself cannot be observed):
- Positivity – observe what would have happened if, so you need to observe patients being assigned to all relevant treatments; have a control group. Within these treatment groups, patients should be included for each level of each confounder (for example smokers and non-smokers in both the treatment group and the control group). Results for all treatment groups in all strata of the adjustment variable need to be available in order to make the analysis possible.
- Consistency – the treatments (exposure) have to be defined very clearly; clearly define the "if". What is the treatment and what is "without treatment"? Keep specifying until no meaningful vagueness remains (you need to specify the start and end of the intervention and the implementation of its different components over time). This condition is crucial, as causal effects can only be calculated for very specific situations and research questions, but it is often overlooked.
- Exchangeability – the potential outcomes must be independent of the treatment that was received ($Y_i^{a} \perp A$). It must not matter which of the two groups gets treatment A and which gets treatment B, so that the association can be ascribed to the treatment effect.
If the three conditions are met, the association between exposure and outcome is an unbiased estimate of the causal effect.
The three identifiability conditions are automatically met by using Randomized Controlled Trials (RCT), the gold standard. In an RCT, selected patients are randomly assigned to the different treatment groups and the treatment is clearly described. Often randomization cannot be used due to practical or ethical considerations (smoking, having children). In addition, generalizability (external validity) is limited because of the treatment protocol and because patients in the sample differ from patients outside the sample. Observational studies have real-world outcomes (compared to those from the RCT in an artificial setting). However, the internal validity is threatened as exchangeability is not guaranteed (incomparability between the two groups). Positivity and consistency need explicit attention in observational studies. Observational studies are commonly used when it is not possible to randomly divide the sample into different groups (for example women with children and without children).

Causal graphs
In order to meet the exchangeability condition, statistical adjustments can be made. Statistical adjustments remove the effect of a variable by adjusting for it (including it in the regression model etc.), which makes it possible to estimate the causal effect of the exposure on the outcome in the absence of confounding effects (in DAG terms this is called conditioning). If an association is present in the unadjusted analysis but disappears after adjustment, the effect runs via other variables.
- Stratification/selection – dividing members of the population into homogeneous subgroups (strata) before sampling (only analyze smokers with and without a cigarette lighter). Unadjusted results may imply a relationship that does not exist when stratifying the groups. This is only a possibility when there is a small number of categorized factors (confounders). The advantage is that it is easy and intuitively interpretable.
- Matching – for every treated patient you find one non-treated patient with similar observable characteristics against whom the effect of the treatment can be assessed.
- Weighting – reduce the bias in the survey estimates by giving weights to patients in the data that reflect each patient's importance relative to the other patients. Patients with over-represented characteristics count for less, so that the weighted sample is more representative of the target population.
- Regression analysis – adjust for several covariates (confounders) at the same time; you need to think about what to adjust for. It is relatively easy to use when you have access to the software, but also hard to use well, because mistakes lead to biased results. More is not always better in this case. There are different ways to select the variables you adjust for.
o Correlation matrix – look at correlations with all the covariates available in the dataset. Variables that are significantly associated with the outcome are included in the regression model. This is a bad method.
o Stepwise backward selection – start with a regression model that includes all the covariates and remove the variable that is the least statistically significant and important, then run the model again, etc. A variable should be retained if its removal leads to a substantial change in the effect estimate, as leaving it out leads to confounding.
o Adjust for confounders – what could be confounders? Confounders are associated with the exposure, conditionally associated with the outcome given the exposure, and not in the causal pathway between exposure and outcome.
With these strategies you rely on the observed data rather than on a priori knowledge of causal structures, so important variables may be missed, which increases the bias. Therefore, use DAGs to decide what to adjust for when doing causal analyses.

A Directed Acyclic Graph (DAG) is a graphical representation of an underlying causal structure that is specified a priori (theory is important in research). It is directed and contains no cycles: following the arrows from any variable never leads back to that variable. Always start with the exposure on the left and the outcome on the right, as determined by the causal question under investigation. Confounders and colliders are placed between the exposure and outcome variables. Variables in the DAG that are connected by an open path are expected to be associated in the data. The association between the exposure and outcome variables is the combination of all open paths between those variables. (A variable can be both a confounder and a collider, depending on the different paths the variable is in.)
- Directed – each connection between the variables is an arrow (one direction). When there might be a reverse association (between health and sports), you need to distinguish different time points in the graph.
- Acyclic – a path of arrows does not come back to its origin (a variable cannot cause itself). Causes always precede their effects (the future cannot cause the past).
- Arrow – each arrow represents a potential causal effect between two variables and its direction. Not drawing an arrow means that you are certain that there is no meaningful association. Always explain why you drew an arrow.
- Path – a route (connection) of arrows between the exposure and outcome variable. It does not have to follow the direction of the arrows. (When the research question is the effect of X on Y, these are all paths that lead from X to Y.)
- Causal path (directed path) – a route between exposure and outcome that follows the direction of the arrows, representing the causal relationship. Causal paths can start at any of the exposure variables.
- Backdoor path – a route between exposure and outcome that does not follow the direction of the arrows (non-causal path).
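
The DAG terms above (acyclic, path, causal path, non-causal path, collider) can be made concrete with a small sketch. The graph below is purely hypothetical: exposure X, outcome Y, a confounder C, a mediator M and a collider K are illustrative names, not variables from the lecture. The function names are also assumptions for this sketch. A path is reported as "blocked" only when it contains a collider, assuming nothing has been adjusted for yet.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical DAG: each key is a cause, each list holds its direct effects.
EXAMPLE_DAG: Dict[str, List[str]] = {
    "C": ["X", "Y"],  # confounder: causes both exposure and outcome
    "X": ["M", "K"],  # exposure
    "M": ["Y"],       # mediator on the causal path X -> M -> Y
    "Y": ["K"],       # outcome
    "K": [],          # collider: caused by both X and Y
}


def edges_of(dag: Dict[str, List[str]]) -> Set[Tuple[str, str]]:
    """Return all arrows as (cause, effect) pairs."""
    return {(cause, effect) for cause, effects in dag.items() for effect in effects}


def is_acyclic(dag: Dict[str, List[str]]) -> bool:
    """True if following the arrows never leads back to the starting variable."""
    visiting: Set[str] = set()
    done: Set[str] = set()

    def visit(node: str) -> bool:
        if node in done:
            return True
        if node in visiting:  # we came back to a node on the current route
            return False
        visiting.add(node)
        ok = all(visit(child) for child in dag.get(node, []))
        visiting.discard(node)
        done.add(node)
        return ok

    return all(visit(node) for node in dag)


def all_paths(dag: Dict[str, List[str]], start: str, end: str) -> List[List[str]]:
    """All routes from start to end, ignoring the direction of the arrows."""
    neighbours: Dict[str, Set[str]] = {}
    for cause, effect in edges_of(dag):
        neighbours.setdefault(cause, set()).add(effect)
        neighbours.setdefault(effect, set()).add(cause)
    paths: List[List[str]] = []

    def walk(path: List[str]) -> None:
        node = path[-1]
        if node == end:
            paths.append(path)
            return
        for nxt in sorted(neighbours.get(node, ())):
            if nxt not in path:  # never visit a variable twice
                walk(path + [nxt])

    walk([start])
    return paths


def describe(dag: Dict[str, List[str]], path: List[str]) -> Tuple[str, str]:
    """Classify a path as causal or non-causal, and as open or blocked."""
    edges = edges_of(dag)
    causal = all(step in edges for step in zip(path, path[1:]))
    # A collider is an interior variable with both neighbouring arrows pointing into it.
    collider = any(
        (prev, mid) in edges and (nxt, mid) in edges
        for prev, mid, nxt in zip(path, path[1:], path[2:])
    )
    return ("causal" if causal else "non-causal",
            "blocked by collider" if collider else "open")


if __name__ == "__main__":
    print("acyclic:", is_acyclic(EXAMPLE_DAG))
    for p in all_paths(EXAMPLE_DAG, "X", "Y"):
        print(" - ".join(p), describe(EXAMPLE_DAG, p))
```

In this hypothetical graph the non-causal path through C is open, so C is the variable to adjust for; the path through K is already blocked, and adjusting for a collider would open it.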
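
The potential-outcome definitions, the exchangeability condition and adjustment by stratification from this lecture can be illustrated with a small simulation. Everything in it is an assumption made for the sketch: the data are simulated, the treatment effect is fixed at 1 for every individual, the confounder is a single binary variable (think of smoking), and the function names are made up. Unlike real data, the simulation stores both potential outcomes, so the true average treatment effect is known.

```python
import random
from statistics import mean


def simulate(n, randomized, seed=0):
    """Simulate n individuals with both potential outcomes (assumed data model)."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        c = rng.random() < 0.5              # binary confounder, e.g. smoker
        y0 = 2.0 * c + rng.gauss(0, 1)      # potential outcome without treatment
        y1 = y0 + 1.0                       # potential outcome with treatment
        if randomized:
            p_treat = 0.5                   # RCT: treatment independent of c
        else:
            p_treat = 0.8 if c else 0.2     # observational: c influences treatment
        a = rng.random() < p_treat
        # Only one potential outcome is observed for each individual.
        rows.append({"c": c, "a": a, "y0": y0, "y1": y1, "y": y1 if a else y0})
    return rows


def true_ate(rows):
    """Average of the individual treatment effects (unknowable in real data)."""
    return mean(r["y1"] - r["y0"] for r in rows)


def naive_difference(rows):
    """Unadjusted association: mean observed outcome, treated minus untreated."""
    treated = [r["y"] for r in rows if r["a"]]
    control = [r["y"] for r in rows if not r["a"]]
    if not treated or not control:
        raise ValueError("positivity violated: a treatment group is empty")
    return mean(treated) - mean(control)


def stratified_difference(rows):
    """Adjusted estimate: difference within each stratum of c, weighted by stratum size."""
    total = 0.0
    for level in (False, True):
        stratum = [r for r in rows if r["c"] == level]
        total += naive_difference(stratum) * len(stratum) / len(rows)
    return total


if __name__ == "__main__":
    observational = simulate(20000, randomized=False)
    trial = simulate(20000, randomized=True)
    print("true ATE:                ", round(true_ate(observational), 3))
    print("observational, naive:    ", round(naive_difference(observational), 3))
    print("observational, stratified:", round(stratified_difference(observational), 3))
    print("randomized, naive:       ", round(naive_difference(trial), 3))
```

Under these assumptions the naive observational difference should come out well above the true effect, because exchangeability does not hold, while the stratified estimate and the randomized comparison should both land close to it. The `ValueError` is a simple stand-in for a positivity check: a stratum without treated or without untreated individuals cannot be analysed.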