Summary of the literature for Advanced Research Methods
2020-2021
Week 1
Wheelan, C.J. (2014). Naked statistics: stripping the dread from the data
Hernán, M.A., Robins, J.M. (2020). Causal inference: what if
Hernán, M.A. (2002). Causal knowledge as a prerequisite for confounding evaluation: an application to birth defects epidemiology
Hernán, Miguel A. (2016). Does water kill? A call for less casual causal inferences
Suttorp, Marit M., Siegerink, Bob, Jager, Kitty J., Zoccali, Carmine, Dekker, Friedo W. (2014). Graphical presentation of confounding in directed acyclic graphs
Week 2
Wheelan, C.J. (2014). Naked statistics: stripping the dread from the data
Nuzzo, Regina (2014). Scientific method: statistical errors
Greenland, Sander, Senn, Stephen J., Rothman, Kenneth J., Carlin, John B., Poole, Charles, Goodman, Steven N., Altman, Douglas G. (2016). Statistical tests, P values, confidence intervals, and power: a guide to misinterpretations
Cole, Stephen R., Hernán, Miguel A. (2002). Fallibility in estimating direct effects
VanderWeele, Tyler J., Robinson, Whitney R. (2014). On the causal interpretation of race in regressions adjusting for confounding and mediating variables
Wasserstein, Ronald L., Schirm, Allen L., Lazar, Nicole A. (2019). Moving to a world beyond "p < 0.05"
Week 3
Wheelan, C.J. (2014). Naked statistics: stripping the dread from the data
Viera, Anthony J. (2008). Odds ratios and risk ratios: what's the difference and why does it matter?
Westreich, D., Greenland, S. (2013). The Table 2 fallacy: presenting and interpreting confounder and modifier coefficients
Week 4
Green, J., Thorogood, N. (2009). Qualitative methods for health research
Alvesson, Mats, Kärreman, Dan (2000). Varieties of discourse: on the study of organizations through discourse analysis
Hodges, B.D., Kuper, A., Reeves, S. (2008). Discourse analysis
Sandberg, Jörgen, Alvesson, Mats (2010). Ways of constructing research questions: gap-spotting or problematization
Teghtsoonian, Katherine (2009). Depression and mental health in neoliberal times: a critical analysis of policy and discourse
Week 5
Reeves, S., et al. (2008). Why use theories in qualitative research?
Reeves, S., et al. (2008). Qualitative research methodologies: ethnography
Wilson, William Julius, et al. (2009). The role of theory in ethnographic research
Waring, Justin J., et al. (2010). "Water cooler" learning: knowledge sharing at the clinical "backstage" and its contribution to patient safety
Gengler, Amanda M. (2014). "I want you to save my kid!": illness management strategies, access, and inequality at an elite university research hospital
Hallett, Ronald E., et al. (2013). Ethnographic research in a cyber era
Week 6
Hernán, Miguel A., et al. (2019). A second chance to get causal inference right: a classification of data science tasks
Hernán, Miguel A. (2018). The C-word: scientific euphemisms do not improve causal inference from observational data
Huitfeldt, Anders (2016). Is caviar a risk factor for being a millionaire?
Mays, N. (2000). Qualitative research in health care: assessing quality in qualitative research
Week 1
Wheelan C.J. (2014). Naked statistics: stripping the dread from the data
Chapter 6: Problems with probability
The chapter is about Value at Risk (VaR) models, which calculate the risk of losing money on Wall Street.
There are two huge problems with the risk profiles encapsulated by these models:
1. The underlying probabilities on which the models were built were based on past market
movements; however, in financial markets the future does not necessarily look like the past.
2. Even if the underlying data could accurately predict future risk, the 99 percent assurance
offered by the VaR model was dangerously useless, because it's the 1 percent that is going to
really mess you up.
‘The greatest risks are never the ones you can see and measure, but the ones you can’t see and
therefore can never measure’
The Wall Street firms made three fundamental errors:
1. They confused precision with accuracy, which led them to believe they had risk on a leash
when in fact they did not.
2. The estimates of the underlying probabilities were wrong (you can't use old data to predict
the future).
3. Firms neglected their 'tail risk'.
So: probability doesn't make mistakes; people using probability make mistakes.
The most common probability-related errors, misunderstandings, and ethical dilemmas:
1. Assuming that some events are independent when in fact they are dependent (think about
the engines, or the cot deaths).
2. Not understanding when events ARE independent. The very definition of statistical
independence between two events is that the outcome of one has no effect on the outcome
of the other, so: how can flipping a series of tails in a row make it more likely that the coin
will turn up heads on the next flip?
3. Clusters happen. We often think that if a group gets a disease, it must be something in the
water or something else in the environment. Sure, it's possible, but it can also be a product of
chance. This is something we often forget (think about the class standing and flipping coins:
the kid who flipped tails 5 times in a row doesn't have a special talent, it was just chance).
'When we see an anomalous event like that out of context, however, we assume that
something besides randomness must be responsible.'
4. The prosecutor's fallacy. Imagine the DNA of the defendant matched the DNA at the
crime scene, and there is only a 1 in a million chance that it matches someone else's DNA.
Even then, the match can be a coincidence rather than proof that this is the killer's DNA,
'because the chances of finding a coincidental one in a million match are relatively high if you
run the sample through a database with samples from a million people'.
5. Reversion to the mean (or regression to the mean). When someone does unusually well on a
test, there is a good chance that this person does worse on a resit. An explanation is luck,
both bad and good: at the resit the score moves back toward that person's mean, the score
you would expect if the person didn't have any luck (bad or good). Another example is the
coin-flipping 'king' from the classroom demonstration.
6. Statistical discrimination. Is it okay to discriminate if the data tell us that we’ll be right far
more often than we’re wrong? If we can build a model that identifies drug smugglers
correctly 80 out of 100 times, what happens to the poor souls in the 20 percent—because our
model is going to harass them over and over and over again?
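Points 2 and 4 above can be checked numerically. The sketch below is my own illustration, not from the book; the match probability and database size follow the one-in-a-million example:

```python
import random

random.seed(42)

# Gambler's fallacy check: after five tails in a row, the next flip is
# still 50/50, because the flips are statistically independent.
flips = [random.random() < 0.5 for _ in range(200_000)]  # True = tails
after_streak = [flips[i] for i in range(5, len(flips)) if all(flips[i - 5:i])]
p_tails_after_streak = sum(after_streak) / len(after_streak)
print(round(p_tails_after_streak, 2))  # close to 0.5

# Prosecutor's fallacy: even with a 1-in-a-million match probability,
# searching a database of a million innocent people is expected to turn
# up about one purely coincidental match.
expected_coincidental_matches = 1_000_000 * (1 / 1_000_000)
print(expected_coincidental_matches)  # about 1 expected match
```

The streak of tails carries no information about the next flip, while the "overwhelming" DNA odds shrink dramatically once you account for how many comparisons were made.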
! Think carefully about your calculations and why you are doing them
Chapter 7: The importance of data
Think about the example of the fruit flies and the alcohol: 'no amount of fancy analysis can make
up for fundamentally flawed data; garbage in, garbage out. Data deserve respect, just like
offensive linemen.'
We generally ask our data to do one of three things:
1. We may demand a data sample that is representative of some larger group or population (for
example, a sample of the group that is likely to vote rather than of everyone living in that
jurisdiction).
The key idea is that a properly drawn sample will look like the population from which it is
drawn; If you’ve stirred your soup adequately, a single spoonful can tell you how the
whole pot tastes.
Several important points about a representative sample:
- A representative sample is a fabulously important thing, for it opens the door to some of the
most powerful tools that statistics has to offer.
- Getting a good sample is harder than it looks.
- Many of the most egregious statistical assertions are caused by good statistical methods
applied to bad samples, not the opposite.
- Size matters, and bigger is better; however, a bigger sample will not make up for errors in
its composition, or 'bias'. A bad sample stays a bad sample.
2. We ask that the data provide some source of comparison: a control group and an
exposed group. With animals it's easy to make them do things, but with humans it's more
complicated. One recurring research challenge with human subjects is creating treatment
and control groups that differ only in that one group is getting the treatment and the other is
not. For this reason, the 'gold standard' of research is randomization, a process by which
human subjects are randomly assigned to either the treatment or the control group.
3. 'Just because'. We sometimes have no specific idea what we will do with the information,
but we suspect it will come in handy at some point. Think about the Framingham study with
its longitudinal data. The research equivalent of a Toyota is a cross-sectional data set,
which is a collection of data gathered at a single point in time.
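The 'stirred soup' idea, and why a bigger sample cannot fix a biased one, can be illustrated with a small simulation. This is my own sketch with invented income numbers, not an example from the book:

```python
import random

random.seed(0)

# Hypothetical population of 100,000 incomes: mostly modest, a wealthy tail.
population = [random.lognormvariate(10, 0.5) for _ in range(100_000)]
pop_mean = sum(population) / len(population)

# A properly drawn random sample looks like the pot it was ladled from.
sample = random.sample(population, 1_000)
sample_mean = sum(sample) / len(sample)

# A biased sample, e.g. surveying only air travellers: drawn from the
# richest 10 percent. Making this sample bigger would not remove the bias.
wealthiest = sorted(population)[-10_000:]
biased_mean = sum(random.sample(wealthiest, 1_000)) / 1_000

print(abs(sample_mean - pop_mean) / pop_mean)  # small relative error
print(biased_mean > pop_mean)                  # the bias survives any sample size
```

The random sample of 1,000 lands close to the population mean, while the airport-style sample overshoots it no matter how many people are surveyed.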
Some examples of garbage in, garbage out (the statistical analysis is fine, but the data on which the
calculations are performed are bogus or inappropriate):
Selection bias; does everyone have the same chance to end up in the study?
- A survey of consumers in an airport is going to be biased by the fact that people who
fly are likely to be wealthier than the general public; a survey at a rest stop on
Interstate 90 may have the opposite problem.
- Also think about the example with prostate cancer and sexual activity
Self-selection bias; when individuals volunteer to be in a treatment group.
Publication bias; think about the example with gaming and colon cancer. The source of
the bias stems not from the studies themselves but from the skewed information that
actually reaches the public.
Recall bias; not all respondents know exactly what they did or ate in the past. Memory is a
fascinating thing, though not always a great source of good data.
Survivorship bias; occurs when some or many of the observations fall out of the
sample, changing the composition of the observations that are left and therefore affecting
the results of any analysis (think about the example with the funds and the S&P 500).
Healthy user bias; people who faithfully engage in activities that are good for them, like a
healthy diet, are different from those who don’t. This effect can confound any study trying to
evaluate the real effect of activities perceived to be healthful.
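The survivorship-bias mechanism in the funds example can be made concrete with a toy simulation. This is my own sketch; the return figures are invented:

```python
import random

random.seed(1)

n_funds, years = 1_000, 10


def avg(xs):
    return sum(xs) / len(xs)


# Every fund's yearly return is pure noise around 5 percent; no fund has skill.
histories = [[random.gauss(0.05, 0.10) for _ in range(years)]
             for _ in range(n_funds)]
all_funds_mean = avg([avg(h) for h in histories])

# Survivorship: the worst half close and drop out of the database, so any
# later analysis sees only the (lucky) survivors.
survivors = sorted(histories, key=avg)[n_funds // 2:]
survivor_mean = avg([avg(h) for h in survivors])

print(round(all_funds_mean, 3))        # near the true 0.05
print(survivor_mean > all_funds_mean)  # survivors look skilled, by construction
```

Even though returns are pure chance, averaging only the surviving funds makes the industry look like it beats its own market, which is exactly the distortion in the S&P 500 comparison.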