Bayesian data analysis for newcomers – John K. Kruschke and Torrin M. Liddell
Abstract
This article explains the foundational concepts of Bayesian data analysis using virtually no
mathematical notation. Bayesian ideas already match your intuitions from everyday reasoning and
from traditional data analysis. Simple examples of Bayesian data analysis are presented that illustrate
how the information delivered by a Bayesian analysis can be directly interpreted. Bayesian
approaches to null-value assessment are discussed. The article clarifies misconceptions about
Bayesian methods that newcomers might have acquired elsewhere. We discuss prior distributions
and explain how they are not a liability but an important asset. We discuss the relation of Bayesian
data analysis to Bayesian models of mind, and we briefly discuss what methodological problems
Bayesian data analysis is not meant to solve. After you have read this article, you should have a clear
sense of how Bayesian data analysis works and the sort of information it delivers, and why that
information is so intuitive and useful for drawing conclusions from data.
The first section of the article explains the foundational ideas of Bayesian methods, and shows how
those ideas already match your intuitions from everyday reasoning and research. The next sections
show some simple examples of Bayesian data analysis, for you to see how the information delivered
by a Bayesian analysis can be directly interpreted. We discuss Bayesian parameter estimation,
Bayesian model comparison, and Bayesian approaches to assessing null values. The final sections
focus on disabusing possible misconceptions that newcomers might have.
The main idea: Bayesian analysis is reallocation of credibility across possibilities
The main idea of Bayesian analysis is simple and intuitive. There are some data to be explained, and
we have a set of candidate explanations. Before knowing the new data, the candidate explanations
have some prior credibilities of being the best explanation. Then, when given the new data, we shift
credibility toward the candidate explanations that better account
for the data, and we shift credibility away from the candidate
explanations that do not account well for the data. A
mathematically compelling way to reallocate credibility is called
Bayes’ rule.
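The reallocation can be sketched numerically. In this minimal sketch, the suspects, prior credibilities, and likelihoods are invented for illustration:

```python
# Sketch of Bayes' rule as reallocation of credibility (invented numbers).
# Prior credibilities across three hypothetical candidate explanations.
prior = {"A": 0.70, "B": 0.20, "C": 0.10}

# Probability of the observed evidence under each candidate.
# Suppose the evidence is impossible under A and B.
likelihood = {"A": 0.0, "B": 0.0, "C": 0.5}

# Bayes' rule: posterior is proportional to prior times likelihood.
unnorm = {s: prior[s] * likelihood[s] for s in prior}
total = sum(unnorm.values())
posterior = {s: unnorm[s] / total for s in unnorm}

print(posterior)  # all credibility shifts to C, despite its small prior
```

Even though C started with the smallest prior credibility, eliminating the other candidates leaves C with all the posterior credibility, exactly as in Holmes' reasoning.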
You already are Bayesian in everyday reasoning
Sherlock Holmes’ reasoning is Bayesian
The fictional detective Sherlock Holmes was famous for remarking
to his sidekick, Dr. Watson: “How often have I said to you that
when you have eliminated the impossible, whatever remains,
however improbable, must be the truth?” That simple reasoning is
Bayesian. From the data, suspicion is reallocated across the
suspects. If the data eliminate some suspects, the remaining
suspects must be more suspicious, even if their prior probability
was small. Bayesian analysis does exactly the same reallocation,
but using precise mathematics. Figure 1 illustrates the reallocation
graphically. The horizontal axis denotes the range of possibilities, and the vertical axis denotes the
credibility, or probability, of each possibility.
In Holmes’ reasoning, a tacit premise is that the actual culprit is included in the set of suspects. A
more accurate phrasing for Holmesian reasoning is this: When you have eliminated the impossible,
whatever remains, however improbable, must be the least bad option from among the possibilities
you are considering. In general, Bayesian reasoning provides the relative credibilities within the set of
considered possibilities.
Exoneration is Bayesian
The logic of exoneration is Bayesian: Reasoning starts with a set of candidate causes of the event,
then collects new data such as a confession, and then reallocates credibility accordingly. If the data
fully implicate one suspect, the remaining suspects must be less suspicious. Bayesian
analysis does the same reallocation, but with exact mathematics.
The possibilities are parameter values
In data analysis, the candidate explanations are values of parameters in mathematical descriptions
of data. For example, suppose we poll a random sample of people and record whether each
respondent answers yes or no to a question. We conceive of each randomly polled response as a flip
of a coin that has some underlying
probability of coming up yes. We start with some prior allocation of credibilities across the
continuum of possible parameter values. The prior allocation could be quite vague and spread evenly
across the range of candidate values from 0 to 1, or the prior could give some candidate proportions
higher credibility than others if previous knowledge recommends it. Then we collect data and re-
allocate credibility to parameter values that are consistent with the data.
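This re-allocation over a continuum of candidate proportions can be sketched with a simple grid approximation; the poll counts below are invented for illustration:

```python
import numpy as np

# Grid of candidate values for the underlying proportion of "yes" (0 to 1).
theta = np.linspace(0, 1, 1001)

# A vague prior: equal credibility for every candidate value.
prior = np.ones_like(theta)
prior /= prior.sum()

# Hypothetical data: 14 "yes" responses out of 20 polled (invented numbers).
yes, n = 14, 20
likelihood = theta**yes * (1 - theta)**(n - yes)

# Reallocate: posterior is proportional to prior times likelihood.
posterior = prior * likelihood
posterior /= posterior.sum()

print(theta[np.argmax(posterior)])  # most credible proportion: 0.70
```

Candidate proportions near the observed proportion of yes responses gain credibility; proportions far from it lose credibility.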
Parameter values are meaningful in the context of their model
We care about parameter values because they are meaningful. To illustrate, suppose we collect the
weights of all the children in third grade of a particular school. We describe the set of data as being
randomly generated from a normal distribution with mean μ and standard deviation σ. If we tell you
that certain values of μ and σ are good descriptions of the data, then you have a pretty good idea of
what the data look like. The values of the parameters are meaningful in the context of the model.
Bayesian analysis tells us the relative credibilities of the parameter values. That’s why the
information provided by a Bayesian analysis is so useful and intuitive.
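As a sketch of why the parameter values are meaningful, we can generate plausible data from assumed values of μ and σ; the particular numbers below are invented for illustration:

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical parameter values describing third-graders' weights
# (invented numbers, in kilograms).
mu, sigma = 30.0, 4.0

# If these parameter values describe the data well, simulated data
# from the model should resemble the real data.
weights = rng.normal(mu, sigma, 200)

print(weights.mean(), weights.std())  # close to mu and sigma
```

Knowing credible values of μ and σ tells you where the data are centered and how spread out they are, which is why a posterior distribution over these parameters is directly informative.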
Parameters can be discrete or continuous
In many mathematical descriptions of data, the parameters are continuous, such as means, standard
deviations, and regression coefficients. The posterior distribution on continuous parameters is a
continuous distribution, rising and falling smoothly across the range of the joint parameter space. As
nearly all statistical models involve continuous parameters, it is continuous parameter distributions
that dominate Bayesian data analysis.
But descriptive parameters can also be discrete rather than continuous. For example, a parameter
could indicate which of two states is true, such as having a disease or not having it.
Another important case of a discrete parameter, that we will revisit later, is model comparison. For a
given set of data, there can be multiple models. Each model involves its own parameters and prior
distribution over its parameters. The models are labeled by a discrete indexical parameter (“1” for
the first model, “2” for the second model, and so on). When new data are considered, credibility
shifts over the parameter distributions within each model, and credibility simultaneously shifts over
the discrete indexical parameter. The re-allocated posterior probabilities of the model indices are the
relative degrees to which we believe each model, given the data. In particular, when one model
represents a null hypothesis and a second model represents an alternative hypothesis, this discrete
model comparison is one way of implementing hypothesis testing in a Bayesian framework.
The more general application is for continuous parameters; discrete parameters are just a special
case.
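A minimal sketch of discrete model comparison, using hypothetical coin-flip data: the null model fixes the proportion at 0.5, while the alternative model spreads a uniform prior over all proportions. (For a uniform prior, the marginal likelihood of y successes in n flips works out to 1/(n+1); the data and equal prior model probabilities are assumptions of this sketch.)

```python
import math

# Hypothetical data: 14 successes in 20 flips (invented numbers).
yes, n = 14, 20

# Model 1 (null): proportion fixed at 0.5.
marg1 = math.comb(n, yes) * 0.5**yes * 0.5**(n - yes)

# Model 2 (alternative): uniform prior over all proportions.
# Its marginal likelihood integrates to 1/(n+1).
marg2 = 1.0 / (n + 1)

# Equal prior probability on each model (an assumption of this sketch).
post1 = marg1 / (marg1 + marg2)
post2 = marg2 / (marg1 + marg2)

print(post1, post2)  # posterior probability of each model index
```

The posterior probabilities of the model indices quantify the relative credibility of the null and alternative hypotheses, given the data.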
Bayesian analysis provides the relative credibilities of parameter values
The goal of Bayesian data analysis is to provide an explicit distribution of credibilities across the range
of candidate parameter values. This distribution, derived after new data are taken into account, is
called the posterior distribution across the parameter values. The posterior distribution can be
directly examined to see which parameter values are most credible, and what range of parameter
values covers the most credible values.
The posterior distribution can be directly interpreted. We can “read off” the most credible parameter
values and the range of reasonable parameter values. Unlike in frequentist statistical analysis, there
is no need to generate sampling distributions from null hypotheses and to figure out the probability
that fictitious data would be more extreme than the observed data. In other words, there is no need
for p values and p value based confidence intervals. Instead, measures of uncertainty are based
directly on posterior credible intervals.
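As a sketch, a 95% central credible interval can be read directly off a grid-approximated posterior; the data here (14 successes in 20 trials, uniform prior) are invented for illustration:

```python
import numpy as np

# Grid-approximated posterior for a proportion, given hypothetical data
# of 14 successes in 20 trials with a uniform prior.
theta = np.linspace(0, 1, 1001)
post = theta**14 * (1 - theta)**6
post /= post.sum()

# The cumulative distribution lets us read off the interval directly:
# the central 95% of posterior credibility lies between the 2.5% and
# 97.5% quantiles.
cdf = np.cumsum(post)
lo = theta[np.searchsorted(cdf, 0.025)]
hi = theta[np.searchsorted(cdf, 0.975)]

print(lo, hi)  # parameter values covering 95% of posterior credibility
```

No sampling distribution of fictitious data is needed: the interval comes straight from the posterior distribution itself.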
You already are Bayesian in data analysis
Bayesian reasoning is so intuitive that it’s hard to resist spontaneously giving Bayesian
interpretations to results of traditional frequentist data analysis. Consider, for example, the t test for
a single group of 50 people to whom we administered a “smart drug” and then an intelligence
quotient (IQ) examination. We would like to know if the mean IQ score of the group
differs from the general population average of 100. Suppose the t test yields t (49) = 2.36, p = 0.023,
with 95% confidence interval on μ extending from 100.74 to 109.26. What does this mean?
Your intuitive interpretation of p values is Bayesian
Consider the result that p = 0.023. This means that the probability that μ equals the “null” value of
100 is only 0.023, right? This is a natural, intuitive interpretation and is the one that many or most
people give. Unfortunately, it is not the correct interpretation of the p value. A Bayesian hypothesis
test provides the probability that μ equals the null value relative to an alternative hypothesis that μ
could span a wide range of values. In other words, we naturally interpret a frequentist p value as if it
were some form of Bayesian posterior probability.
But a frequentist p value is not a Bayesian posterior probability. The p value is the probability that
the observed data summary (such as its t value), or something more extreme, would
be obtained if the null hypothesis were true and the data were sampled according to the same
stopping and testing intentions as the observed data. In other words, the p value is the probability
that fictional, counterfactual data from the null hypothesis would be more extreme than the
observed data, when those data were sampled and tested as intended by the current researchers.
Different stopping and testing intentions therefore yield different p values.
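This definition can be checked by simulation: generate many fictional data sets from the null hypothesis under a fixed sample size (one particular stopping intention), and count how often the fictional t value is at least as extreme as the observed one. The sample size and t value below come from the example above; the population SD is arbitrary, since the t statistic's null distribution does not depend on it:

```python
import numpy as np

rng = np.random.default_rng(0)

# Observed result from the example: n = 50, t = 2.36.
n, t_obs = 50, 2.36
null_mu, sd = 100.0, 15.0  # null mean; SD is arbitrary for the t statistic

# Fictional data sets sampled from the null hypothesis, with the
# fixed-N stopping intention.
sims = 100_000
fake = rng.normal(null_mu, sd, size=(sims, n))
t_sim = (fake.mean(axis=1) - null_mu) / (fake.std(axis=1, ddof=1) / np.sqrt(n))

# The p value: proportion of fictional t values at least as extreme
# as the observed t value (two-tailed).
p = np.mean(np.abs(t_sim) >= t_obs)

print(p)  # close to the reported p = 0.023
```

Changing the stopping or testing intention changes which fictional data sets are simulated, and therefore changes the p value, even when the observed data stay exactly the same.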
Your intuitive interpretation of confidence intervals is Bayesian
Consider the 95% confidence interval from 100.74 to 109.26. That means there is a 95% probability
that the mean μ falls between 100.74 and 109.26, right? This is a natural, intuitive interpretation, and