Summary papers ER
Paper 1: The discipline of business experimentation (Thomke, Manzi)
In brief:
The problem: in the absence of sufficient data to inform decisions about proposed
innovations, managers often rely on their experience, intuition, or conventional wisdom –
none of which is necessarily relevant.
The solution: A rigorous scientific test, in which companies separate an independent variable
(the presumed cause) from a dependent variable (the observed effect) while holding all other
potential causes constant, and then manipulate the IV to study changes in the DV -> cause
and effect.
The guidance: to make the most of their experiments, companies must ask: does the
experiment have a clear purpose? Have stakeholders made a commitment to abide by the
results? Is the experiment doable? How can we ensure reliable results? Have we gotten the
most value out of the experiment?
Purpose: Companies should conduct experiments if they are the only practical way to answer specific
questions about proposed management actions. To decide this, managers must figure out what they
want to learn. A good hypothesis, with a specific IV and DV, is needed so it can be supported or rejected. In many
situations: go beyond the direct effects of an initiative and investigate its ancillary effects. Example
Kohl’s: open stores an hour later to decrease operating costs. The only way to test this was to conduct a
rigorous experiment. Result: delayed opening would not result in any meaningful sales decline.
Stakeholders: Before conducting any test, stakeholders must agree how they'll proceed once the
results are in (promise to weigh all the findings / be willing to walk away from the project if it’s not
supported by data). Example Kohl’s: many executives were enthusiastic about adding a new product
category. But test showed a drop in sales and the program was scrapped. This shows that
experiments are needed to perform objective assessments of initiatives.
A process should be instituted to ensure that test results aren’t ignored. Example Publix Super
Markets: all large retail projects must undergo formal experiments to receive a green light (a filtering
process). When constructing/implementing such a process, it is important that the experiments be
part of a learning agenda that supports the firm’s organizational priorities. Example Petco: each test
request must address how the experiment would contribute to the overall strategy (become more
innovative). Number of tests reduced.
Feasibility: experiments must have testable predictions. But the causal density of the business
environment (the complexity of variables/interactions) can make it difficult to determine cause-and-
effect relationships. Environments are constantly changing and potential causes of business
outcomes are often uncertain/unknown. Example retail chain: test whether changing the name of
QwikMart stores to FastMart leads to an increase in revenue. Obvious solution: conduct an experiment
by changing the name of a few QwikMart stores. But even determining the effect of the name change
on those stores turns out to be tricky, because many other variables may have changed at the same
time. Unless the company can isolate the effect of the name change from other variables, it won’t
know for sure whether the name change has helped (or hurt) business.
To deal with environments of high causal density: use a sample large enough to average out the
effects of all variables (except those being studied). But this is not always doable. Then, one can use
sophisticated analytical techniques (e.g. big data) to increase statistical validity.
Note: managers often mistakenly assume that a larger sample will automatically lead to better data
(if observations are highly clustered: true sample size might actually be quite small). The required
sample size depends on the magnitude of the expected effect (large effect = smaller sample, small
effect = larger sample).
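The effect-size/sample-size relationship in the note above can be sketched with a standard two-proportion power calculation (a hypothetical illustration; the article itself gives no formula, and the baseline rates and lifts below are made up):

```python
import math

def sample_size_per_group(p_base, lift, z_alpha=1.96, z_beta=0.84):
    """Approximate sample size per group to detect a lift in a conversion
    rate, using the normal approximation for a two-sided z-test on two
    proportions (z_alpha = 1.96 for 5% significance, z_beta = 0.84 for
    80% power)."""
    p1, p2 = p_base, p_base + lift
    p_bar = (p1 + p2) / 2
    n = ((z_alpha * math.sqrt(2 * p_bar * (1 - p_bar))
          + z_beta * math.sqrt(p1 * (1 - p1) + p2 * (1 - p2))) ** 2) / lift ** 2
    return math.ceil(n)

# A large expected effect needs a much smaller sample than a small one:
print(sample_size_per_group(0.10, 0.05))  # detect a 5-point lift
print(sample_size_per_group(0.10, 0.01))  # detect a 1-point lift
```

Running this shows the asymmetry the note describes: halving the expected effect roughly quadruples the required sample, which is why small-effect experiments in low-traffic environments are so hard to power.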
Reliability: companies typically have to make trade-offs between reliability, cost, time, etc. Methods
that help reduce the trade-off (thus increasing the reliability of the results):
1. Randomized field trials: take large group of individuals (same characteristics) and randomly
divide them into two subgroups. Administer the treatment to just one subgroup, and
compare results. Randomization prevents systematic bias (conscious or unconscious) from
affecting an experiment, and it evenly spreads any remaining (and possibly unknown)
potential causes of the outcome between the test and control groups. Wrong example
Petco: select its 30 best stores to try out a new initiative and compare them with its 30 worst
stores (control group) = bias.
2. Blind tests: (conducted by Petco/Publix), which help prevent the Hawthorne effect: the
tendency of participants to modify their behavior (consciously or subconsciously) when they
are aware that they are part of an experiment.
3. Big data: to filter out statistical noise and identify cause-and-effect relationships, business
experiments should ideally employ samples numbering in the thousands. But this can be
prohibitively expensive or impossible. In some (offline/indirect-channel) environments (e.g.
retail stores) sample sizes are often smaller than 100. To minimize the effect of this
limitation, companies can utilize specialized algorithms in combination with big data.
Example large retailer: tests 20 redesigned stores. Company uses big data (including
transaction-level data/store attributes and data on environments around stores) to select
stores for the control group that were a close match with those in which the redesign was
tested, which made the small sample size statistically valid.
Even when a company can’t follow a rigorous testing protocol, analysts can help
identify/correct for certain biases, randomization failures, and other experimental
imperfections. (Common situation is nonrandomized natural experiments). For any
experiment, the gold standard is repeatability: others conducting the same test should
obtain similar results. Repeating an expensive test is usually impractical, but companies can
verify results in other ways.
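The randomized split described in point 1 can be sketched in a few lines (store names, group size, and the fixed seed are illustrative assumptions, not Petco's actual procedure):

```python
import random

def randomize(units, seed=42):
    """Randomly split a list of units (e.g. stores) into test and control
    groups, instead of hand-picking 'best' vs. 'worst' stores, which
    would build systematic bias into the comparison."""
    rng = random.Random(seed)       # fixed seed makes the split reproducible
    shuffled = list(units)
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]

stores = [f"store_{i}" for i in range(60)]
test_group, control_group = randomize(stores)
```

Because assignment ignores every store attribute, known and unknown causes of the outcome end up spread evenly across both groups, which is exactly what comparing the 30 best stores against the 30 worst fails to do.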
How big data can help:
o Getting started: Use big data to collect detailed data about each unit of analysis (e.g.
each store).
o Building a control group: Use big data to correctly match test subjects to control
subjects based on characteristics (when the sample is small).
o Targeting the best opportunities: By pinpointing patterns in the data, the
experimenter can implement the program in situations where it works.
o Tailoring the program: Additional large data feeds can be used to characterize
program components that are more or less effective.
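The "building a control group" step can be illustrated with a toy nearest-neighbor match (the store names, the two attributes, and the Euclidean distance metric are assumptions for illustration, not the retailer's actual algorithm, which drew on far richer transaction-level and environment data):

```python
import math

def match_controls(treated, pool, features):
    """For each treated store, pick the most similar untreated store from
    the candidate pool as its control, by Euclidean distance over the
    given numeric attributes."""
    def dist(a, b):
        return math.sqrt(sum((a[f] - b[f]) ** 2 for f in features))
    return {name: min(pool, key=lambda c: dist(attrs, pool[c]))
            for name, attrs in treated.items()}

treated = {"T1": {"sales": 1.2, "size": 10}}
pool = {"C1": {"sales": 1.1, "size": 9},
        "C2": {"sales": 3.0, "size": 20}}
print(match_controls(treated, pool, ["sales", "size"]))
</n```

Matching each redesigned store to its closest untouched twin is what lets a sample of only 20 test stores support a statistically valid comparison.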
Value: many companies fail to make the most of experiments. Take into account a proposed
initiative’s effect on various customers/markets/segments and concentrate investments in areas
where the potential paybacks are highest -> what works where? Example Petco: rolls out a program
only in stores that are most similar to the test stores that had the best results. Another useful tactic is
‘value engineering’: implement just the components with an attractive ROI -> conducting
experiments to investigate various combinations of components. Also, experiment analyses can
enable companies to better understand their operations and test their assumptions of which
variables cause which effects (look beyond correlation and investigate causality). Without fully
understanding causality, companies leave themselves open to making big mistakes. Example Cracker
Barrel Old Country Store: conducted an experiment to test a switch from incandescent to LED lighting.
Result: traffic decreased in locations with LED lights. The company dug deeper to understand the
underlying causes. Reason: store managers hadn’t previously been following the company’s lighting
standards, so the luminosity dropped when the stores adhered to the new LED policy, and people
thought the restaurants were closed. Correlation alone would have left the company with the wrong
impression (that LEDs are bad for business). Conducting an experiment is just the beginning. Value
comes from analyzing and then exploiting the data.
Challenging conventional wisdom: By paying attention to sample sizes, control groups,
randomization, and other factors, companies can ensure the validity of their test results. The more
valid/repeatable the results, the better they will hold up in the face of internal resistance (= strong
when results challenge long-standing industry practices and conventional wisdom). Example Petco:
an investigation showed that the best-performing price ended in $.25. That result went against the
grain of conventional wisdom (prices ending in 9). But after the rollout, sales jumped 25% within six
months.