Data-Driven Decision-Making - Summary of the Reader
Exam questions
- Identifying the problem
- Evaluating the identification of the problem
- How are uncertainty and data related? (theory)
- Interpreting statistical output
- The relative relevance of parameters in regression outputs: why is one
factor more important than another?
- Evaluating mental models
- Data analysis
- Drawing conclusions and providing solutions for the identified problem
1 – Introduction
Most firms use data in their decision-making processes. However, in its
raw form, data is useless: raw data needs to be structured and aggregated
in the right way to yield insights. This is where data analytics comes in.
Data analytics takes raw data and structures and aggregates it to find
meaningful patterns, which can then be used to make better business
decisions.
Data analytics is only one of two parts. It is important to find regular
patterns in the data (data analytics), but it is a different skill to
determine whether these patterns contain new insights or how to make
sensible decisions based on them. Expert knowledge in the domain of the
decision problem is also needed. This course is about the unique
combination of analysis techniques and accounting expert knowledge needed
to come up with good decisions.
In most cases, it is not immediately obvious how you would use data to
answer questions. For this reason, it is important to start any analysis by
carefully thinking about the question you need an answer to; a framework
for decision-making is important here. Starting from the data, in contrast,
is not advisable. You should look at many different sources of data to find
an answer to a question. If you start with the data, you might draw
misleading conclusions because you analyzed inadequate but readily
available data. For these and many other reasons, the decision-making
framework starts with thinking carefully about the business question to be
analyzed.
2 – Decision-Making Basics
Most important business decisions involve some degree of uncertainty.
Think of uncertainty as missing knowledge about past, present, or future
events: the knowledge you would need in order to choose the best
action among various potential actions. Whenever there is uncertainty,
there is a chance of making a mistake or a sub-optimal choice. In real-world
cases, mistakes can be costly. A useful decision rule suggests not only an
expected best action out of a set of actions; it also accounts for the costs
of different kinds of potential mistakes. You will not always be able to
quantify the costs or probabilities of making mistakes. That is fine; we still
need to come up with simple decision rules that help make sensible decisions:
"If this course of action still performs best in the worst case, then let's do
it." This is called the maximin rule. We want to reduce the chance
of making (costly) mistakes, and thus want to reduce uncertainty
in decision-making. Data is the key input for reducing uncertainty but
is not enough on its own. Assumptions are needed.
The decision-making process framework is needed because data
analytics is not easy. It is easy to get stuck and stare at the data,
not seeing the forest for the trees.
1. Detect the problem. Sometimes the problem is clear.
However, sometimes we first need to realize that there is a
problem in the first place, which does not always happen
automatically. Well-designed monitoring systems that collect data
and measure developments across multiple dimensions often raise
the first warning signs.
2. Identify the problem. This is the most important step. We clarify what
we want to know and define what the current situation looks like. Then
we define the ideal outcome, describe the difference, and figure out
what is causing the difference. This root-cause analysis will give you
a path towards the problem. It is important to slow down and properly
identify the problem; do not go in blind and start digging. This also
holds for exploratory analysis: step back first. This is not always
easy. Guide your analysis by specifying the goal clearly and asking a
succession of questions that are aimed at bringing you closer to the
goal. Often, we need to gather and analyze a significant amount of
data to better understand what is happening. A lot of our data
analysis will happen at this stage. One can structure a decision
problem into a series of questions to be answered: Who is doing
what to whom? Where/when does the problem appear to arise?
What is the process behind what we observe? What is the reason
behind the suspected cause? What do we need to solve this problem?
3. Establish decision criteria. Once we have a better grip on what
exactly the issue is, we need to establish criteria that define what a good
solution should look like. It is hard to give guidance here, as these
criteria are usually problem-specific. Typical decision criteria are
financial benefits, resource usage, quality, riskiness, acceptability to
others, etc. It is vital to establish criteria at this stage of the process.
Without criteria to describe the goal we are working towards, there is
a big risk of wasting serious time and effort on designing solutions
that will be thrown away immediately. When there are multiple
criteria, it is important to weight them. Multi-criteria decision-making
is not easy, because the notion of a "best solution" becomes non-
trivial and the weights are not always obvious. Typical questions you
can ask in this stage: What is affected by the possible alternatives?
What do we need to trade off (e.g., costs vs. benefits)? Which cost is
more significant for us?
4. Develop alternatives. After setting the right decision criteria, it is
time to come up with potential alternatives that might lead to a
solution of the problem. Creativity is often key. Insights gathered
from the root-cause analysis in step 2 can also be useful. Developing
alternative solutions involves building different mental models of
how the problem might have arisen. This is where a large chunk of data
analytics resides. It is required for the following reasons:
- We need to collect and analyze raw data to develop mental
models and propose alternative solutions based on them
- We need to collect and analyze raw data to test and evaluate the
assumptions underlying the alternative solutions. Are the mental
models on which the solutions are resting supported by the data?
This is called diagnostic analysis
- Properly fleshed-out solutions often require data-based inputs
Important takeaway: we often want to know conditional means in
forecasting problems. You can compute these to some rudimentary degree
just using tables; see the sketch below. If a cell contains only N = 1
observation, we have an 'anecdote' rather than a pattern.
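As an illustration, here is a minimal sketch of conditional means computed with a simple table in pandas. The data and column names (region, quarter, revenue) are made up for the example:

```python
import pandas as pd

# Hypothetical sales data; region and quarter are the conditioning variables.
df = pd.DataFrame({
    "region":  ["North", "North", "South", "South", "South", "West"],
    "quarter": ["Q1", "Q2", "Q1", "Q1", "Q2", "Q1"],
    "revenue": [120.0, 135.0, 80.0, 95.0, 88.0, 150.0],
})

# Conditional means: average revenue given region and quarter.
means = df.pivot_table(values="revenue", index="region",
                       columns="quarter", aggfunc="mean")

# Cell counts: a cell with N = 1 (here West/Q1) is an 'anecdote',
# not a pattern you should forecast from.
counts = df.pivot_table(values="revenue", index="region",
                        columns="quarter", aggfunc="count")

print(means)
print(counts)
```

The count table makes the N = 1 problem visible immediately: any conclusion drawn from the West/Q1 cell rests on a single observation.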
5. Evaluate alternatives. Even if you end up with only one alternative, you
still need to evaluate it. Three important questions to answer here:
- How do the alternatives rank with regard to our decision criteria?
This is relevant whenever we have multiple alternatives and multiple
decision criteria left
- What is the uncertainty about the outcome of a decision given that a
certain alternative is chosen? This question is tricky to answer
and hence often ignored
- What are the potential costs of mistakes when choosing a certain
alternative? For example, is understating revenues as costly as
being overoptimistic and overstating potential revenues?
Executives of publicly listed companies would rather be seen as
too pessimistic than as too optimistic. Similarly, the "cost" of
a patient adversely reacting to a drug is surely much higher than
that of a drug failing to help the patient. Thus, while it might be hard to
quantify the exact costs, many business decisions nevertheless
recognize that not all mistakes are to be treated equally. When you
are able to quantify the costs of making a mistake, you can
compose a loss function; see the sketch below. Even if you cannot, loss
functions are a helpful mental model for thinking about the costs of making
errors. If you use any machine learning method, be aware that you have also
chosen a loss function that comes with its predictions
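A minimal sketch of such a loss function, under the illustrative assumption that overstating a revenue forecast is twice as costly as understating it (the 2:1 ratio is made up for the example):

```python
def asymmetric_loss(forecast: float, actual: float,
                    over_penalty: float = 2.0,
                    under_penalty: float = 1.0) -> float:
    """Linear loss that punishes overstatement more than understatement.

    The penalty weights are assumptions for illustration, not a rule.
    """
    error = forecast - actual
    if error > 0:                        # too optimistic: overstated
        return over_penalty * error
    return under_penalty * (-error)      # too pessimistic: understated

# Overstating by 10 costs twice as much as understating by 10:
print(asymmetric_loss(110, 100))  # 20.0
print(asymmetric_loss(90, 100))   # 10.0
```

For comparison, the squared-error loss that many regression and machine learning methods use by default treats over- and understatement symmetrically, which is exactly the implicit choice the last sentence above warns about.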
6. Choose an alternative. Once you have decided on a weighting
scheme for your criteria, considered the costs of mistakes of
pursuing an alternative, and scored each alternative according to the
chosen criteria, the next step is to decide based on a final selection
rule. For important decisions with sizable uncertainty, it is important
to do a scenario analysis. This is a way to quantify uncertainty and
the costs of mistakes: you can check how robustly your alternative
performs in the worst scenario. Two common rules deal
differently with uncertainty as expressed in scenarios (see the sketch
after this list):
- Weighted averages. Pick the alternative with the highest
weighted net benefit (score), with weights according to how
probable the different scenarios are. In this case, you choose the
alternative that performs best in the most likely scenarios but
also take outliers into account
- Maximin/minimax. Also called the criterion of pessimism. Maximin
is the maximum of a set of minima. According to this rule, we pick
the alternative that is expected to yield the largest of a set of
minimum possible gains (net benefits): "Which alternative does
best in the worst scenario you can think of for each alternative?"
Minimax is the same rule phrased in terms of loss (very often
used in machine learning because it can be tied directly to loss
functions): you pick the option that minimizes the loss across the
worst-case scenarios for each alternative
There are other rules, but these two illustrate the general problem
sufficiently. For complex and important problems, a scenario analysis
is a common way to deal with the sizable uncertainty inherent in any
forecast, and a sensible decision rule seeks a solution that is robust
to different states of the world.
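A minimal sketch of both rules applied to a scenario table. All numbers (payoffs, scenario probabilities) are invented for the example:

```python
import numpy as np

# Hypothetical net benefits of three alternatives under three scenarios.
# Rows: alternatives A, B, C; columns: scenarios (bad, base, good).
payoffs = np.array([
    [-20.0, 50.0, 120.0],   # A: high upside, high downside
    [ 10.0, 40.0,  60.0],   # B: moderate throughout
    [ 25.0, 30.0,  35.0],   # C: robust but limited upside
])
scenario_probs = np.array([0.2, 0.5, 0.3])  # assumed probabilities

# Rule 1: weighted average -- highest expected net benefit wins.
expected = payoffs @ scenario_probs
best_weighted = expected.argmax()

# Rule 2: maximin -- the alternative with the best worst case wins.
worst_case = payoffs.min(axis=1)
best_maximin = worst_case.argmax()

print("expected benefits:", expected)          # one value per alternative
print("weighted-average choice:", "ABC"[best_weighted])
print("worst cases:", worst_case)
print("maximin choice:", "ABC"[best_maximin])
```

With these made-up numbers the two rules disagree: the weighted average picks the high-upside alternative A (expected benefit 57), while maximin picks the robust alternative C (worst case 25). This is exactly the tension between expected performance and robustness described above.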
Types of analyses
Descriptive analysis. What happened in the past? Identifying data
patterns. The steps 'detect the problem' and 'identify the problem' rely
on a well-executed descriptive analysis. Descriptive analysis is where we
examine data about past events to spot patterns and trends in the data. A
good descriptive analysis takes the raw data and aggregates and summarizes
it in just the right way to isolate the patterns that reveal the insight we
are looking for. In practice, this is the core of most businesses' analytics,
simply because you can already answer many important questions with it.
Descriptive analysis is a powerful tool and often exploratory, requiring
creativity and business expertise. For more complex decisions, it is a
helpful first step for decision-makers and managers. Once relevant
patterns in the past are spotted, it is up to us to ask how or why those
patterns arise and to develop adequate responses to them.
Diagnostic analysis. Explaining patterns. Diagnostic analysis is more
advanced and tries to find answers to the question: "Why did it happen?"
Many academics are trained to answer this question, using, for example,
the hypothesis-testing framework. Causal analysis and experiments fall
into this category too. You can already get quite far and rule out some
explanations with simple data mining and correlation analysis; see the
sketch below.
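A minimal sketch of such a correlation check, with made-up data and column names (price, promo_days, units_sold):

```python
import pandas as pd

# Hypothetical data: does a sales drop line up with the price increase,
# with fewer promotion days, or with neither?
df = pd.DataFrame({
    "price":      [10.0, 10.0, 10.5, 11.0, 11.0, 11.5],
    "promo_days": [4, 3, 3, 2, 1, 0],
    "units_sold": [500, 480, 470, 430, 400, 360],
})

# Pairwise correlations: a near-zero correlation can rule a suspected
# driver out; a strong one keeps it on the list. Correlation alone is
# not causation, so surviving candidates still need deeper testing.
print(df.corr().round(2))
```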
Predictive analysis. What will happen? Predicting the future. This is about
making forecasts and predicting likely outcomes, which is effectively done
by comparison and extrapolation. Even though the analysis methods
become more complex, any forecasting method still extrapolates from
past data to make predictions: based on past patterns, we predict what
future data could look like. Statistical modeling or machine learning are
commonly used to conduct predictive analyses. Basic logic behind