Strategy Analytics
Chapter 1. Introduction: Data-Analytic Thinking
Information is now widely available on external events such as market trends, industry news, and
competitors’ movements. This broad availability of data has led to increasing interest in methods for
extracting useful information and knowledge from data—the realm of data science.
Probably the widest applications of data-mining techniques are in marketing for tasks such as
targeted marketing, online advertising, and recommendations for cross-selling.
Data mining is used for general customer relationship management to analyze customer behavior in
order to manage attrition and maximize expected customer value.
At a high level, data science is a set of fundamental principles that guide the extraction of knowledge
from data. Data mining is the extraction of knowledge from data, via technologies that incorporate
these principles.
Example: Hurricane Frances
It would be more valuable to discover patterns due to the
hurricane that were not obvious. To do this, analysts might
examine the huge volume of Wal-Mart data from prior, similar
situations (such as Hurricane Charley) to identify unusual local
demand for products. From such patterns, the company might be
able to anticipate unusual demand for products and rush stock to
the stores ahead of the hurricane’s landfall.
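A minimal sketch of this kind of analysis in Python with pandas, comparing demand in prior hurricane weeks against normal weeks; the file and column names (store_sales.csv, hurricane_week, units_sold) are hypothetical, not Wal-Mart's actual data or method.

import pandas as pd

# Hypothetical weekly sales table with a flag for weeks affected by a prior
# hurricane (e.g., Charley) in the store's region.
sales = pd.read_csv("store_sales.csv")

# Average weekly demand per product in normal weeks vs. hurricane weeks.
baseline = sales[~sales["hurricane_week"]].groupby("product")["units_sold"].mean()
storm = sales[sales["hurricane_week"]].groupby("product")["units_sold"].mean()

# Products with the largest lift are candidates for rushing extra stock to
# stores ahead of the next hurricane's landfall.
lift = (storm / baseline).sort_values(ascending=False)
print(lift.head(10))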
Research on data-driven decision making (DDD) shows that, statistically, the more data-driven a firm is, the more productive it is—even controlling for a wide range of possible confounding factors. And the differences are not small: one standard deviation higher on the DDD scale is associated with a 4%–6% increase in productivity. DDD is also correlated with higher return on assets, return on equity, asset utilization, and market value, and the relationship seems to be causal.
The decisions of interest in this book mainly fall into two types: (1) decisions for which “discoveries” need to be made within data, and (2) decisions that repeat, especially at massive scale, and so can benefit from even small increases in decision-making accuracy based on data analysis.
A predictive model abstracts away most of the complexity of the world, focusing on a particular set of indicators that correlate in some way with a quantity of interest (who will churn, who will purchase, who is pregnant, etc.).
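For instance, a churn model of this sort could be sketched with scikit-learn as below; the customer table and feature names are hypothetical, and this is only an illustration of reducing customers to a few correlated indicators.

import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Hypothetical customer table with a binary churn target.
df = pd.read_csv("customers.csv")
X = df[["monthly_minutes", "overage_charges", "support_calls", "tenure_months"]]
y = df["churned"]

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# The model abstracts each customer down to a few indicators correlated with churn.
model = LogisticRegression(max_iter=1000).fit(X_train, y_train)

# Estimated probability of churn for each held-out customer.
churn_prob = model.predict_proba(X_test)[:, 1]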
Big data essentially means datasets that are too large for traditional data processing systems, and therefore require new processing technologies.
Occasionally, big data technologies are actually used for implementing data mining techniques. However, much more often the well-known big data technologies are used for data processing in support of the data mining techniques and other data science activities.
Big Data 1.0: Firms are busying themselves with building the capabilities to process large data, largely in support of their current operations—for example, to improve efficiency.
In Web 1.0, businesses busied themselves with getting the basic internet technologies in place, so that they could establish a web presence, build electronic commerce capability, and improve the efficiency of their operations.
In Web 2.0, new systems and companies began taking advantage of the interactive nature of the Web.
Big Data 2.0: Once firms have become capable of processing massive data in a flexible fashion, they
should begin asking: “What can I now do that I couldn’t do before, or do better than I could do
before?” This is likely to be the golden era of data science.
The prior sections suggest one of the fundamental principles of data science: data, and the capability
to extract useful knowledge from data, should be regarded as key strategic assets.
Thinking of these as assets should lead us to the realization that they are complementary.
Sociodemographic data provide a substantial ability to model the sort of consumers that are more likely to purchase one product or another (the Capital One case).
Fundamental concept: Extracting useful knowledge from data to solve business problems can be treated systematically by following a process with reasonably well-defined stages. The Cross Industry Standard Process for Data Mining, abbreviated CRISP-DM (CRISP-DM Project, 2000), is one codification of this process. Keeping such a process in mind provides a framework to structure our thinking about data analytics problems.
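As a mnemonic, the six CRISP-DM stages can be written out as a simple checklist; the guiding questions here are paraphrases, not official CRISP-DM wording.

# CRISP-DM stages paired with the question each stage answers.
CRISP_DM = {
    "Business Understanding": "What business problem are we solving, and how is success measured?",
    "Data Understanding": "What data are available, and what is their quality?",
    "Data Preparation": "How do we select, clean, and transform the data for modeling?",
    "Modeling": "Which data mining techniques do we apply, and with what settings?",
    "Evaluation": "Do the results actually address the business objectives?",
    "Deployment": "How are the results put into use, monitored, and revisited?",
}

for stage, question in CRISP_DM.items():
    print(f"{stage}: {question}")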
Fundamental concept: From a large mass of data, information technology can be used to find informative descriptive attributes of entities of interest.
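One way to make this concrete is to score attributes by how much information they carry about a target class, for example with scikit-learn's mutual information estimator; the table and column names below are hypothetical.

import pandas as pd
from sklearn.feature_selection import mutual_info_classif

df = pd.read_csv("customers.csv")  # hypothetical table
X = df[["age", "income", "tenure_months", "support_calls"]]
y = df["churned"]  # class of interest

# Mutual information measures how much each attribute tells us about the target;
# higher scores indicate more informative descriptive attributes.
scores = pd.Series(mutual_info_classif(X, y), index=X.columns)
print(scores.sort_values(ascending=False))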
Fundamental concept: If you look too hard at a set of data, you will find something—but it might not generalize beyond the data you’re looking at. This is referred to as overfitting a dataset.
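A small illustration of overfitting, using synthetic data so the sketch is self-contained: an unconstrained decision tree can fit its training data almost perfectly yet do noticeably worse on held-out data.

from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=500, n_features=20, n_informative=3, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# A very deep tree can effectively memorize the training data ("looking too hard").
tree = DecisionTreeClassifier(max_depth=None, random_state=0).fit(X_train, y_train)

print("train accuracy:", tree.score(X_train, y_train))  # typically close to 1.0
print("test accuracy:", tree.score(X_test, y_test))     # noticeably lower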
Fundamental concept: Formulating data mining solutions and evaluating the results involves thinking
carefully about the context in which they will be used.
Chapter 2. Business Problems and Data Science Solutions
Data mining is a process with fairly well-understood stages.
A critical skill in data science is the ability to decompose a data-analytics problem into pieces such that each piece matches a known task for which tools are available.
Tasks:
1. Classification and class probability estimation attempt to predict, for each individual in a
population, which of a (small) set of classes this individual belongs to. Usually, the classes are
mutually exclusive. An example classification question would be: “Among all the customers
of MegaTelCo, which are likely to respond to a given offer?” In this example the two classes
could be called will respond and will not respond. (Whether something will happen).
2. Regression (“value estimation”) attempts to estimate or predict, for each individual, the numerical value of some variable for that individual. An example regression question would be: “How much will a given customer use the service?” The property (variable) to be predicted here is service usage. (How much something will happen; a short code sketch of tasks 1 and 2 follows this list.)
3. Similarity matching attempts to identify similar individuals based on data known about them.
Similarity matching can be used directly to find similar entities.
4. Clustering attempts to group individuals in a population together by their similarity, but not
driven by any specific purpose. An example clustering question would be: “Do our customers
form natural groups or segments?”
5. Co-occurrence grouping (also known as frequent itemset mining, association rule discovery, and market-basket analysis) attempts to find associations between entities based on transactions involving them.
6. Profiling (also known as behavior description) attempts to characterize the typical behavior
of an individual, group, or population. An example profiling question would be: “What is the
typical cell phone usage of this customer segment?”
7. Link prediction attempts to predict connections between data items, usually by suggesting that a link should exist, and possibly also estimating the strength of the link. Link prediction is common in social networking systems: “Since you and Karen share 10 friends, maybe you’d like to be Karen’s friend?”
8. Data reduction attempts to take a large set of data and replace it with a smaller set of data
that contains much of the important information in the larger set. The smaller dataset may
be easier to deal with or to process.
9. Causal modeling attempts to help us understand what events or actions actually influence others. For example, consider that we use predictive modeling to target advertisements to consumers, and we observe that indeed the targeted consumers purchase at a higher rate subsequent to having been targeted. Was this because the advertisements influenced the consumers to purchase, or did the targeting simply select consumers who would have purchased anyway?
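A brief sketch of tasks 1 and 2 on synthetic data, only to show how the tasks differ in what they predict: a class (and its probability) versus a numerical value.

from sklearn.datasets import make_classification, make_regression
from sklearn.linear_model import LinearRegression, LogisticRegression

# Task 1: classification / class probability estimation (will respond vs. will not).
Xc, yc = make_classification(n_samples=200, n_features=5, random_state=0)
clf = LogisticRegression(max_iter=1000).fit(Xc, yc)
print(clf.predict(Xc[:3]))        # predicted class for each individual
print(clf.predict_proba(Xc[:3]))  # estimated probability of each class

# Task 2: regression / value estimation (how much will be used).
Xr, yr = make_regression(n_samples=200, n_features=5, noise=10, random_state=0)
reg = LinearRegression().fit(Xr, yr)
print(reg.predict(Xr[:3]))        # predicted numerical value for each individual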
Supervised vs. unsupervised methods
When there is no target, the data mining problem is referred to as unsupervised.
The learner would be given no information about the purpose of the learning but would be left to
form its own conclusions about what the examples have in common.
A supervised technique is given a specific purpose for the grouping—predicting the target. Clustering,
an unsupervised task, produces groupings based on similarities, but there is no guarantee that these
similarities are meaningful or will be useful for any particular purpose.
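To make the contrast concrete, the sketch below runs an unsupervised and a supervised method on the same synthetic data: clustering never sees the target, while the classifier is trained specifically to predict it.

from sklearn.cluster import KMeans
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=300, n_features=4, random_state=0)

# Unsupervised: group individuals by similarity alone; y is never used, and the
# groups need not line up with any particular business target.
clusters = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)

# Supervised: the grouping has a specific purpose, predicting the target y.
clf = LogisticRegression(max_iter=1000).fit(X, y)
predictions = clf.predict(X)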
The value for the target variable for an individual is often called the individual’s label, emphasizing that often (not always) one must incur expense to actively label the data.
Classification, regression, and causal modeling generally are solved with supervised methods. Similarity matching, link prediction, and data reduction could be either. Clustering, co-occurrence grouping, and profiling generally are unsupervised.
Important distinction pertaining to mining data: the difference between (1) mining the data to find
patterns and build models, and (2) using the results of data mining.
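Seen in code, the distinction is simply fitting versus applying: building the model from historical data is the mining step, and scoring new cases with it is the use step. The file and column names below are hypothetical.

import pandas as pd
from sklearn.tree import DecisionTreeClassifier

# (1) Mining the data: find patterns / build a model from historical, labeled examples.
history = pd.read_csv("historical_customers.csv")  # hypothetical file
features = ["tenure_months", "support_calls"]
model = DecisionTreeClassifier(max_depth=3).fit(history[features], history["churned"])

# (2) Using the results: apply the mined model to new, unlabeled cases.
new_customers = pd.read_csv("new_customers.csv")   # hypothetical file
scores = model.predict_proba(new_customers[features])[:, 1]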