Summary
Strategy Analytics
Data Science for Business
Foster Provost & Tom Fawcett
Chapter 1 Data-Analytic Thinking
Data science – a set of fundamental principles that guide the extraction of knowledge from
data.
Data mining – the extraction of knowledge from data, via technologies that incorporate these
principles. Data mining techniques provide some of the clearest illustrations of the principles
of data science.
The figure places data science in the context of various other closely related and data-related
processes in the organization. It distinguishes data science from other aspects of data
processing that are gaining increasing attention in business.
Data-Driven Decision Making (DDD) – refers to the practice of basing decisions on the analysis
of data, rather than purely on intuition. DDD is not an all-or-nothing practice; different firms
engage in DDD to greater or lesser degrees.
Two sorts of decisions:
1. decisions for which discoveries need to be made within data
2. decisions that repeat, especially at massive scale
The figure shows data science supporting DDD but also overlapping with DDD. This highlights
the often overlooked fact that, increasingly, business decisions are being made automatically
by computer systems.
There is a difference between data science and data-driven businesses.
Data science – needs access to data, and it often benefits from sophisticated data engineering
that data-processing technologies may facilitate, but these technologies are not data science
technologies per se. They support data science (as shown in the figure), but they are useful
for much more.
Data processing technologies – very important for many data-oriented business tasks that do
not involve extracting knowledge or DDD.
Big data – means datasets that are too large for traditional data processing systems, and
therefore require new processing technologies. Big data technologies are usually used for
implementing data mining techniques and for data processing in support of the data mining
techniques.
Big data 1.0 – firms are busying themselves with building the capabilities to process large
data, largely in support of their current operations, for example to improve efficiency.
Big data 2.0 – firms started to look further: they began to ask what data could let them do
that they could not do before, or do better than before. We entered Big data 2.0 when new
systems and companies began taking advantage of the interactive nature of the Web. The
most obvious examples are the incorporation of social networking components and the rise
of the 'voice' of the individual consumer.
Data and data science capability as a strategic asset
Data, and the capability to extract useful knowledge from data, should be regarded as key
strategic assets.
Often, we don’t have exactly the right data to make the best decisions and/or the right talent
to best support making decisions from the data. Thinking of these as assets should lead us to
the realization that they are complementary, and that it is often necessary to make
investments in them.
Fundamental concepts of Data science
- Extracting useful knowledge from data to solve business problems can be treated
systematically by following a process with reasonably well-defined stages
o The Cross Industry Standard Process for Data Mining (CRISP-DM) is one
codification of this process. Keeping this in mind provides a framework to
structure our thinking about data analytics problems.
- From a large mass of data, information technology can be used to find informative
descriptive attributes of entities of interest
o For example, a customer would be an entity of interest, and each customer
might be described by a large number of attributes, such as service history. But
how much information is needed? You need to find variables that correlate
with the outcome of interest. A business analyst might be able to hypothesize
and test, and there are tools to facilitate the experimentation.
- If you look too hard at a set of data, you will find something – but it might not
generalize beyond the data you’re looking at
o This is referred to as overfitting a dataset. Data mining techniques are very
powerful, and the need to detect and avoid overfitting is one of the most
important concepts to grasp when applying data mining to real problems.
- Formulating data mining solutions and evaluating the results involves thinking
carefully about the context in which they will be used
o If your goal is the extraction of potentially useful knowledge, how can we
formulate what is useful? It depends on the application in question.
Chapter 2 Business Problems and Data
Science Solutions
Data scientists decompose a business problem into several subtasks. The solutions to the
subtasks can then be composed to solve the overall problem. Some subtasks are unique to
the business problem, but others are common data mining tasks.
Despite the large number of specific data mining algorithms developed over the years, there
are only a handful of fundamentally different types of tasks these algorithms address.
1. Classification and class probability estimation attempt to predict, for each individual in a
population, which of a (small) set of classes this individual belongs to. Usually the classes are
mutually exclusive. For example, ‘Among all the customers of MegaTelCo, which are likely to
respond to a given offer?’ Two classes could be called will respond and will not respond.
A data mining procedure produces a model that, given a new individual, determines which
class that individual belongs to. A closely related task is class probability estimation: a scoring
model applied to an individual produces, instead of a class prediction, a score representing
the probability that the individual belongs to each class (e.g., a score of how likely each
customer is to respond to the offer).
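A minimal sketch of a scoring model, with hypothetical customers and hand-picked (not learned) weights: instead of predicting "will respond" / "will not respond" directly, it assigns each customer a probability of responding, which can then be thresholded into a class.

```python
# Scoring-model sketch (hypothetical data and illustrative weights):
# a logistic-style score in [0, 1] interpreted as P(will respond).
import math

def respond_score(tenure_months, past_offers_accepted):
    """Return a response probability; the weights are illustrative, not learned."""
    z = -2.0 + 0.05 * tenure_months + 1.2 * past_offers_accepted
    return 1 / (1 + math.exp(-z))

customers = {"alice": (24, 2), "bob": (6, 0)}
scores = {name: respond_score(*feats) for name, feats in customers.items()}
predictions = {name: ("will respond" if s >= 0.5 else "will not respond")
               for name, s in scores.items()}
```

Keeping the score rather than only the hard class label lets the business rank customers and target only the most promising ones.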
2. Regression (value estimation) attempts to estimate or predict, for each individual, the
numerical value of some variable for that individual. An example: ‘How much will a given
customer use the service?’. The variable to be predicted here is service usage and a model
could be generated by looking at other, similar individuals in the population and their
historical usage.
Regression is related to classification, but the two are different. Classification predicts
whether something will happen, whereas regression predicts how much something will
happen.
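The "look at other, similar individuals and their historical usage" idea can be sketched directly. The customers, ages, and usage numbers below are hypothetical; the estimate is just the average usage of the k most similar (here: closest in age) known customers.

```python
# Value-estimation sketch (hypothetical data): predict a customer's service
# usage as the average usage of the k customers most similar to them.

def predict_usage(known, age, k=2):
    """Average the usage of the k known customers closest in age."""
    nearest = sorted(known, key=lambda c: abs(c[0] - age))[:k]
    return sum(usage for _, usage in nearest) / k

# (age, monthly_usage_hours) for customers with known history
known = [(25, 40.0), (30, 35.0), (45, 20.0), (50, 15.0)]
estimate = predict_usage(known, age=28)
```

Note the contrast with classification: the output is a numeric amount (hours of usage), not a class label.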
3. Similarity matching attempts to identify similar individuals based on data known about
them. It can be used directly to find similar entities. For example, IBM is interested in finding
companies similar to their best business customers, in order to focus their sales on the best
opportunities. They use similarity matching based on ‘firmographic’ data describing
characteristics of the companies. Similarity matching is also the basis for one of the most
popular methods for making product recommendations.
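A sketch of similarity matching on made-up "firmographic" vectors: represent each company by numeric attributes and find the prospect closest (by Euclidean distance) to a reference best customer. Company names and attributes are hypothetical.

```python
# Similarity-matching sketch (hypothetical firmographic vectors, e.g.
# (log revenue, employees in thousands)): find the most similar company.
import math

def most_similar(reference, candidates):
    """Return the candidate name whose attribute vector is closest to reference."""
    return min(candidates,
               key=lambda name: math.dist(candidates[name], reference))

best_customer = (8.0, 12.0)
prospects = {"AcmeCo": (7.9, 11.5), "TinyLLC": (3.0, 0.2), "MegaInc": (10.5, 90.0)}
closest = most_similar(best_customer, prospects)
```

Real firmographic matching would use many more attributes and a carefully chosen distance, but the structure is the same.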
4. Clustering attempts to group individuals in a population together by their similarity, but not
driven by any specific purpose. An example, ‘Do our customers form natural groups or
segments?’ Clustering is useful in preliminary domain exploration to see which natural groups
exist because these groups in turn may suggest other data mining tasks or approaches.
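A bare-bones clustering sketch on hypothetical one-dimensional data: a minimal k-means that groups customers by monthly spend. Note there is no target variable anywhere, only similarity, which is what makes this unsupervised.

```python
# Clustering sketch (hypothetical 1-D spend data): minimal k-means.

def kmeans_1d(values, centers, iterations=10):
    """Repeatedly assign points to the nearest center and recompute centers."""
    for _ in range(iterations):
        groups = {c: [] for c in centers}
        for v in values:
            nearest = min(centers, key=lambda c: abs(c - v))
            groups[nearest].append(v)
        centers = [sum(g) / len(g) for g in groups.values() if g]
    return sorted(centers)

spend = [10, 12, 11, 95, 100, 98]
centers = kmeans_1d(spend, centers=[0.0, 50.0])
```

The two centers that emerge suggest two natural spend segments, which might in turn prompt a follow-up supervised task (e.g., predicting which segment a new customer falls into).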
5. Co-occurrence grouping (market-basket analysis) attempts to find associations between
entities based on transactions involving them. For example, ‘What items are commonly
purchased together?’ Co-occurrence considers similarity of objects based on their appearing
together in transactions; it could suggest special promotions, product displays, or
combination offers.
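The core of market-basket analysis is counting how often items appear together. A minimal sketch on hypothetical transactions:

```python
# Co-occurrence sketch (hypothetical baskets): count item pairs that
# appear in the same transaction, independent of order.
from collections import Counter
from itertools import combinations

def pair_counts(transactions):
    """Count co-occurring item pairs across all baskets."""
    counts = Counter()
    for basket in transactions:
        for pair in combinations(sorted(set(basket)), 2):
            counts[pair] += 1
    return counts

baskets = [["beer", "chips"], ["beer", "chips", "salsa"], ["bread", "chips"]]
counts = pair_counts(baskets)
top_pair, top_count = counts.most_common(1)[0]
```

A real system would normalize these raw counts (e.g., into support and lift) before recommending a promotion, but the pair counting is the starting point.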
6. Profiling (behavior description) attempts to characterize the typical behavior of an
individual, group, or population. An example, ‘What is the typical cell phone usage of this
customer segment?’ Behavior can be described generally over an entire population, or down
to the level of small groups or even individuals.
Profiling is often used to establish behavioral norms for anomaly detection applications such
as fraud detection. For example, if we know what kind of purchases a person typically makes
on a credit card, we can determine whether a new charge on the card fits that profile or not.
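The fraud example can be sketched with a very simple behavioral norm on hypothetical charge history: summarize past charges by their mean and standard deviation, then flag a new charge that deviates from the norm by more than three standard deviations.

```python
# Profiling / anomaly-detection sketch (hypothetical charge history):
# a charge far outside the customer's established norm is flagged.
import statistics

def is_anomalous(history, new_charge, threshold=3.0):
    """Flag new_charge if it lies more than `threshold` std devs from the mean."""
    mean = statistics.mean(history)
    stdev = statistics.stdev(history)
    return abs(new_charge - mean) > threshold * stdev

past_charges = [20.0, 25.0, 22.0, 30.0, 18.0, 27.0]
normal = is_anomalous(past_charges, 24.0)
suspicious = is_anomalous(past_charges, 500.0)
```

Real fraud profiles describe much richer behavior (merchants, times, locations), but the pattern is the same: characterize the typical, then measure deviation from it.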
7. Link prediction attempts to predict connections between data items, usually by suggesting
that a link should exist, and possibly also estimating the strength of the link. For example,
‘Since you and Karen share 10 friends, maybe you’d like to be Karen’s friend?’
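The friend-suggestion example maps onto the simplest link-prediction score: the number of neighbors two nodes share. The friendship graph below is hypothetical.

```python
# Link-prediction sketch (hypothetical friendship graph): score a possible
# link between two people by how many friends they share.

def shared_friends(graph, a, b):
    """Number of common neighbors of a and b."""
    return len(graph.get(a, set()) & graph.get(b, set()))

graph = {
    "you":   {"ann", "ben", "cem", "dee"},
    "karen": {"ann", "ben", "cem", "eva"},
    "liam":  {"eva"},
}
score_karen = shared_friends(graph, "you", "karen")
score_liam = shared_friends(graph, "you", "liam")
suggestion = "karen" if score_karen > score_liam else "liam"
```

The common-neighbor count serves both purposes the text mentions: suggesting that a link should exist and estimating its strength.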
8. Data reduction attempts to take a large set of data and replace it with a smaller set of data
that contains much of the important information in the larger set. The smaller data set may
be easier to deal with or to process.
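One common form of data reduction is aggregation. A sketch on a hypothetical usage log: replace many per-day records with one summary row per customer, keeping the information most analyses need while shrinking the dataset.

```python
# Data-reduction sketch (hypothetical usage log): collapse per-day
# (customer, minutes) rows into one summary record per customer.
from collections import defaultdict

def summarize(records):
    """Reduce raw rows to per-customer totals and day counts."""
    totals = defaultdict(lambda: [0.0, 0])
    for customer, minutes in records:
        totals[customer][0] += minutes
        totals[customer][1] += 1
    return {c: {"total": t, "days": n} for c, (t, n) in totals.items()}

log = [("alice", 30), ("alice", 45), ("bob", 10), ("alice", 15), ("bob", 20)]
summary = summarize(log)
```

Five raw rows become two summary rows; some detail (the day-by-day pattern) is sacrificed for a smaller, easier-to-process dataset, which is exactly the trade-off data reduction makes.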
9. Causal modeling attempts to help us understand what events or actions actually influence
others. For example, we observe that indeed the targeted consumers purchase at a higher
rate subsequent to having been targeted. Was this because the advertisements influenced
the consumers to purchase?
Supervised vs. Unsupervised methods
Unsupervised –
For example, ‘Do our customers naturally fall into different groups?’ Here no specific purpose
or target has been specified for the grouping. When there is no specific target, the data
mining is unsupervised.
Clustering, co-occurrence grouping and profiling are solved with unsupervised data mining.
Supervised –
For example, ‘Can we find groups of customers who have particularly high likelihoods of
canceling their service soon after their contracts expire?’ Here a specific target is defined: will
the customer leave when her contract expires? Segmentation is being done for a specific
reason: to act based on the likelihood of churn.
Important for supervised data mining is that there must be data on the target. Acquiring data
on the target often is a key data science investment.
Classification, regression, and causal modeling are solved with supervised data mining;
similarity matching, link prediction, and data reduction could be solved with either.