Summary Data Science for Business
Ch1 – Introduction: Data-analytic thinking
With vast amounts of data and rising computing power available, companies focus on exploiting data
for competitive advantage. Data mining is used in marketing and CRM, for analysing behaviour and
maximizing expected customer value; in the finance industry for credit scoring, trading, fraud
detection and workforce management. Major retailers apply it throughout their business (ex. Amazon).
Firms can differentiate themselves through data science.
Data science is a set of fundamental principles that guide the extraction of knowledge from data
(data analytical thinking). Data mining is the extraction of knowledge, via technologies that
incorporate these principles.
➔ The book describes a number of fundamental data science principles, and illustrates each with data
mining techniques that embody that principle. 'Data science' and 'data mining' are often used
interchangeably.
Data can be used to predict sales of 'expected' products (ex. water before a hurricane). Data
mining can also discover patterns in sales that are unexpected! Unusual products (Pop-Tarts,
beer) sold more as a hurricane approached, and that knowledge is very valuable. A bottom-up approach.
Churn is the loss of customers who switch from one company to another (ex. telecom). Expensive! It is
often better to spend money preventing churn than attracting new customers.
A goal of data science is to improve decision making.
Data-driven decision-making (DDD) is the practice of basing decisions on the analysis of data, rather
than on intuition. Evidence shows that the more data-driven a firm is, the more productive it is!
Two types of decisions:
1. Decisions for which ‘discoveries’ need to be made within data (ex. hurricane, discovering
increase in sales of ‘unexpected’ products)
2. Decisions that repeat, especially at massive scale, so decision making can benefit from even
small increases in accuracy based on data analysis (ex. improving your churn-rate!)
A predictive model focuses on a particular set of indicators that correlate in some way with a
quantity of interest (who will churn, who will purchase, who is pregnant etc).
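As a minimal sketch of this idea, the snippet below combines a few indicators into a churn probability. The indicator names and weights are hypothetical, invented for illustration; a real model would learn them from historical data.

```python
import math

# Hypothetical indicator weights for a churn model (illustrative only):
# each weight expresses how the indicator correlates with churning.
WEIGHTS = {"months_since_signup": -0.05, "support_calls": 0.6, "overage_fees": 0.3}
BIAS = -1.0

def churn_probability(customer):
    """Score a customer: a weighted sum of indicators squashed to [0, 1]."""
    score = BIAS + sum(WEIGHTS[k] * customer[k] for k in WEIGHTS)
    return 1 / (1 + math.exp(-score))  # logistic function

# A long-time customer with no complaints scores low...
loyal = {"months_since_signup": 36, "support_calls": 0, "overage_fees": 0}
# ...while a recent customer with many support calls scores high.
at_risk = {"months_since_signup": 2, "support_calls": 5, "overage_fees": 4}
print(churn_probability(loyal) < churn_probability(at_risk))  # True
```

The model focuses on a small set of indicators and maps them to the quantity of interest (probability of churning), which is exactly the shape of the predictive models discussed above.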
Data science supports DDD, and overlaps with it! Increasingly, business decisions are made
automatically by computer systems. Automated decision making is seen in banking and consumer
credit industries, retail stores (automatic orders) and many more! The current focus is on online
advertising, as more and more people spend time online.
Data processing/engineering supports data science, but it is not the same thing! Data science needs
access to data and benefits from the good data engineering that data processing technologies
facilitate, but those are not data science technologies per se!
Big data means datasets that are too large for traditional processing systems and require new
processing technologies. Big data technologies can be used for data engineering and for implementing
data mining techniques, but are best known for data processing in support of data mining techniques
and other data science activities (see fig 1.1).
Firms utilizing big data have been shown to gain an advantage over their competitors (ex. Amazon,
which retains its customers more easily!).
Hence, a fundamental principle of data science: data, and the capability to extract useful knowledge
from data, should be regarded as key strategic assets. → Data-analytic thinking!
Important to realize: appropriate data and a good data science team cannot create value without
each other! Wise decisions on how to invest in data assets can have a big payoff.
For the churn example; take the data on prior churn and extract patterns that are useful, which help
to predict customers that are more likely to leave in the future, or help design better services.
Fundamental principle: extracting useful knowledge from data to solve business problems can be
treated systematically by following a process with reasonably well-defined stages; CRISP-DM!
CRISP-DM is the Cross Industry Standard Process for Data Mining. It provides a framework to
structure thinking about data analytical problems.
Fundamental principle: From a large mass of data, information technology can be used to find
informative descriptive attributes of entities of interest (ex. churn: the customer is the entity of
interest).
Fundamental principle: If you look too hard at a set of data, you will find something – but it might
not generalize beyond the data you’re looking at.
➔ This is referred to as overfitting a dataset. Overfitting is one of the most important concepts
to avoid when applying data mining to real problems.
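A small self-contained sketch of overfitting (the data and models are made up for illustration): a model that memorizes its training data scores perfectly on that data but fails to generalize, while a simple model that captures the real pattern does about as well on new data as on old.

```python
import random

random.seed(42)

def make_data(n):
    """True pattern: label is (x > 0), but 20% of labels are noise-flipped."""
    data = []
    for _ in range(n):
        x = random.uniform(-1, 1)
        label = (x > 0) != (random.random() < 0.2)
        data.append((x, label))
    return data

train, test = make_data(200), make_data(200)

# "Overfit" model: memorize every training example exactly.
memory = {x: label for x, label in train}
def overfit(x):
    return memory.get(x, False)  # unseen points get a default guess

# Simple model: just learn the underlying threshold pattern.
def simple(x):
    return x > 0

def accuracy(model, data):
    return sum(model(x) == label for x, label in data) / len(data)

print(accuracy(overfit, train))  # 1.0 -- perfect on the data it memorized
print(accuracy(simple, train))   # ~0.8 -- limited by the label noise
print(accuracy(overfit, test))   # ~0.5 -- memorization does not generalize
print(accuracy(simple, test))    # ~0.8 -- the real pattern does
```

Looking "too hard" at the training set rewards the memorizer, yet it is the simple model that holds up beyond the data at hand.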
Fundamental principle: Formulating data mining solutions and evaluating the results involves
thinking carefully about the context in which they will be used.
➔ Information is only useful when it fits the question.
Book summary
This book is about the extraction of useful information and knowledge from large volumes of data, in
order to improve business decision-making. As the massive collection of data has spread through just
about every industry sector and business unit, so have the opportunities for mining the data.
Underlying the extensive body of techniques for mining data is a much smaller set of fundamental
concepts comprising data science. These concepts are general and encapsulate much of the essence
of data mining and business analytics.
Success in today’s data-oriented business environment requires being able to think about how these
fundamental concepts apply to particular business problems – to think data-analytically. For
example, in this chapter we discussed the principle that data should be thought of as a business
asset, and once we are thinking in this direction we start to ask whether (and how much) we should
invest in data. Thus, an understanding of these fundamental concepts is important not only for data
scientists themselves, but for anyone working with data scientists, employing data scientists,
investing in data-heavy ventures or directing the application of analytics in an organization.
Thinking data-analytically is aided by conceptual frameworks discussed throughout the book. For
example, the automated extraction of patterns from data is a process with well-defined stages,
which are the subject of the next chapter. Understanding the process and the stages helps to
structure our data-analytic thinking, and to make it more systematic and therefore less prone to
errors and omissions.
There is convincing evidence that data-driven decision making and big data technologies substantially
improve business performance. Data science supports data-driven decision making – and sometimes
conducts such decision making automatically – and depends upon technologies for ‘big data’ storage
and engineering, but its principles are separate. The data science principles we discuss in this book
also differ from, and are complementary to, other important technologies, such as statistical
hypothesis testing and database querying (which have their own books and classes). The next
chapter describes some of these differences in more detail.
Ch. 2 – Business problems and data science solutions
Fundamental principle: A set of canonical data mining tasks; the data mining process; supervised
versus unsupervised data mining.
Data mining is a process with fairly well-understood stages. Data scientists decompose a business
problem into subtasks; the solutions to the subtasks can then be composed to solve the overall
problem. Some subtasks are company-specific, others are common across an industry; recognizing
familiar problems and their known solutions avoids wasting time and resources reinventing the wheel.
Common data mining tasks:
1. Classification and class probability estimation attempt to predict, for each individual in a
population, which of a (small) set of classes the individual belongs to (ex. will respond vs.
will not respond).
Classification; given a new individual, which class does the individual belong to? (Will it
happen?)
Probability estimation; a score representing the probability the individual belongs to each
class. (What is the probability it will happen?)
2. Regression attempts to estimate/predict, for each individual, a numerical value of some
variable for that individual (ex. how much will a given customer use the service?). A
predictive model could be generated by looking at similar individuals and their historical
usage.
Given a new individual, estimates the value of a variable specific to that individual. (How
much will it happen?)
3. Similarity matching attempts to identify similar individuals based on the data known about
them. It is the basis for making product recommendations (finding people similar to you in
terms of the products they have liked or purchased).
4. Clustering attempts to group individuals in a population together by their similarity, but not
driven by a specific purpose! (Do our customers form natural groups or segments?)
5. Co-occurrence grouping attempts to find associations between entities based on
transactions involving them. (What items are commonly purchased together?)
Clustering looks at similarity based on object attributes, co-occurrence grouping looks at
similarity based on appearing together in transactions! (people who bought X, also bought Y).
6. Profiling attempts to characterize the typical behaviour of an individual, group or population
(what is the typical cell phone usage of this customer segment?). Often used to create a
standard for ‘normal’ behaviour, and then detect fraud or intrusion.
7. Link prediction attempts to predict connections between data items, usually by suggesting that
a link should exist, possibly estimating the strength of the link (used in recommending
movies or in social networking systems: you and Bob share 10 friends, perhaps you would like
to be friends?).
8. Data reduction attempts to take a large set of data and replace it with a smaller set
containing much of the important information in the larger set (easier to deal with or
process). It may better reveal the information (massive dataset on consumer movie
preference -> genre preference). Loss of information vs improved insight.
9. Causal modelling attempts to help understand which events/actions influence others (ex. used
to target advertisements to customers). Techniques include randomized controlled experiments
(A/B tests) and sophisticated models for drawing causal conclusions from observational data.
Both experimental and observational methods perform counterfactual analysis: they attempt to
understand what the difference would be between the situations with and without the
event/action. The assumptions that must be made for the causal conclusion to hold must always
be stated!
The focus is on classification, regression, similarity matching & clustering.
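As an illustration of co-occurrence grouping, the short Python sketch below (the transactions are made up) counts how often each pair of items appears together in the same basket, the raw material for "people who bought X also bought Y" rules.

```python
from collections import Counter
from itertools import combinations

# Hypothetical market-basket transactions (illustrative only).
transactions = [
    {"beer", "diapers", "chips"},
    {"beer", "diapers"},
    {"bread", "butter"},
    {"beer", "chips"},
    {"bread", "butter", "milk"},
]

# Co-occurrence grouping: count how often each pair of items
# appears together in the same transaction.
pair_counts = Counter()
for basket in transactions:
    for pair in combinations(sorted(basket), 2):
        pair_counts[pair] += 1

# The most frequent pairs suggest candidate association rules.
print(pair_counts.most_common(3))
```

Note how this differs from clustering: similarity is defined by appearing together in transactions, not by the attributes of the items themselves.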