Business intelligence
Chapter 1: Data-analytic thinking
The omnipresence of data opportunities
Companies in almost every industry are focused on gaining data for
competitive advantage
In the past, everything was done manually; nowadays computers have become
more powerful, networking has become ubiquitous and algorithms have
improved
Probably the widest applications of data-mining techniques are in
marketing
Data science = a set of fundamental principles that guide the extraction
of knowledge from data
Data mining = the extraction of knowledge from data
Data science, engineering and data-driven decision making
Data science involves principles, processes and techniques for
understanding phenomena via the analysis of data
Ultimate goal: improve decision making
Data-driven decision-making (DDD) refers to basing decisions on the
analysis of data rather than purely on intuition
Research showed that statistically, the more data-driven a firm is, the
more productive it is
Two sorts of decisions are of interest in this book:
- Decisions for which ‘discoveries’ need to be made within data
- Decisions that repeat (especially at massive scale)
Data processing and ‘Big data’
Data engineering and processing are critical to support data science, but
they are more general
The big difference is that they are not data science themselves: they
support data science, but are also used for many other tasks
Data processing technologies = very important for many data-oriented
business tasks that do not involve extracting knowledge or data-driven
decision-making
Big data = datasets that are too large for traditional data processing
systems and therefore require new processing technologies
Big data technologies are used for many tasks, including data engineering
From Big Data 1.0 to Big Data 2.0
First came Web 1.0, when businesses were busy getting basic internet
technologies in place
By analogy, in Big Data 1.0 firms are busy building the capabilities to
process large data
In Web 2.0, new systems emerged and companies began taking
advantage of the interactive nature of the web
By analogy, Big Data 2.0 follows once firms have become capable of
processing massive data in a flexible fashion
Data and data science capability as a strategic asset
Data, and the capability to extract useful knowledge from data, should be
regarded as key strategic assets
Often we don’t have exactly the right data to make the best decisions
and/or the right talent to best support making decisions from the data
Even the right data often cannot substantially improve decisions without
suitable data science talent; both require investment!
Studies giving clear quantitative demonstrations of the value of a data
asset are hard to find, primarily because firms are hesitant to divulge
results of strategic value
Exception: study by Martens and Provost (research into whether a
transaction can influence the offer that is presented)
The idea of data as a strategic asset is certainly not limited to a single
firm or industry
Data-analytic thinking
When faced with a business problem, you should be able to assess
whether and how data can improve performance
Firms in many traditional industries are exploiting new and existing data
resources for competitive advantage
Data analytics projects reach into all business units, so employees
throughout these units must interact with the data science team
Chapter 2: Business problems and data science solutions
Data mining is a process with fairly well-understood stages
Some stages involve the application of information technology, while
others require an analyst’s creativity
From business problems to data mining tasks
Each problem is unique, but there are sets of common tasks that underlie
the business problems
A business problem is divided into subtasks; some are unique, others are
common data mining tasks
1) Classification and class probability estimation
Attempt to predict, for each individual in a population, which of a set
of classes this individual belongs to
Usually the classes are mutually exclusive
For a classification task, a data mining procedure produces a model
that, given a new individual, determines which class that individual
belongs to
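Classification and scoring are very closely related: a scoring (class
probability estimation) model produces, for each individual, a score
representing the probability that the individual belongs to each class,
instead of a single class prediction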
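A minimal sketch of the difference between scoring and classification, assuming logistic regression as the model (one common choice, used here only for illustration) and a small hypothetical churn dataset. `score` returns a class probability estimate; `classify` turns that score into a class using an assumed threshold of 0.5. All names and data are hypothetical.

```python
import math


def sigmoid(z):
    # Numerically stable logistic function: maps any real number into (0, 1)
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_logistic(xs, ys, lr=0.1, epochs=2000):
    """Fit weights w and bias b by batch gradient descent on the log-loss."""
    n_features = len(xs[0])
    w = [0.0] * n_features
    b = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for x, y in zip(xs, ys):
            p = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)
            err = p - y  # gradient of the log-loss w.r.t. the linear score
            for j in range(n_features):
                grad_w[j] += err * x[j]
            grad_b += err
        w = [wi - lr * g / len(xs) for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / len(xs)
    return w, b


def score(model, x):
    """Scoring: estimated probability that x belongs to the 'churn' class."""
    w, b = model
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)


def classify(model, x, threshold=0.5):
    """Classification: turn the score into one of two mutually exclusive classes."""
    return "churn" if score(model, x) >= threshold else "stay"


# Hypothetical customers: [number of complaints, years as customer]
xs = [[0, 5], [1, 4], [0, 6], [1, 7], [4, 1], [5, 2], [3, 1], [6, 0]]
ys = [0, 0, 0, 0, 1, 1, 1, 1]  # 1 = churned, 0 = stayed

model = train_logistic(xs, ys)
print(score(model, [5, 1]), classify(model, [5, 1]))
```

The same trained model supports both tasks: ranking customers by `score` is useful when only the most likely churners can be contacted, while `classify` gives a hard yes/no decision.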
2) Regression (value estimation)
Attempts to estimate or predict, for each individual, the numerical
value of some variable for that individual
A regression procedure produces a model that, given an individual,
estimates the value of the particular variable specific to the
individual
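Regression is related to classification, but the two are different:
classification predicts whether something will happen, whereas
regression predicts how much something will happen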
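A minimal sketch of regression, assuming ordinary least squares with a single input variable (one standard technique among many) and hypothetical data. Unlike a classification model, the output is a number rather than a class.

```python
def fit_simple_linear_regression(xs, ys):
    """Return (slope, intercept) minimising squared error for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Numerator (co-variation of x and y) and denominator (variation of x) of the OLS slope
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(model, x):
    """Estimate the numerical target value for a new individual."""
    slope, intercept = model
    return slope * x + intercept


# Hypothetical data: hours of service use per week (x) vs monthly bill in euros (y)
usage = [1, 2, 3, 4]
bill = [3, 5, 7, 9]

model = fit_simple_linear_regression(usage, bill)
print(predict(model, 5))  # 11.0
```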
3) Similarity matching
Attempts to identify similar individuals based on data known about
them
Can be used directly to find similar entities and is the basis for one
of the most popular methods for making product recommendations
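A minimal sketch of similarity matching used for a simple recommendation, assuming cosine similarity as the similarity measure and hypothetical purchase-count vectors; the helper names are illustrative, not from a particular library. The new customer is matched to the most similar existing customer, and products that customer bought (but the new customer has not) are suggested.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: 1.0 = same direction, 0.0 = nothing in common."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def most_similar(target, candidates):
    """Return the name of the candidate whose vector is most similar to the target."""
    return max(candidates, key=lambda name: cosine_similarity(target, candidates[name]))


def recommend(target, candidates, products):
    """Suggest products the most similar customer bought that the target has not."""
    neighbor = candidates[most_similar(target, candidates)]
    return [
        product
        for product, mine, theirs in zip(products, target, neighbor)
        if mine == 0 and theirs > 0
    ]


# Hypothetical purchase counts per product category
products = ["books", "games", "music", "sports"]
customers = {
    "alice": [5, 0, 3, 0],
    "bob": [0, 4, 0, 5],
    "carol": [4, 0, 4, 1],
}
new_customer = [3, 0, 0, 1]

print(recommend(new_customer, customers, products))  # ['music']
```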
4) Clustering
Attempts to group individuals in a population together by their
similarity, not for any specific purpose
Clustering is useful in preliminary domain exploration to see which
natural groups exist because these groups in turn may suggest
other data mining tasks or approaches
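A minimal sketch of clustering, assuming k-means (a widely used clustering algorithm, not named in these notes) on hypothetical two-feature customer data. It uses a naive initialization (the first k points become the starting centroids), which is fine for illustration but can give poor groupings on real data.

```python
import math


def kmeans(points, k, max_iterations=100):
    """Group points into k clusters; returns (cluster index per point, centroids)."""
    # Naive initialization: the first k points are the starting centroids
    centroids = [list(p) for p in points[:k]]
    assignments = None
    for _ in range(max_iterations):
        # Assignment step: each point joins the cluster of its nearest centroid
        new_assignments = [
            min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points
        ]
        if new_assignments == assignments:
            break  # converged: no point changed cluster
        assignments = new_assignments
        # Update step: move each centroid to the mean of its member points
        for c in range(k):
            members = [p for p, a in zip(points, assignments) if a == c]
            if members:
                centroids[c] = [sum(coords) / len(members) for coords in zip(*members)]
    return assignments, centroids


# Hypothetical customers: (visits per month, average spend in tens of euros)
points = [(1, 1), (8, 8), (1.5, 2), (8, 9), (2, 1), (9, 8)]

assignments, centroids = kmeans(points, k=2)
print(assignments)  # [0, 1, 0, 1, 0, 1]
```

No class labels are used: the groups emerge from similarity alone, which is what makes clustering useful for preliminary exploration.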
5) Co-occurrence grouping
Attempts to find associations between entities based on
transactions involving them
Co-occurrence grouping considers similarity of objects based on
their appearing together in transactions
Result: description of items that occur together
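A minimal sketch of co-occurrence grouping on hypothetical shopping baskets: it counts how often each pair of items appears together and keeps the pairs whose support (the fraction of transactions containing the pair) reaches an assumed threshold. Only item pairs are considered here; full frequent-itemset mining also handles larger itemsets.

```python
from collections import Counter
from itertools import combinations


def co_occurrence_counts(transactions):
    """Count, for every pair of items, the number of transactions containing both."""
    counts = Counter()
    for basket in transactions:
        # sorted(set(...)) counts each pair once per basket, in a canonical order
        for pair in combinations(sorted(set(basket)), 2):
            counts[pair] += 1
    return counts


def frequent_pairs(transactions, min_support):
    """Return {pair: support} for pairs appearing in at least min_support of transactions."""
    n = len(transactions)
    counts = co_occurrence_counts(transactions)
    return {pair: c / n for pair, c in counts.items() if c / n >= min_support}


# Hypothetical transactions
transactions = [
    ["bread", "butter", "milk"],
    ["bread", "butter"],
    ["beer", "chips"],
    ["bread", "milk"],
    ["beer", "chips", "bread"],
]

print(frequent_pairs(transactions, min_support=0.4))
```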
6) Profiling