Lesson 1 – Introduction
Data-driven Business decisions:
Decision making is a large and important part of any business:
• State of market: choosing the right course of action
• Strategic: finding an investment strategy
• Tactical: determining the target market share
• Operational: finding the target customers
Decision analysis = a structured way of dealing with decision problems: how to pose the right questions and how to answer them.
Example 1 – Customer churn at Vodafone:
When a subscription contract at a telecom provider ends, the customer is free to switch companies. This
switch is called customer churn. Because attracting a new customer is more expensive than retaining an
existing one, it pays for the company to try to reduce customer churn.
What information will be important for Vodafone in order to know and analyze whether a customer will
change companies or not?
Based on this information, the company can target advertisements at individual customers in order to keep
them with the company.
Example 2 – Employee hiring:
How would you select a new employee? E.g. a new department is opening soon and you need to hire a new
manager. You announce the open position and then need to decide which applicant fits it best.
How does this decision differ from the decision in example 1? Think about the available data and
decision goals.
In this case you have applicants, and far less data is available than in the Vodafone case. In a CV, applicants
provide the information they think is important, so both the content and the way it is presented differ
from one applicant to the next.
Differences between the two examples:
• Amount of available data
• Type, source, quality of data
• Amount and type of uncertainty
• Number of stakeholders
• Number of goals
• Number of decision moments
STRUCTURED DECISION MAKING
Data science: a set of fundamental principles that guide the extraction of knowledge from data. Used when an
environment is data rich.
Data mining: the extraction of knowledge from data, via technologies that incorporate these principles.
Decision science: used when an environment is data poor.
Data science in the context of data engineering and data-driven decision making (DDD).
The more a company uses DDD, the more productive that company is. It could increase sales by 4% to 6%.
Data science is a very popular field. Every big company needs data science in order to make
decisions. The widest use of data science is in marketing, for tasks such as targeted marketing, online
advertising and recommendations for cross-selling.
BUSINESS DRIVERS
Nowadays many decisions must be automated due to:
• Large volumes of data
• Availability of online data, which requires real-time processing and decision making
E.g. when a webpage is opened, an auction runs within milliseconds and gives every user a
personalized price. This is called real-time bidding, or real-time decision making.
• Developments in m- and e-business: decisions anywhere, anytime
• Competitive advantage through fast processing
• Improving the efficiency of business processes
DATA SCIENCE CAPABILITY
Companies invest quite a lot of money in data science as a strategic asset. Data has to be collected and
stored, which is expensive. Keeping the data at good quality is also expensive, because it takes a lot
of people to maintain it. In addition, companies need to invest in knowledge, such as
algorithms and skilled employees.
DATA ANALYTIC THINKING
Employees in all departments should interact with the data science team in order to fully understand
the importance of that team and what is really going on in the company as a whole. A lack of
understanding is much more damaging in data science projects than in other technical projects, because
data science supports improved decision making.
Data science is a mixed discipline:
Data engineering and data processing are not part of data science itself, but they are very close to it.
Everyday use of data science:
How can this hurricane be related to data science? Learn from the past and predict future hurricanes.
Levels of analytics capability:
If you don’t understand the concepts, go to pages 14 and 15.
Data science fundamental concept 1:
Data has hidden solutions inside it. You have to extract them from the data in a systematic way.
CRISP-DM = Cross Industry Standard Process for Data Mining
Data science fundamental concept 2:
Finding informative statistical attributes:
• a large mass of data may contain many attributes
• find the ones that give useful information about the likelihood of a customer churning
• such attributes are usually said to be “correlated” with churn
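The idea of ranking attributes by how strongly they are correlated with churn can be sketched as below. This is only an illustration: the customer records, the attribute names (complaints, years_with_provider, shoe_size) and the choice of the Pearson correlation coefficient as the measure are all assumptions, not something taken from the Vodafone example.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant attribute carries no information
    return cov / sqrt(var_x * var_y)

# Hypothetical toy data, one row per customer (not real Vodafone data).
customers = [
    {"complaints": 0, "years_with_provider": 6, "shoe_size": 42, "churned": 0},
    {"complaints": 1, "years_with_provider": 5, "shoe_size": 38, "churned": 0},
    {"complaints": 0, "years_with_provider": 7, "shoe_size": 40, "churned": 0},
    {"complaints": 4, "years_with_provider": 1, "shoe_size": 41, "churned": 1},
    {"complaints": 3, "years_with_provider": 2, "shoe_size": 39, "churned": 1},
    {"complaints": 5, "years_with_provider": 1, "shoe_size": 43, "churned": 1},
    {"complaints": 0, "years_with_provider": 8, "shoe_size": 40, "churned": 0},
    {"complaints": 4, "years_with_provider": 2, "shoe_size": 37, "churned": 1},
]

def rank_attributes(rows, target="churned"):
    """Return (attribute, correlation) pairs, most informative first."""
    attributes = [name for name in rows[0] if name != target]
    y = [row[target] for row in rows]
    scores = {a: pearson([row[a] for row in rows], y) for a in attributes}
    # Sort by absolute value: a strong negative correlation is informative too.
    return sorted(scores.items(), key=lambda kv: abs(kv[1]), reverse=True)

ranking = rank_attributes(customers)
for name, score in ranking:
    print(f"{name:>20}: {score:+.2f}")
```

In this toy data the number of complaints is strongly positively correlated with churn, the years with the provider strongly negatively, and the shoe size not at all, so only the first two would be kept as informative attributes.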
Data science fundamental concept 3:
Generalizing beyond the available data:
• not looking too hard at the current data
• fitting the results too tightly to the available data: overfitting
• avoiding overfitting makes the results robust, hence stronger
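A minimal sketch of overfitting, under stated assumptions: the training and hold-out records below are invented toy data with a few noisy labels, and the two models (a lookup table that memorizes every training customer, and a single threshold on the number of complaints) are chosen only to contrast a tight fit with a simple one.

```python
from collections import Counter

# Hypothetical toy data: (complaints, age, churned). Some labels are noisy.
train = [
    (0, 25, 0), (1, 34, 0), (2, 41, 0), (1, 52, 1), (3, 29, 1),
    (4, 47, 1), (5, 38, 1), (4, 60, 0), (0, 45, 0),
]
holdout = [
    (0, 33, 0), (1, 58, 0), (2, 27, 0), (3, 44, 1),
    (4, 36, 1), (5, 51, 1), (4, 22, 1), (2, 40, 1),
]

def accuracy(predict, rows):
    """Fraction of rows for which the model predicts the recorded label."""
    return sum(predict(c, a) == y for c, a, y in rows) / len(rows)

def fit_lookup_table(rows):
    """Memorize every training row; fall back to the majority label otherwise."""
    table = {(c, a): y for c, a, y in rows}
    default = Counter(y for _, _, y in rows).most_common(1)[0][0]
    return lambda c, a: table.get((c, a), default)

def fit_threshold(rows):
    """Pick the complaints threshold with the best training accuracy."""
    def rule(t):
        return lambda c, a: int(c >= t)
    best_t = max(range(0, 7), key=lambda t: accuracy(rule(t), rows))
    return rule(best_t)

memorizer = fit_lookup_table(train)
simple_rule = fit_threshold(train)

for name, model in [("lookup table", memorizer), ("threshold rule", simple_rule)]:
    print(f"{name:>14}: train={accuracy(model, train):.2f} "
          f"holdout={accuracy(model, holdout):.2f}")
```

The lookup table fits the training data perfectly, noise included, but has nothing useful to say about customers it has not seen. The threshold rule makes some mistakes on the training data and still does better on the hold-out customers, which is what generalizing beyond the available data means.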
Data science fundamental concept 4:
Dependence on context:
• formulating useful questions
• highly dependent on the context
• churning, employee hiring...
• evaluating alternative methods/different results
ARTIFICIAL INTELLIGENCE AND MACHINE LEARNING
Artificial Intelligence (AI) = any technology that enables a system to demonstrate human-like intelligence.
E.g. Deep Blue played chess against Garry Kasparov.
Machine Learning (ML) = a training mechanism by which computers are able to learn and adapt themselves to
various situations.
Data science uses tools from many disciplines, including ML and hence AI, to solve business problems. ML is a
subfield of AI, and data science draws on both.