Strategy Analytics
Knowledge clips
Week 1
Data Science Fundamentals
Data applications present in daily life:
1) Marketing
- Online advertising
- Recommendations for cross-selling
- Customer relationship management
2) Finance
- Credit scoring and trading
- Fraud detection
- Workforce management
3) Retail
- Marketing
- Supply chain management
Data-driven decision making (DDD) = the practice of basing decisions on
the analysis of data, rather than purely on intuition. It is a very
useful tool for driving managerial decisions -> managers need to
triangulate different forms of data with managerial experience to make
decisions
Data science = the principles, processes and techniques for
understanding phenomena via the (automated) analysis of data. It is used
to address a specific question; it is the engineering behind the logic
Two sorts of decisions are of interest: (1) decisions that need discovery, i.e. non-obvious or
counterintuitive findings in the data, and (2) repetitive decisions, where the same approach can be
reused in other situations
Data = facts and figures (not the information itself). When data is structured and placed in
context, it provides information
Big data = very large datasets with 3 distinct
characteristics -> the 3Vs
1. Volume = quantity of generated and stored data
2. Variety = type and nature of the data
3. Velocity = speed at which the data is generated
and processed
Data mining = the extraction of knowledge from data,
via technologies that incorporate these principles. The
extracted patterns are then applied to new data
Types of data analysis:
- Descriptive analytics (BI) = what has happened? -> simple descriptive statistics, dashboards,
charts, diagrams. It does not explain why something happened or what needs to change
- Predictive analytics = what could happen? -> segmentation, regressions. It shows which factors
influence the outcome
- Prescriptive analytics = what should we do? -> complex models for product planning and stock
optimization
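The three questions can be illustrated on one tiny dataset. The sketch below is only an illustration: the sales figures, the 10% safety margin and the function names are assumptions, not part of the course material.

```python
# Hypothetical weekly unit sales; the numbers are invented for illustration.
weekly_sales = [100, 110, 120, 130, 140]

# Descriptive: what has happened? A simple summary statistic.
average_sales = sum(weekly_sales) / len(weekly_sales)


def fit_line(ys):
    """Least-squares line through (0, ys[0]), (1, ys[1]), ..."""
    xs = list(range(len(ys)))
    x_mean = sum(xs) / len(xs)
    y_mean = sum(ys) / len(ys)
    covariance = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    variance = sum((x - x_mean) ** 2 for x in xs)
    slope = covariance / variance
    intercept = y_mean - slope * x_mean
    return slope, intercept


# Predictive: what could happen? Extrapolate the fitted trend one week ahead.
slope, intercept = fit_line(weekly_sales)
next_week_forecast = slope * len(weekly_sales) + intercept


def recommend_order(forecast, stock_on_hand, safety_margin=0.1):
    """Prescriptive: what should we do? Order the forecast plus an assumed
    safety margin, minus what is already in stock (never a negative order)."""
    needed = forecast * (1 + safety_margin) - stock_on_hand
    return max(0, round(needed))
```

Calling `recommend_order(next_week_forecast, 40)` turns the forecast into a concrete action, which is the step that separates prescriptive from predictive analytics.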
Data Science Capability as strategic asset
Data science is viewed as a capability, which is a strategic asset: data and the capability to extract
useful knowledge from data can provide competitive advantage
Big data refers to the large volume of information that companies can gather and have access to.
Information flows come from customers, suppliers and distributors. All this information has to be
structured. Big data combined with effective analytics is a key competitive advantage (CA) for
organisations
DELTA model =
1) D = data, this data must be clean, accessible and unique
2) E = enterprise-wide focus, data and analytics must be available to the entire organisation
3) L = leaders, leaders at all levels who promote a data-driven culture
4) T = targets, the key business areas to focus on
5) A = analysts, who can carry out the strategy
Business problem to data mining tasks
A collaborative problem-solving process between business stakeholders and data scientists ->
decompose the business problem into solvable subtasks. Match the subtasks with known tasks for
which tools are available. Solve the remaining non-matched subtasks (by creativity). Put the
subtasks together to solve the overall problem
If you want to find out who your most profitable customer is, you should break the question into
several tasks = who are my customers, how can I segment them into profiles, are there differences
in characteristics that result in different revenue flows
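The decomposition above can be sketched as three small functions, one per subtask. The transactions, the customer names and the profit threshold of 50 are hypothetical values chosen for illustration.

```python
from collections import defaultdict

# Hypothetical transactions: (customer_id, revenue, cost). Invented values.
transactions = [
    ("alice", 120.0, 70.0),
    ("bob", 40.0, 35.0),
    ("alice", 80.0, 50.0),
    ("carol", 200.0, 90.0),
    ("bob", 30.0, 20.0),
]


def profit_per_customer(rows):
    """Subtask 1: who are my customers, and what profit does each generate?"""
    totals = defaultdict(float)
    for customer, revenue, cost in rows:
        totals[customer] += revenue - cost
    return dict(totals)


def segment(profits, threshold=50.0):
    """Subtask 2: split customers into profiles using an assumed threshold."""
    return {c: ("high" if p >= threshold else "low") for c, p in profits.items()}


def most_profitable(profits):
    """Subtask 3: combine the results to answer the original question."""
    return max(profits, key=profits.get)


profits = profit_per_customer(transactions)
segments = segment(profits)
best = most_profitable(profits)
```

A real analysis would replace the fixed threshold with a proper segmentation method, but the structure (solve subtasks, then combine) stays the same.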
Methods that can be used:
- Classification
- Regression
- Similarity matching
- Link prediction
- Clustering
- Profiling
- Co-occurrence grouping
- Data reduction
- Causal modeling
Supervised vs. Unsupervised
The methods can be either supervised or unsupervised learning methods. Supervised methods are
those where you are looking for a specific outcome, the target variable; unsupervised methods do
not have a target variable. The target variable can be seen as the dependent variable
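The difference shows up directly in the shape of the data. In this minimal sketch the spend figures and the "churn"/"stay" labels are invented, and the two functions are simple stand-ins for real methods rather than recommended algorithms.

```python
# Supervised: every training example carries a known target value (the label).
# Each row is (monthly_spend, target); all values are hypothetical.
labelled = [(20.0, "churn"), (25.0, "churn"), (80.0, "stay"), (90.0, "stay")]


def predict_nearest(training, spend):
    """Predict the target of the closest labelled example (1-nearest-neighbour)."""
    _, label = min(training, key=lambda row: abs(row[0] - spend))
    return label


# Unsupervised: the same kind of feature, but no target column exists.
unlabelled = [20.0, 25.0, 80.0, 90.0]


def largest_gap_split(values):
    """Split the sorted values into two groups at the widest gap between
    neighbours. No labels are used; the groups come from the data alone."""
    ordered = sorted(values)
    gaps = [ordered[i + 1] - ordered[i] for i in range(len(ordered) - 1)]
    cut = gaps.index(max(gaps)) + 1
    return ordered[:cut], ordered[cut:]
```

`predict_nearest(labelled, 30.0)` answers a question about a known target, whereas `largest_gap_split(unlabelled)` only reports structure found in the data.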
Unsupervised learning does not have a specific outcome. The machine tries to find patterns in
the data itself and provides examples of them. Techniques used in this type of learning:
- Clustering
- Anomaly detection
- Association discovery
- Topic modeling
Because this type of model does not have an 'outcome', unsupervised learning methods cannot
be evaluated against known correct answers in the way supervised methods can.
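As an illustration of clustering, here is a minimal one-dimensional k-means sketch. The data points and starting centroids are assumptions. There is no target to score the result against, but a descriptive measure such as the within-cluster sum of squares can still indicate how compact the groups are.

```python
def kmeans_1d(values, centroids, iterations=10):
    """Very small k-means for one-dimensional data.

    `centroids` holds the assumed starting positions, one per cluster."""
    clusters = [[] for _ in centroids]
    for _ in range(iterations):
        # Assignment step: put each value in the cluster of its nearest centroid.
        clusters = [[] for _ in centroids]
        for v in values:
            nearest = min(range(len(centroids)), key=lambda i: abs(v - centroids[i]))
            clusters[nearest].append(v)
        # Update step: move each centroid to the mean of its cluster
        # (an empty cluster keeps its previous centroid).
        centroids = [
            sum(c) / len(c) if c else centroids[i] for i, c in enumerate(clusters)
        ]
    return centroids, clusters


def within_cluster_ss(centroids, clusters):
    """Sum of squared distances from each value to its cluster centroid."""
    return sum((v - m) ** 2 for m, c in zip(centroids, clusters) for v in c)


# Hypothetical data with two visible groups.
points = [1.0, 2.0, 3.0, 10.0, 11.0, 12.0]
centroids, clusters = kmeans_1d(points, centroids=[1.0, 12.0])
compactness = within_cluster_ss(centroids, clusters)
```

A lower `compactness` value means tighter clusters, but it says nothing about whether the groups are meaningful for the business question; that judgement still needs domain knowledge.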