Complete summary strategy analytics book & lecture


This summary covers all subjects for the course Strategy Analytics and contains both a summary of the book and the lecture notes.

Preview: 8 of 64 pages

  • 13 December 2020
  • 64 pages
  • 2020/2021
  • Summary

Strategy analytics

Table of Contents
Strategy analytics
Lecture 1: Introduction
Chapter 1: Data analytics thinking
Fundamental concepts: data science and data driven decision making (DDD)
Fundamental concepts: Big data
Fundamental concepts: data science, data mining, and machine learning
Business problems
From business problems to data mining tasks
Data mining and its results
Data mining process – CRISP
Boundaries of data mining
Case: Capital One
Lecture 2: Supervised segmentation
Before we start, a warning on terminology…
Chapter 3: predictive modeling: from correlation to supervised segmentation
Models, induction & deduction
Supervised segmentation: an explanation
Access to historical data
Data mining question
Let’s look at some data…
A peek into the dataset…
Visualize - scatter plot
Step by step approach
Entropy
Entropy (H)
Information gain (IG)
Formal definition of IG
Illustration IG
Iterative process to find maximum IG
Classification tree: explanation
Classification tree: development and advantages
Decision boundaries
Predicting the end of the coronavirus pandemic – optimistic or not?
Chapter 4: Fitting a model to data
Classification via mathematical functions
Linear discriminant models
Logistic regression
Estimating probabilities
Probability estimation and Laplace correction
Probability estimation and Laplace correction
Linear discriminant functions (the book has errors on this)
Support vector machine (SVM)
Support Vector Machine (SVM) - Loss function
Support vector machine (SVM) method
Linear regression
Logistic regression
Logistic regression vs. SVM
Comparison between logistic regression (function fitting) vs decision tree (tree induction)
Case: Classification trees and decision-analytic feedforward control: a case study from the video game industry (Brydon & Gemino, 2008)
Lecture 3: FIFA world cup & Easyjet cases
Recap from last week
Chapter 5: Overfitting and its avoidance
Chapter 5.1: Overfitting: example
Overfitting: Explanation
Overfitting: bias-variance tradeoff
Overfitting: complexity and prediction error
Overfitting: logistic regression vs. SVM
Chapter 5.2: Avoiding overfitting: how to measure generalizability
Avoiding overfitting: cross-validation
Learning curves
Avoiding overfitting: methods for classification trees
Avoiding overfitting: classification tree ensemble methods
Avoiding overfitting: bagging
Avoiding overfitting: boosting
Avoiding overfitting: random forest
Avoiding overfitting: random forest application
Avoiding overfitting: logistic regression
General method for avoiding overfitting – nested hold-out / cross-validation
Case: FIFA world cup 2018
Discussion questions
Chapter 6: similarity, neighbors and clusters
Chapter 6.1 Similarity and distance
Similarity: motivation
Distance: measures (definitions)
Distance: example I
Distance: example II
Chapter 6.2 Nearest neighbors: example on US voters
Nearest neighbors: technique
Nearest neighbors: influence of k
Nearest neighbors: challenges
Chapter 6.3 Clustering
Clustering: definition
Hierarchical clustering: method
Hierarchical clustering: visualization
K-means clustering: method
K-means clustering: specifications
Case: Generation Easyjet
Learning goals lecture 3
Lecture 4: Management: evaluating & visualizing the performance of analytical strategies
Chapter 7: decision-analytic thinking – model performance metrics
Classifier Accuracy
Confusion matrix
Unbalanced classes
Unbalanced classes – problems
Problems with unequal costs and benefits
Model performance metrics: metrics
Class discussion
Decision-analytic thinking – expected value framework
Expected value for classifier use
Expected value for classifier evaluation – probabilities
Expected value for classifier evaluation – sample not random/representative, class priors known
Expected value for classifier evaluation – costs and benefits
Expected value framework: comparison
Chapter 8: Visualizing model performance
Classification vs. Ranking
Visualizations
Visualizations: ROC graphs
Visualizations: cumulative response curves
Visualizations: Lift curve
Model evaluation
Case: predicting healthcare needs
Chapter 11: Complex decision-analytic thinking
Example: targeting the best prospects for a charity mailing
Lecture 5: methods: Bayesian, text mining, co-occurrence, profiling
Learning goals
Chapter 9: Naïve Bayes Classifier
9.1 – example on cancer screening
9.1 Probabilities & Bayes’ rule
9.1 Back to our example
9.1 Advancing Bayes’ rule
9.1 A simplified example
9.1 Lift
9.2 Conditional probabilities in practice
9.2 Benefits of Naïve Bayes
9.2 Disadvantage of Naïve Bayes
Chapter 10: Text analysis
10.1 Text as data
10.1 Text analysis
10.2 ‘Bag of words’ approach
10.2 Bag of words example I
10.3 Advanced methods
10.3 Advanced methods: topic models
Case: Twitter and stock returns
Chapter 12: Co-occurrence, associations, profiling, link prediction, latent dimensions
12.1 Co-occurrence and association rules
12.1 Co-occurrence measure comparison
12.1 Co-occurrence measures visualization
12.1 Examples
12.2 Profiling and link prediction
12.3 Data reduction and latent dimensions
Bias, variance, and ensemble methods
Causal explanation
Lecture 6
Learning goals
Chapter 13: Data science and business strategy
Important factors to get the most from your data
Thinking data analytically & creating a conducive culture
Achieving competitive advantage with data science
Sustaining competitive advantage with data science
Attracting data scientists
Proposal evaluation for data science projects
Questions to ask
An example for data mining
Evaluation step 1: Business understanding
Evaluation step 2: data understanding / data preparation
Evaluation step 3: Modeling
Evaluation step 4: evaluation
Evaluation step 5: Deployment
Chapter 14: Conclusion
What data can’t do: Humans in the loop
Privacy, ethics and mining data about individuals
Case: The Dark Side of Customer Analytics (Davenport & Harris, 2007 HBR)
Discussion questions
Review


Lecture 1: Introduction
- Chapter 1 + 2; Case: Capital One
Data science: set of fundamental principles that guide the extraction of knowledge from
data. Data mining: extraction of knowledge from data, via technologies that incorporate
these principles.




Chapter 1: Data analytics thinking
The ubiquity of data opportunities in the digital era
Over the last 25 years, many devices have become linked with each other through data, and
the cost of storing this data has decreased.

Some observations in our daily life
- Marketing
o Online advertising
o Recommendations for cross selling
o Customer relationship management
- Finance
o Credit scoring and trading
o Fraud detection
o Workforce management
- Retail
o Marketing
o Supply chain management

Data is used in many organizations daily. Different technologies → more data → use for
better decisions.

Fundamental concepts: data science and data driven decision making (DDD)
Data-driven decision making (DDD): practice of basing decisions on the analysis of data,
rather than purely on intuition.
- Relying on data and analysis. DDD assumes that a lot of data is available, which is not
always the case. In a pandemic, for example, we learn as we go: in the initial stages
there is often little data.

Data science: Involves principles, processes and techniques for understanding phenomena
via the (automated) analysis of data.
- E.g.: What can we do to retain customers? Predict customer churn.

Data science supports DDD but also overlaps with it, because business decisions are
increasingly being made automatically. Data engineering and processing support data science
but are useful for much more.

The sort of decisions of interest:
1. Decisions which need discovery (non-obvious) within data
2. Repetitive decisions (especially at massive scale)
The decisions of interest to a company are those that require discovering things in the
data that are not obvious or intuitive. The other important category is repetitive
decisions: if you face a problem frequently, data science becomes important.

Fundamental concepts: Big data
Data vs. information
Data can be converted into information, which gives it meaning. Data in itself has no meaning.

Big data: Simply put, very large datasets, with three distinct characteristics (3Vs):
- Volume: quantity of generated and stored data
- Variety: type and nature of data
- Velocity: Speed at which the data is generated and processed
Big Data 1.0 is evolving into Big Data 2.0 (social networking components, rise of the voice
of the individual consumer).

Fundamental concepts: data science, data mining, and machine learning
Data science: involves principles, processes and techniques for understanding phenomena
via the (automated) analysis of data.
- Involves storage, collection, analysis and implementation.
Data mining: Extraction of knowledge from data, via technologies that incorporate these
principles. Data mining is one aspect of data science, extracting the knowledge.

We focus on 1) business understanding (how to translate business problems into data
problems) and 2) data analysis (which kinds of models can be used; understanding data
analysis).

Data analytics: Process of examining datasets in order to draw conclusions about the useful
information they may contain. What value do these models have? What is the framework?
- How much value does it create for a manager and how can you use it for the future?

Types of data analysis:
- Descriptive analysis (BI): what has happened?
o Simple descriptive statistics, dashboard, charts, diagrams.
- Predictive analysis: what could happen? (main focus of this course)
o Segmentation, regression
- Prescriptive analysis: what should we do?
o Complex models for product planning and stock optimization
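
The three types of analysis can be illustrated on one toy dataset. The sketch below is only an illustration and not part of the course material: the weekly sales figures, the naive trend forecast and the 10% safety-stock rule are all made-up assumptions.

```python
from statistics import mean

# Hypothetical weekly unit sales (made-up numbers for illustration only)
weekly_sales = [100, 110, 120, 130]

# Descriptive analysis: what has happened?
average_sales = mean(weekly_sales)

# Predictive analysis: what could happen?
# Naive forecast (assumption): last value plus the average weekly change.
changes = [later - earlier for earlier, later in zip(weekly_sales, weekly_sales[1:])]
forecast_next_week = weekly_sales[-1] + mean(changes)

# Prescriptive analysis: what should we do?
# Assumed decision rule: order the forecast plus a 10% safety stock.
SAFETY_MARGIN = 0.10
order_quantity = round(forecast_next_week * (1 + SAFETY_MARGIN))

print(average_sales, forecast_next_week, order_quantity)
```

Real prescriptive models (for example for stock optimization) are far more complex; the point here is only that each step answers a different question.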

Big data analysis as a strategic asset – video
Data comes from both internal and external sources, structured and unstructured. The idea
is to make meaning of the data. Combining big data with analysis is key to gaining a
competitive advantage.

Strategic asset: data and the capability to extract useful knowledge from data can be a
strategic asset. One has to think of data science as a strategic asset you invest in. You need
good people, good infrastructure and a process that runs over years. It is like R&D: a
long-term project that pays off in the long run.

Fundamental concepts
Classification: Predict, for each individual in a population, which of a set of classes this individual belongs to. It predicts whether something will happen. Often a binary target (categorical, not numerical as in regression).

Scoring (class probability estimation): Applying a score that represents the probability that an individual belongs to each class.

Regression: Predict, for each individual, the numerical value of some variable for that individual. It predicts how much something will happen.

Similarity matching: Attempts to identify similar individuals based on data known about them. Often used for product recommendations.

Clustering: Group individuals in a population together by their similarity, but not driven by any specific purpose. Useful in preliminary domain explorations to see which natural groups exist.

Co-occurrence grouping (frequent itemset mining, association rule discovery, market-basket analysis): Find associations between entities based on transactions involving them. It considers similarity of objects based on their appearing together in transactions (instead of the objects’ attributes).

Profiling (behavior description): Characterize the typical behaviour of an individual/group/population. Useful for establishing behavioural norms for anomaly detection applications such as fraud systems.

Link prediction: Predict connections between data items, usually by suggesting that a link should exist, and possibly also estimating the strength of the link. Often used in social networking systems.

Data reduction: Replace a large set of data with a smaller set, which may be easier to deal with but contains much of the important information.

Causal modelling: Understand what events/actions actually influence others. Understand the difference between two situations (treatment event vs no treatment).
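
The difference between classification, scoring and regression can be made concrete with a small sketch. The customer records, the segment-based estimates and the 0.5 threshold below are hypothetical choices for illustration; they are not taken from the book or the lecture.

```python
from statistics import mean

# Hypothetical historical customers: (contract_type, churned, monthly_spend)
history = [
    ("monthly", True, 20.0),
    ("monthly", True, 30.0),
    ("monthly", False, 40.0),
    ("yearly", False, 50.0),
    ("yearly", False, 70.0),
    ("yearly", True, 60.0),
    ("yearly", False, 60.0),
]

def score(contract_type):
    """Scoring: estimate the churn probability as the churn rate in the segment."""
    outcomes = [churned for c, churned, _ in history if c == contract_type]
    return sum(outcomes) / len(outcomes)

def classify(contract_type, threshold=0.5):
    """Classification: turn the score into a binary class (will churn or not)."""
    return score(contract_type) >= threshold

def predict_spend(contract_type):
    """Regression: predict a numerical value (here simply the segment mean)."""
    return mean(spend for c, _, spend in history if c == contract_type)

print(score("monthly"), classify("monthly"), predict_spend("monthly"))
print(score("yearly"), classify("yearly"), predict_spend("yearly"))
```

Classification answers whether something will happen, scoring how likely it is, and regression how much.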

Business problems
From business problems to data mining tasks
Collaborative problem solving between business stakeholders and data scientists:
- Decomposing a business problem into (solvable) subtasks
- Matching subtasks with known tasks for which tools are available
- Solving the remaining non-matched subtasks by creativity
- Putting the subtasks together to solve the overall problem

Is there any pattern among the customers? Is there something common? To answer that,
you need data on customers. You need to get data, understand the models you want to
generate and run them. Choosing the right tools for the subtasks is important. There is a lot
of subjective evaluation (human evaluation) in this process. Your prior knowledge of the
domain becomes crucial.


Typology of methods
The key question: Is there a specific target variable?
- Yes → Supervised learning
- No → Unsupervised learning: see whether there are certain clusters of customers that
differ from others.

Unsupervised learning = clustering, co-occurrence grouping, profiling
- Training data provides examples – no specific outcomes
- The machine tries to find specific patterns in the data
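
The role of the target variable can be sketched as follows. This is a minimal illustration under stated assumptions: the spend values and churn labels are made up, the supervised part uses a simple nearest-class-mean rule, and the unsupervised part splits the values at the largest gap as a stand-in for a real clustering method.

```python
from statistics import mean

# Hypothetical monthly spend per customer (made-up numbers)
spend = [10, 12, 11, 50, 52, 55]

# Supervised: a target variable (did the customer churn?) is known for each example.
churned = [True, True, True, False, False, False]

def fit_supervised(values, labels):
    """Learn the mean spend per class from labelled examples."""
    return {
        label: mean(v for v, l in zip(values, labels) if l == label)
        for label in set(labels)
    }

def predict(class_means, value):
    """Predict the class whose mean spend is closest to the given value."""
    return min(class_means, key=lambda label: abs(class_means[label] - value))

def two_clusters(values):
    """Unsupervised: no target; split the sorted values at the largest gap."""
    ordered = sorted(values)
    gaps = [b - a for a, b in zip(ordered, ordered[1:])]
    cut = gaps.index(max(gaps)) + 1
    return ordered[:cut], ordered[cut:]

class_means = fit_supervised(spend, churned)
print(predict(class_means, 15))
print(two_clusters(spend))
```

The supervised function needs the `churned` labels to learn anything, while `two_clusters` only looks at the structure of the data itself.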

This summary is offered on Stuvia by seller mathildeverbeek.
