Summary

Summary course Strategy Analytics (Grade Assignments 9)

Institution
Universiteit Van Amsterdam (UvA)

Book
Data Science for Business

Complete summary of:
- Book: Data Science for Business (Provost & Fawcett)
- Case studies summary and answers (P. Snoeren)
All exam materials needed next to the lecture slides!

Whole book summarized? Yes
Uploaded on April 7, 2021
Number of pages: 39
Written in 2021/2022
Type: Summary

Strategy Analytics Summary
Data Science for Business Book, Provost & Fawcett
Case Studies, P. Snoeren

Assignment 1 grade: 8.5
Assignment 2 grade: 10

Sophie van Sonsbeek
MSc Business Administration
University of Amsterdam
6314M0380Y
March 23, 2021

Sophie van Sonsbeek - 12799955

Table of contents
Chapter 1. Introduction: Data-Analytic Thinking
Chapter 2. Business Problems and Data Science Solutions
Chapter 3. Introduction to Predictive Modeling: From Correlation to Supervised Segmentation
Chapter 4. Fitting a Model to Data
Chapter 5. Overfitting and its avoidance
Chapter 6. Similarity, neighbors, and clusters
Chapter 7. Decision Analytic Thinking I: What is a Good Model?
Chapter 8. Visualizing model performance
Chapter 9. Evidence and Probabilities
Chapter 10. Representing and Mining Text
Chapter 11. Decision Analytic Thinking II: Toward Analytical Engineering
Chapter 12. Other Data Science Tasks and Techniques
Chapter 13. Data Science and Business Strategy
Chapter 14. Conclusion

Cases
1. Capital One
2. Gaming industry
3. EasyJet + FIFA
   - EasyJet
   - FIFA
4. Google Healthcare
5. Twitter and stock returns
6. Privacy

Chapter 1. Introduction: Data-Analytic Thinking

Introduction
Data collection is done in every aspect of business:
- Operations, manufacturing, supply-chain management, customer behavior,
  marketing campaign performance, workflow procedures, and so on.

Data science = as data becomes more widely available, interest grows in
methods to extract knowledge and information from data.

The ubiquity of data opportunities
Data mining techniques are applied in:
- Marketing: targeted marketing, online advertising,
  recommendations for cross-selling
- Finance: credit scoring, trading, fraud detection
- Retail: Amazon and Walmart apply data mining throughout their entire business

Data-analytic thinking enables you to evaluate proposals for data
mining projects.
This book
Goal of this book:
Translate business problems into data problems.
Provide data mining/data science techniques.

Example used in the book: Predicting customer churn.
Customers switching from one company to another is called churn,
and it is expensive all around: one company must spend on incentives
to attract a customer while another company loses revenue when the
customer departs.
Data science principles
Data science, engineering, and data-driven decision making
Data-driven decision making (DDD) refers to the practice of basing
decisions on the analysis of data, rather than purely on intuition.
Two types of decisions are the focus of this book:
1. Decisions for which “discoveries” need to be made within data.
2. Decisions that repeat, especially at massive scale, so that even a
small increase in decision-making accuracy can have a big impact.

Example:
Target wanted to get a jump on its competition: Amazon. It was
interested in whether it could predict that people are expecting a
baby. If it could, it would gain an advantage by making offers
before its competitors.
→ Pregnant mothers often change their diets, wardrobes, vitamin
regimens, etc.
Big data
Data processing and “Big Data”
Difference between data science and data-driven business:
• Data science needs data and benefits from data engineering,
  which is facilitated by data processing technologies. But
  these technologies are not only for data science.
   o Data processing technologies are also important for
     data-oriented business tasks that do not involve extracting
     knowledge or data-driven decision making.
   o E.g. online advertising campaign management,
     modern web system processing
• Big data technologies:
   o Big data = datasets that are too large for traditional
     data processing systems and therefore require new
     processing technologies.
   o Big data technologies are used for implementing data
     mining techniques → they support the data processing
     behind data mining techniques.
Strategic asset
Data and data science capability as a strategic asset
Fundamental principle of data science: data, and the capability to
extract useful knowledge from data, should be regarded as key
strategic assets.

Chapter 2. Business Problems and Data Science Solutions

Summary
Fundamental concepts: a set of canonical data mining tasks; the data
mining process; supervised versus unsupervised data mining.

Understanding the whole data mining process helps to structure data
mining projects into systematic analyses.
Data mining techniques
From business problems to data mining tasks
Data scientists decompose a business problem into subtasks. The
data mining subtasks can then be composed to solve the overall
problem.

Data mining tasks:
1. Classification and class probability estimation attempt to
predict, for each individual in a population, which of a small set
of classes this individual belongs to.
Classification and scoring are very closely related; as we shall
see, a model that can do one can usually be modified to do
the other.
2. Regression (“value estimation”) attempts to estimate or
predict, for each individual, the numerical value of some
variable for that individual.
“How much will a given customer use the service?”
3. Similarity matching attempts to identify similar individuals
based on data known about them.
4. Clustering attempts to group individuals in a population
together by their similarity.
“Do our customers form natural groups or segments?”
5. Co-occurrence grouping attempts to find associations
between entities based on transactions involving them.
“What items are commonly purchased together?”
6. Profiling attempts to characterize the typical behavior of an
individual, group or population.
“What is the typical cell phone usage of this customer
segment?”
7. Link prediction attempts to predict connections between data
items.
“Since you and Karen share 10 friends, maybe you’d like to be
Karen’s friend?”
8. Data reduction attempts to take a large set of data and
replace it with a smaller set of data that contains much of the
important information in the larger set.
“GPA instead of list of grades per student”
9. Causal modeling attempts to help us understand what events
or actions actually influence others.

Supervised versus unsupervised methods
Supervised learning = training data has a dependent variable or target
variable.
- Purpose: predicting the target
- Problem: “Will a customer leave when her contract expires?”
- Data mining techniques:
   o Classification
      § Categorical (often binary) target
      § “Which service package will a customer likely
        purchase if given incentive I?”
   o Regression
      § Numeric target
      § “How much will this customer use the service?”
   o Causal modeling
The data mining process
1. Business understanding
   a. Recasting the problem and designing a solution is an
      iterative process of discovery.
2. Data understanding
   a. It is important to understand the strengths and limitations of
      the data, because there is rarely an exact match with
      the problem.
3. Data preparation
   a. The phase in which data are manipulated and
      converted into forms that yield better results.
4. Modeling
   a. Output of modeling: some sort of model or pattern
      capturing regularities in the data.
5. Evaluation
   a. Are the data mining results valid and reliable?
6. Deployment
   a. Getting return on investment by implementing the
      results.

Chapter 3. Introduction to Predictive Modeling: From Correlation to
Supervised Segmentation

Summary
Fundamental concepts: identifying informative attributes;
segmenting data by progressive attribute selection.
Exemplary techniques: finding correlations; attribute/variable
selection; tree induction

Predictive modeling as supervised segmentation: how can we segment
the population into groups that differ from each other with respect to
some quantity of interest?
Models, induction and prediction
Predictive model = a formula for estimating the target.
- Classification
- Regression
Descriptive model = a model used to gain insight into the underlying
phenomenon or process.

Supervised learning = model describes a relationship between
independent variables and target variable.

Deductive vs inductive
Induction = generalizing from specific cases to general rules.
Inductive models:
- Classification and regression
Input data used for inducing the model → training data
Training data are called labeled data because the value of the
target variable is known.
Supervised segmentation
Selecting informative attributes
Classification
The groups need to be pure → homogeneous with respect to the
target variable.

The most common splitting criterion is called information gain, and it
is based on a purity measure called entropy.

Entropy = a measure of disorder (how mixed the segment is with
respect to the target variable).

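$$ \text{entropy} = -p_1 \log_2(p_1) - p_2 \log_2(p_2) - \cdots $$

p_i = the proportion of members of the set that have property i
(p_i = 1: all members of the set have property i; p_i = 0: no members
of the set have property i)

Entropy is a measure of group impurity:
0 = pure
1 = maximum impurity (for a two-class problem)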
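As a minimal sketch of the entropy measure described above, assuming the standard definition (the sum of -p log2(p) over the classes present in a segment), the function below computes it from a list of class labels. The label values "stay" and "churn" are illustrative and not taken from the summary.

```python
import math
from collections import Counter


def entropy(labels):
    """Entropy (in bits) of a collection of class labels."""
    total = len(labels)
    counts = Counter(labels)
    # Sum -p * log2(p) over the classes that actually occur (so p > 0).
    return sum(-(c / total) * math.log2(c / total) for c in counts.values())


# Illustrative segments (hypothetical data).
pure = ["stay"] * 10                      # only one class
mixed = ["stay"] * 5 + ["churn"] * 5      # evenly mixed, two classes
skewed = ["stay"] * 7 + ["churn"] * 3     # somewhere in between

print(entropy(pure))    # pure segment: entropy 0
print(entropy(mixed))   # evenly mixed two-class segment: entropy 1
print(round(entropy(skewed), 3))
```

The skewed segment lands between the two extremes, which is what makes entropy usable as a graded purity measure rather than a yes/no test.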

Information gain = the improvement in purity (reduction in entropy)
created by segmentation. It combines segment size and segment purity.
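One way to make "combines segment size and segment purity" concrete is the usual formulation: the entropy of the parent set minus the entropy of each child segment weighted by its share of the instances. The sketch below assumes that formulation; the split and the labels are hypothetical.

```python
import math
from collections import Counter


def entropy(labels):
    """Entropy (in bits) of a collection of class labels."""
    total = len(labels)
    return sum(-(c / total) * math.log2(c / total)
               for c in Counter(labels).values())


def information_gain(parent, children):
    """Parent entropy minus the size-weighted entropy of the child segments."""
    n = len(parent)
    weighted = sum(len(child) / n * entropy(child) for child in children)
    return entropy(parent) - weighted


# Hypothetical parent set and a split into two segments.
parent = ["churn"] * 5 + ["stay"] * 5
left = ["churn"] * 4 + ["stay"] * 1
right = ["churn"] * 1 + ["stay"] * 4

ig = information_gain(parent, [left, right])
print(round(ig, 3))

# A split that leaves both segments as mixed as the parent gains nothing.
useless = information_gain(parent, [["churn", "stay"] * 2, ["churn", "stay"] * 3])
print(useless)
```

Weighting by segment size matters: a tiny pure segment next to a large mixed one yields little gain, because most instances remain in the impure part.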

Numeric variables
Numeric variables can be ‘discretized’ by choosing a split point (or
many split points) and then treating the result as a categorical
attribute.
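A simple way to choose a split point, sketched here under the assumption that candidates are the midpoints between consecutive distinct values and that the best candidate is the one with the highest information gain. The attribute (age) and the labels are hypothetical.

```python
import math
from collections import Counter


def entropy(labels):
    """Entropy (in bits) of a collection of class labels."""
    total = len(labels)
    return sum(-(c / total) * math.log2(c / total)
               for c in Counter(labels).values())


def best_split(values, labels):
    """Return (threshold, information gain) for the best single split point."""
    pairs = sorted(zip(values, labels))
    n = len(pairs)
    parent_entropy = entropy(labels)
    best = (None, 0.0)
    for i in range(1, n):
        lo, hi = pairs[i - 1][0], pairs[i][0]
        if lo == hi:
            continue  # equal values cannot be separated by a threshold
        threshold = (lo + hi) / 2
        left = [label for _, label in pairs[:i]]
        right = [label for _, label in pairs[i:]]
        gain = parent_entropy - (len(left) / n * entropy(left)
                                 + len(right) / n * entropy(right))
        if gain > best[1]:
            best = (threshold, gain)
    return best


# Hypothetical numeric attribute and target.
ages = [22, 25, 31, 38, 45, 52, 60, 64]
churned = ["yes", "yes", "yes", "yes", "no", "yes", "no", "no"]

threshold, gain = best_split(ages, churned)
print(threshold, round(gain, 3))
```

After the threshold is chosen, the attribute can be treated as categorical ("age below threshold" versus "age at or above threshold").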
Visualizing segmentations
Classification tree

Decision lines and hyperplanes
The lines separating the regions are known as decision lines.
Hyperplane is used in data mining literature to refer to the general
separating surface, whatever it may be.
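To illustrate how a classification tree relates to decision lines, the sketch below hand-codes a tiny two-attribute tree. Each test on a single attribute corresponds to one axis-parallel decision line in the (age, balance) plane. The attributes, thresholds, and class names are hypothetical and chosen only for illustration.

```python
def classify(age, balance):
    """Hand-written two-level tree over two numeric attributes."""
    # Test 1: the horizontal decision line balance = 50_000.
    if balance < 50_000:
        # Test 2: the vertical decision line age = 45, which only
        # applies within the region below the balance line.
        if age < 45:
            return "write-off"
        return "no write-off"
    return "no write-off"


# Each call falls into one of the three regions bounded by the two lines.
print(classify(age=30, balance=20_000))
print(classify(age=50, balance=20_000))
print(classify(age=30, balance=80_000))
```

Every instance inside the same region receives the same prediction, which is why a tree can be read as a segmentation of the instance space.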
Laplace correction
Overfitting
The Laplace correction moderates the influence of leaves with only a few
instances.

$$ p(c) = \frac{n + 1}{n + m + 2} $$

n = the number of examples in the leaf belonging to class c
m = the number of examples in the leaf not belonging to class c
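A short sketch of the effect, assuming the binary-class form of the Laplace correction, (n + 1) / (n + m + 2), compared with the plain frequency estimate n / (n + m). The leaf counts are illustrative.

```python
def frequency_estimate(n, m):
    """Plain frequency-based estimate of the probability of class c."""
    return n / (n + m)


def laplace_estimate(n, m):
    """Laplace-corrected estimate for a two-class problem."""
    return (n + 1) / (n + m + 2)


# A small leaf: 2 examples of class c, 0 of the other class.
print(frequency_estimate(2, 0), laplace_estimate(2, 0))

# A larger leaf with the same class proportion: 20 versus 0.
print(frequency_estimate(20, 0), round(laplace_estimate(20, 0), 3))
```

Both leaves have a frequency estimate of 1.0, but the corrected estimate stays well below 1 for the small leaf and approaches the frequency estimate as the leaf gains more instances.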

Trees and sets of rules
Before starting to build a classification tree with variables, it is worth
asking: how good is each of these variables individually?

For this we measure the information gain of each attribute, as
discussed earlier.
As can be seen, the first three variables – the house value, the
number of leftover minutes, and the number of long calls per month
– have a higher information gain than the rest.
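Ranking attributes by their individual information gain could be sketched as follows. The rows, attribute names, and values are hypothetical stand-ins for a churn dataset and do not reproduce the data behind the ranking described above.

```python
import math
from collections import Counter, defaultdict


def entropy(labels):
    """Entropy (in bits) of a collection of class labels."""
    total = len(labels)
    return sum(-(c / total) * math.log2(c / total)
               for c in Counter(labels).values())


def information_gain(rows, attribute, target):
    """Information gain from splitting rows on one categorical attribute."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[attribute]].append(row[target])
    weighted = sum(len(g) / len(rows) * entropy(g) for g in groups.values())
    return entropy([row[target] for row in rows]) - weighted


# Hypothetical, already-discretized customer records.
rows = [
    {"house": "high", "long_calls": "many", "plan": "A", "churn": "yes"},
    {"house": "high", "long_calls": "many", "plan": "B", "churn": "yes"},
    {"house": "high", "long_calls": "few", "plan": "A", "churn": "yes"},
    {"house": "high", "long_calls": "many", "plan": "B", "churn": "yes"},
    {"house": "low", "long_calls": "few", "plan": "A", "churn": "no"},
    {"house": "low", "long_calls": "few", "plan": "B", "churn": "no"},
    {"house": "low", "long_calls": "many", "plan": "A", "churn": "no"},
    {"house": "low", "long_calls": "few", "plan": "B", "churn": "no"},
]

attributes = ["plan", "long_calls", "house"]
gains = {a: information_gain(rows, a, "churn") for a in attributes}
ranking = sorted(attributes, key=gains.get, reverse=True)

for name in ranking:
    print(name, round(gains[name], 3))
```

The attribute with the highest gain is the natural candidate for the first split of the tree, and the same ranking can be used on its own for attribute selection.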
