
Data Science for Business what you need to know about data mining and data-analytic thinking - Foster Provost Tom Fawcett - Summary

A clear and elaborate summary of the Data Science for Business "What You Need to Know About Data Mining and Data-Analytic Thinking" by Foster Provost & Tom Fawcett. Extra chapters include Neural Networks, a Formula sheet and example exam questions. Chapters included are: 1, 2, 3, 4, 5, 6, 7, 8, 9, ...

Data Science for Business Summary
What You Need to Know About Data Mining and Data-Analytic Thinking
Foster Provost & Tom Fawcett




Author: Martijn C. Paulussen
University: Maastricht University School of Business and Economics
Master: MSc Business Intelligence & Smart Services
Course: [EBC4220] Business Analytics






© 2017 Martijn Paulussen - Maastricht University School of Business and Economics

Nothing in this publication may be reproduced and/or made public by means of printing, offset, photocopy or
microfilm or in any digital, electronic, optical or any other form without the prior written permission of the
owner of the copyright.

Table of Contents
Chapter 0 - Introduction and general insights
Chapter 1 - Data Analytic Thinking
  1.1. Data Driven Decision making (DDD)
Chapter 2 - Data Mining Tasks and Business Problems
  2.1. Supervised vs Unsupervised
  2.2. Data mining and KDD
Chapter 3 - Predictive Modeling: Correlation to Supervised Segmentation
  3.1. Entropy
  3.2. Information Gain
  3.3. Entropy Chart
  3.4. Supervised Segmentation with Tree-Structured Models
Chapter 4 - Fitting a Model to Data
  4.1. Support Vector Machines
  4.2. Logistic Regression
Chapter 5 - Overfitting and its Avoidance
  5.1. Holdout Data
  5.2. Cross-validation
  5.3. Learning Curve
Chapter 6 - Similarity, Neighbors, and Clusters
  6.1. General Euclidean Distance
  6.2. Nearest Neighbor
  6.3. Clustering
  6.4. Centroids Clustering
Chapter 7 - Decision Analytic Thinking 1: What is a Good Model?
  7.1. Confusion Matrix
  7.2. Expected Value (Profit)
  7.3. Sensitivity, Specificity and Accuracy
  7.4. Baseline Methods
Chapter 8 - Visualizing Model Performance
  8.1. Profit Curve
  8.2. ROC graphs and curves
    8.2.1. Area Under the ROC Curve (AUC)
  8.3. Cumulative response curves
  8.4. Lift curves
  8.5. Example: Performance Analytics for Churn Modeling
    8.5.1. Fitting Curve
Chapter 9 - Evidence and Probabilities
  9.1. Joint Probabilities and Independence
  9.2. Naïve Bayes
  9.3. Evidence Lift
Chapter 11 - Decision Analytics Thinking 2: Toward Analytical Engineering
  11.1. Expected Value
Chapter 13 - Data Science and Business Strategy
Chapter Extra: Fuzzy Systems
  Crisp sets
  Fuzzy sets
Chapter Extra: Neural Networks
Chapter Extra: Formula Sheet
Chapter Exam Question

Chapter 0 - Introduction and general insights
Big Data: Datasets that are too large for traditional data processing systems.
Web 2.0: The phase in which new systems and companies began taking advantage of the interactive nature of the Web.



Chapter 1 - Data Analytic Thinking
1.1. Data Driven Decision making (DDD)
Data science involves principles, processes, and techniques for
understanding phenomena via the (automated) analysis of data in
order to improve decision making.
Data-driven decision-making [DDD] refers to the practice of basing
decisions on the analysis of data, rather than purely on intuition.
Two decision types: (1) decisions for which “discoveries” need to be
made within data, and (2) decisions that repeat, especially at massive
scale, and so decision-making can benefit from even small increases in
decision-making accuracy based on data analysis.
Fundamental concept 1: Extracting useful knowledge from data to
solve business problems can be treated systematically by following a
process with reasonably well-defined stages. (e.g. CRISP-DM)
Fundamental concept 2: From a large mass of data, information
technology can be used to find informative descriptive attributes of
entities of interest.
[Figure 1-1. Data science in the context of various data-related processes in the organization.]
Fundamental concept 3: If you look too hard at a set of data, you
will find something—but it might not generalize beyond the data you’re looking at.
Fundamental concept 4: Formulating data mining solutions and evaluating the results involves thinking
carefully about the context in which they will be used.
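
Fundamental concept 3 can be made concrete with a small experiment. The sketch below is only an illustration, not something taken from the book: it assumes purely random labels and purely random binary features (the function name, sizes, and seed are all hypothetical), searches for the feature that best matches the labels, and then checks that same feature on fresh data.

```python
import random

def best_chance_feature(n_rows=20, n_features=200, seed=0):
    """Pick the random feature that best matches random labels on the searched rows."""
    rng = random.Random(seed)
    # Labels and features are independent coin flips: there is nothing real to learn.
    train_y = [rng.randint(0, 1) for _ in range(n_rows)]
    test_y = [rng.randint(0, 1) for _ in range(n_rows)]
    best = (0.0, 0.0)  # (accuracy on searched data, accuracy on fresh data)
    for _ in range(n_features):
        train_x = [rng.randint(0, 1) for _ in range(n_rows)]
        test_x = [rng.randint(0, 1) for _ in range(n_rows)]
        train_acc = sum(x == y for x, y in zip(train_x, train_y)) / n_rows
        test_acc = sum(x == y for x, y in zip(test_x, test_y)) / n_rows
        if train_acc > best[0]:
            best = (train_acc, test_acc)
    return best

train_acc, holdout_acc = best_chance_feature()
print(f"best feature: {train_acc:.0%} on the data searched, {holdout_acc:.0%} on fresh data")
```

Because the search tries many candidates on few rows, some feature will usually look predictive on the data that was searched even though no real relationship exists; its accuracy on fresh data is just another coin-flip outcome.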





Chapter 2 - Data Mining Tasks and Business Problems
Fundamental concepts: A set of canonical data mining tasks; The data mining process; Supervised
versus unsupervised data mining.
1. Classification & Probability estimation: Predict, for each individual in a population, which of a
(small) set of classes this individual belongs to. (Will it happen?)
Q: Among all the customers of TelCo, which are likely to respond to a given offer?
A: e.g. two classes: will respond and will not respond. Yes or No.
Class probability estimation: Produce, for each individual, a score representing the probability (or
some other quantification of likelihood) that the individual belongs to each class.
2. Regression (value estimation): Estimate or predict, for each individual, the numerical value of some
variable for that individual. (How much?)
Q: How much will a given customer use the service?
A: predict service usage (target). Model can be based on other similar individuals (variables).
3. Similarity matching: Identify similar individuals based on data known about them.
Similarity matching can be used directly to find similar entities.
4. Clustering: Group individuals in a population together by their similarity, but not driven by any
specific purpose. (Chapter 6)
Q: Do our customers form natural groups or segments? What products should we offer or
develop? How should our customer care teams (or sales teams) be structured?
Clustering is useful in preliminary domain exploration to see which natural groups exist.
5. Co-occurrence grouping: Find associations between entities based on transactions involving them.
Q: What items are commonly purchased together?
While clustering looks at similarity between objects based on the objects’ attributes, co-occurrence
grouping considers similarity of objects based on their appearing together in transactions: “People
who bought X also bought Y.”
6. Profiling: attempts to characterize the typical behavior of an individual, group, or population.
Q: What is the typical cell phone usage (day, night, international) of this customer segment?
Profiling is often used to establish behavioral norms for anomaly detection applications such as
fraud detection and monitoring for intrusions to computer systems.
7. Link prediction: Predict connections between data items, usually by suggesting that a link should
exist, and possibly also estimating the strength of the link.
Q: Since you and Karen share 10 friends, maybe you’d like to be Karen’s friend?
Link prediction can also estimate the strength of a link. For example, when recommending movies to
customers, we search for links that do not exist between customers and movies, but that we predict
should exist and should be strong.
8. Data reduction: Take a large set of data and replace it with a smaller set of data that contains much of
the important information in the larger set.
Data reduction usually involves loss of information. What is important is the trade-off for
improved insight.
9. Causal modeling: Helps us understand what events or actions actually influence others.
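
Tasks 3 to 5 above distinguish similarity based on attributes from similarity based on appearing together in transactions. The following is a minimal sketch of that difference, assuming made-up customers and shopping baskets. All names and values are hypothetical, and plain Euclidean distance on unscaled attributes is used only to keep the example short.

```python
from collections import Counter
from itertools import combinations
from math import dist

# Hypothetical customers described by attributes (age, monthly spend).
customers = {
    "ann": (25, 40.0),
    "bob": (27, 42.0),
    "cat": (60, 15.0),
}

def most_similar(name):
    """Similarity matching: return the other customer closest in attribute space."""
    others = (c for c in customers if c != name)
    return min(others, key=lambda c: dist(customers[name], customers[c]))

# Hypothetical transactions: each set holds the items bought together.
baskets = [
    {"bread", "butter", "jam"},
    {"bread", "butter"},
    {"beer", "chips"},
    {"bread", "butter", "beer"},
]

def co_occurrence_counts(transactions):
    """Co-occurrence grouping: count how often each item pair shares a basket."""
    counts = Counter()
    for items in transactions:
        # Sorting gives each pair one canonical order, e.g. ("bread", "butter").
        counts.update(combinations(sorted(items), 2))
    return counts

print(most_similar("ann"))                          # bob
print(co_occurrence_counts(baskets).most_common(1)) # [(('bread', 'butter'), 3)]
```

The first function never looks at transactions and the second never looks at attributes, which is the contrast the list draws between similarity-based tasks and co-occurrence grouping.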

