Machine Learning (Data Mining) - Summary (slides and textbook)
Full Summary of Chapters and Lecture Slides Data Science for Business
Written for
Tilburg University (UVT)
MSc. Strategic Management
Strategy Analytics
Seller: ayra1999
Data Science for Business Book Summary
Strategy Analytics – Chapter 1
Data mining is widely used in customer relationship management: customer behaviour is
analyzed in order to manage attrition and maximize expected customer value.
Data science: a set of fundamental principles that guide the extraction of knowledge from
data.
Data mining: the extraction of knowledge from data, via technologies that incorporate these
principles.
Churn: customers switching from one company to another.
The ultimate goal of data science is improving decision making.
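DDD (data-driven decision making): basing decisions on the analysis of data rather than
purely on intuition. DDD is associated with higher productivity: the more data-driven a
firm is, the more productive it tends to be.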
The decisions we are interested in fall into two types:
1. Decisions for which ‘discoveries’ need to be made within data.
2. Decisions that repeat, especially at a massive scale, and so decision-making can
benefit from even small increases in decision-making accuracy based on data
analysis.
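A back-of-the-envelope sketch of the second type of decision. All numbers are hypothetical and chosen only to show how a one-percentage-point gain in accuracy adds up when a decision is repeated millions of times.

```python
# Hypothetical numbers, chosen only to illustrate the scale effect.
decisions_per_year = 10_000_000    # e.g. offers targeted at customers
value_per_correct_decision = 2.0   # assumed profit per correct decision


def expected_value(accuracy: float) -> float:
    """Expected yearly value if a fraction `accuracy` of decisions is correct."""
    return decisions_per_year * accuracy * value_per_correct_decision


baseline = expected_value(0.70)
improved = expected_value(0.71)    # one percentage point better
gain = improved - baseline

print(f"Extra value per year: {gain:,.0f}")
```

The gain scales linearly with the number of decisions, which is why the same small improvement is negligible for a one-off decision and substantial for a repeated one.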
Big data: datasets that are too large for traditional data processing systems, and therefore
require new processing technologies. Using big data technologies is associated with
significant additional productivity growth.
One of the fundamental principles of data science: data, and the capability to extract useful
knowledge from data, should be regarded as key strategic assets. Having the data and having
a team that can analyze it are complementary: each is worth less without the other.
The fundamental concepts of data science
A. Extracting useful knowledge from data to solve business problems can be treated
systematically by following a process with reasonably well-defined stages.
B. From a large mass of data, information technology can be used to find informative
descriptive attributes of entities of interest.
C. If you look too hard at a set of data, you will find something – but it might not
generalize beyond the data you are looking at.
D. Formulating data mining solutions and evaluating the results involves thinking
carefully about the context in which they will be used.
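Concept C can be illustrated with a small simulation. This is only a sketch under stated assumptions: the target and all 1,000 candidate attributes are independent coin flips, so any apparent relationship is noise. The function names and sizes are made up for the illustration.

```python
import random

rng = random.Random(0)  # fixed seed so the sketch is repeatable


def coin_flips(n):
    """n independent fair coin flips as 0/1 values."""
    return [1 if rng.random() < 0.5 else 0 for _ in range(n)]


def agreement(xs, ys):
    """Fraction of positions where two 0/1 lists agree."""
    return sum(x == y for x, y in zip(xs, ys)) / len(xs)


n_rows, n_features = 20, 1000
target = coin_flips(n_rows)
features = [coin_flips(n_rows) for _ in range(n_features)]

# "Looking hard": keep whichever random attribute matches the target best.
best = max(range(n_features), key=lambda j: agreement(features[j], target))
train_acc = agreement(features[best], target)

# Fresh data from the same process: attribute and target are still unrelated.
n_new = 10_000
new_acc = agreement(coin_flips(n_new), coin_flips(n_new))

print(f"best attribute on the sample: {train_acc:.2f}")
print(f"same kind of attribute on new data: {new_acc:.2f}")
```

With only 20 rows and 1,000 attributes to choose from, the best attribute looks clearly better than chance on the sample, while on new data agreement falls back to roughly 50%.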
Strategy Analytics – Chapter 2
Fundamental concepts: a set of canonical data mining tasks; the data mining process;
supervised vs. unsupervised data mining.
Data mining is a process with fairly well-understood stages.
Data mining algorithms address a small number of common types of task:
1. Classification and class probability estimation: attempt to predict, for each individual
in a population, which of a (small) set of classes this individual belongs to. The
classes are mutually exclusive.
2. Regression: attempts to estimate/predict for each individual, the numerical value of
some variable for that individual. Regression is related to classification, but
classification predicts whether something will happen, whereas regression predicts
how much something will happen.
3. Similarity matching attempts to identify similar individuals based on data known
about them.
4. Clustering attempts to group individuals in a population together by their similarity,
but not driven by any specific purpose.
5. Co-occurrence grouping (aka frequent itemset mining, association rule discovery and
market-basket analysis) attempts to find associations between entities based on
transactions involving them.
6. Profiling (aka behaviour description) attempts to characterize the typical behaviour
of an individual, group or population.
7. Link prediction attempts to predict connections between data items, usually by
suggesting that a link should exist and possibly estimating its strength.
8. Data reduction attempts to take a large set of data and replace it with a smaller set
of data that contains much of the important information in the larger set. It
involves some loss of information, traded off against improved insight.
9. Causal modelling attempts to help us understand what events or actions influence
others.
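As an illustration of co-occurrence grouping from the list above, here is a minimal sketch that counts how often pairs of items appear in the same transaction. The baskets are hypothetical, and plain pair counting is only the simplest form of the idea, not a full association rule algorithm.

```python
from collections import Counter
from itertools import combinations

# Hypothetical market baskets; each set is one transaction.
transactions = [
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"milk", "eggs"},
    {"bread", "butter", "eggs"},
    {"milk", "bread"},
]

# Count every unordered pair of items that occurs within one basket.
pair_counts = Counter()
for basket in transactions:
    for pair in combinations(sorted(basket), 2):
        pair_counts[pair] += 1

# Support: share of all transactions that contain both items of the pair.
support = {pair: n / len(transactions) for pair, n in pair_counts.items()}
top_pair, top_support = max(support.items(), key=lambda kv: kv[1])

print(top_pair, top_support)
```

Sorting each basket before pairing keeps ("bread", "butter") and ("butter", "bread") from being counted as different pairs.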
Unsupervised: when there is no specific purpose or target specified for the grouping.
Supervised: a specific target is defined, e.g. ‘will a customer leave when her contract
expires?’ A second condition for supervised data mining is that there must be data on the
target.
Classification, regression, and causal modelling generally are solved with supervised
methods. Similarity matching, link prediction, and data reduction could be either.
Clustering, co-occurrence grouping, and profiling generally are unsupervised.
Two main subclasses of supervised data mining – classification and regression – are
distinguished by the type of target.
Regression – numerical target (How much will this customer use the service?)
Classification – categorical (often binary) target (Will this customer purchase S1 if given
incentive 1?)
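In business applications, we often want a numerical prediction over a categorical target,
e.g. the probability that a customer will churn rather than a bare yes/no answer (class
probability estimation).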
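The difference between the three kinds of output can be sketched on one tiny dataset. The customers, feature names and the nearest-neighbour method below are assumptions made for brevity; the notes do not prescribe a particular algorithm at this point.

```python
# Hypothetical customers: (monthly_minutes, support_calls), churned?, next-month usage.
customers = [
    ((120, 0), False, 130.0),
    ((100, 1), False, 110.0),
    ((90, 4), True, 40.0),
    ((60, 5), True, 20.0),
    ((110, 3), True, 70.0),
    ((130, 1), False, 125.0),
]


def nearest(query, k=3):
    """Return the k rows closest to `query` by squared Euclidean distance."""
    def dist(row):
        return sum((a - b) ** 2 for a, b in zip(row[0], query))
    return sorted(customers, key=dist)[:k]


def churn_probability(query, k=3):
    """Class probability estimation: share of neighbours that churned."""
    return sum(churned for _, churned, _ in nearest(query, k)) / k


def will_churn(query, k=3):
    """Classification: categorical (here binary) target."""
    return churn_probability(query, k) > 0.5


def predicted_usage(query, k=3):
    """Regression: numerical target, averaged over the neighbours."""
    return sum(usage for _, _, usage in nearest(query, k)) / k


new_customer = (95, 4)
print(will_churn(new_customer))
print(round(churn_probability(new_customer), 2))
print(round(predicted_usage(new_customer), 1))
```

The same neighbours feed all three functions; only the type of target and the way the neighbours' values are combined change.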