Machine Learning (Data Mining) - Summary (slides and textbook)
Full Summary of Chapters and Lecture Slides Data Science for Business
Written for
Tilburg University (UVT)
MSc. Strategic Management
Strategy Analytics
Seller: jeroenbodaan
Summary Strategic Analytics
Week 1 chapter 1 & 2
Learning goals:
Data science fundamentals
Data science capability as strategic asset
The data mining process in business
Supervised versus unsupervised methods in data mining
Linking data science to the business world
Data-driven decision-making (DDD)
- Refers to the practice of basing decisions on the analysis of data, rather than purely on intuition.
Data science
- Involves principles, processes, and techniques for understanding phenomena via the
(automated) analysis of data
The sort of decisions of interest
- Decisions for which discoveries need to be made in the data (non-obvious)
- Decisions that repeat, so that small improvements add up
Big data
- Large data sets with 3 distinct characteristics (3 V’s)
1. Volume: The quantity of generated and stored data
2. Variety: The type and nature of the data
3. Velocity: The speed at which the data is generated and processed
Data mining
- The extraction of knowledge from data, via technologies that incorporate the principles of data science
Data analytics
- The process of examining datasets in order to draw conclusions about the useful
information they may contain
Types of data analysis
1. Descriptive Analytics: What has happened?
a. Simple descriptive statistics, dashboard, charts and diagrams
2. Predictive Analytics: What could happen?
a. Segmentation, regression (Stata)
3. Prescriptive Analytics: What should we do?
a. Complex models for product planning and stock optimization (Weka)
Data, and the capability to extract useful knowledge from data, can be a strategic asset.
Strategic Analytics – Jeroen Bodaan
From business problems to data mining tasks
- Decomposing a business problem into (solvable) subtasks
- Matching the subtasks with known tasks for which tools are available
- Solving the remaining non-matched subtasks (by creativity)
- Putting the subtasks together to solve the overall problem
Supervised learning: There is a specific target variable
Unsupervised learning: There is no specific target variable
Supervised learning
- One feature in the training data holds the “outcome” (the target variable)
- The goal is to build a model to predict the outcome (Machine learning to predict)
- Because the outcome has a known value, the model can be evaluated
o Split the data into a training set and a test set
o Fit the model on the training set / predict the test set
o Compare the predictions to the known values
- Algorithms:
o Model/ensemble
o Logistic regression
o Time series
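The split / fit / compare steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the loan data is made up, and the "model" is a single income threshold chosen on the training set, not one of the algorithms listed above.

```python
import random

# Hypothetical labeled data: (income in thousands, defaulted?). Made up for illustration.
random.seed(0)
data = [(income, income < 30) for income in random.sample(range(10, 100), 40)]

# 1. Split the data into a training set and a test set.
random.shuffle(data)
train, test = data[:30], data[30:]

def accuracy(threshold, rows):
    """Share of rows where the rule 'income < threshold' matches the known outcome."""
    return sum((income < threshold) == defaulted for income, defaulted in rows) / len(rows)

# 2. Fit the model on the training set: pick the threshold that classifies it best.
best_threshold = max(range(10, 100), key=lambda t: accuracy(t, train))

# 3. Predict the test set and compare the predictions to the known values.
test_accuracy = accuracy(best_threshold, test)
print(f"threshold={best_threshold}, test accuracy={test_accuracy:.2f}")
```

The test set plays no part in choosing the threshold, so the test accuracy is an estimate of how the model would do on data it has not seen.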
Unsupervised learning
- Training data provides “examples” but no specific “outcome”
- The machine tries to find patterns in the data
- Because the data has no “outcome”, the result cannot be evaluated against known values
- Algorithms:
o Clusters
o Anomaly detection
o Association discovery
o Topic modeling
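As a small illustration of one item on this list, anomaly detection, the sketch below flags unusual values without using any outcome column. The transaction amounts are hypothetical, and the cutoff of 2 standard deviations is an arbitrary assumption for the example, not a rule from the course material.

```python
from statistics import mean, stdev

# Hypothetical transaction amounts; note that there is no "outcome" column.
amounts = [20, 22, 19, 21, 23, 18, 20, 22, 21, 250]

mu, sigma = mean(amounts), stdev(amounts)

# Flag transactions that lie more than 2 standard deviations from the mean.
# The cutoff of 2 is an arbitrary choice for this sketch.
unusual = [a for a in amounts if abs(a - mu) / sigma > 2]
print(unusual)  # [250]
```

Nothing tells the procedure which transactions are "wrong"; it only reports which ones differ from the rest, which is why the result cannot be scored against known values.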
Supervised learning: example questions | Training data
How much is this home worth? | Previous home sales
Will this customer default on a loan? | Previous loans that were paid or defaulted
How many customers will apply for a loan next month? | Previous months of loan applications
Unsupervised learning: example questions | Training data
Are these customers similar? | Customer profiles
Is this transaction unusual? | Previous transactions
Are the products purchased together? | Examples of previous purchases
The data mining process:
Business understanding
- This stage represents a part of the craft where the analysts’ creativity plays a large
role
o The design team should think carefully about the use scenario
This itself is one of the most important concepts of data science
- Business projects seldom come pre-packaged as clear and unambiguous data mining problems
Data understanding
- Important to understand strengths and limitations of the data
- Critical part is estimating cost and benefits of each data source and deciding whether
further investment is merited
Data preparation
- Data is manipulated and converted to forms that yield better results
- Quality of the data mining solution rests on how well the analysts structure the
problems and craft the variables
- Beware of ‘leaks’
- Leak: A situation where a variable collected in historical data gives information on
the target variable – information that appears in historical data but is not actually
available when the decision has to be made.
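One way to make the leak check concrete is to record, for each variable, when it becomes known relative to the moment of the decision. The sketch below assumes such a record exists; the feature names and day offsets are hypothetical.

```python
# Hypothetical feature catalogue: the day each variable becomes known,
# relative to the moment the decision is made (0 = decision time).
feature_known_at = {
    "income": -30,
    "previous_loans": -10,
    "days_overdue": 90,  # only known once the loan is running: a leak
}

# Keep only the features that are available at or before decision time.
usable = [name for name, day in feature_known_at.items() if day <= 0]
leaks = [name for name, day in feature_known_at.items() if day > 0]

print("usable:", usable)
print("leaks:", leaks)
```

A variable such as days_overdue would predict default very well in historical data, yet it cannot be used because it does not exist when the loan decision is taken.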
Modeling
- Primary place where data mining techniques are applied to the data
Evaluation
- To assess the data mining results rigorously and to gain confidence that they are valid and reliable
- Serves to help ensure that the model satisfies the original business goal
- Includes both quantitative and qualitative assessment
- Comprehensibility of the model to stakeholders
- Usually a data mining solution is only a piece of the larger solution and it needs to be
evaluated as such
Deployment
- Two main reasons for deploying the data mining system itself rather than the models produced by the data mining system
o 1. The world changes faster than data scientists can adapt
o 2. A business has too many modelling tasks for its data science team to manually curate each model individually
- Deploying a model into the business systems requires the model to be coded for those systems
Implications for managing data science team
- It is tempting, but mistaken, to view the data mining process as a software development cycle
- Instead, analytics projects should prepare to invest in information to reduce uncertainty in various ways
Week 2 chapter 3 & 4
Learning goals:
Concepts
Models, Induction, Deduction
Supervised Segmentation
Classification Trees
Entropy & Information Gain
Parametric Models
Linear discriminant function
Logistic regression
Support vector machine
Terminology
Synonyms for ‘dataset’:
- Sample, population, data, set, work set
Synonyms for ‘entity’:
- Object, instance, observation, element, line, row, feature vector
Synonyms for ‘attribute’:
- Feature, characteristic, variable, column
Model: a simplified representation of reality created to serve a purpose
- Abstracts away irrelevant details
Models serve different purposes in data science:
- Unsupervised setting: to identify (classes, groups, patterns, etc.), Descriptive
- Supervised setting: to predict (“to estimate an unknown value”), Predictive
Induction: “Generalizing from specific cases to general rules” (i.e. developing classification and regression models)
Deduction: “Applying general rules and specific facts to create other specific facts” (i.e. using classification and regression models)
Complications with supervised segmentation:
- Attributes rarely split a group perfectly
- Hard to tell whether a split produces the right subsets
- Not all attributes are binary; many have three or more distinctive values
- Some attributes take on numeric values (continuous or integer)
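Entropy and information gain, named in this week's learning goals, are the usual way to compare imperfect splits. These notes do not define them, so the sketch below assumes the standard Shannon entropy, H = -Σ p·log2(p), and information gain as parent entropy minus the weighted entropy of the subsets. The customer rows and attribute names are hypothetical.

```python
from collections import Counter
from math import log2

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())

def information_gain(rows, attribute, target):
    """Parent entropy minus the size-weighted entropy of the subsets after the split."""
    parent = [r[target] for r in rows]
    groups = {}
    for r in rows:  # one subset per distinct attribute value (binary or not)
        groups.setdefault(r[attribute], []).append(r[target])
    weighted = sum(len(g) / len(rows) * entropy(g) for g in groups.values())
    return entropy(parent) - weighted

# Hypothetical customers; "default" is the target variable.
rows = [
    {"owns_home": "yes", "region": "north", "default": "no"},
    {"owns_home": "yes", "region": "south", "default": "no"},
    {"owns_home": "yes", "region": "north", "default": "no"},
    {"owns_home": "no", "region": "south", "default": "yes"},
    {"owns_home": "no", "region": "north", "default": "yes"},
    {"owns_home": "no", "region": "south", "default": "no"},
]

for attribute in ("owns_home", "region"):
    print(attribute, round(information_gain(rows, attribute, "default"), 3))
```

Neither attribute splits the group perfectly, but owns_home yields a gain of about 0.459 bits while region yields none, so owns_home is the more informative split. Numeric attributes would first need candidate cut points, which this sketch does not cover.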