Lecture 1
Note: expect slightly fewer questions from Weeks 1, 6 & 7
Observations in our daily life:
Marketing: online advertising, recommendations for cross-selling, customer relationship management
Finance: credit scoring and trading, fraud detection, workforce management
Retail: marketing, supply chain management
Nowadays data is available about almost everything
This creates an opportunity to use data analysis to improve decision making etc.
Fundamental concepts
Data-driven decision making (DDD) = the practice of basing decisions on
the analysis of data, rather than purely on intuition.
Data and intuition are both useful: they are complementary
1. Data science = principles, processes, and techniques for
understanding phenomena via the (automated) analysis of data
To address specific questions (business decisions)
Helps us make good decisions by uncovering non-obvious explanations.
Important: (1) the insight should be something non-obvious; (2) it should be
general enough to apply in future, similar situations
2. Data mining = the extraction of knowledge from data, via
technologies that incorporate these principles.
3. Machine learning =
Data = facts and figures (not the information itself)
Information = data placed in context (after it has been processed, interpreted, organized,
structured, or presented so that it is meaningful and useful)
Big Data = simply put, a very large dataset. Three characteristics:
Volume: the quantity of generated and stored data
Variety: the type and nature of the data
Velocity: the speed at which the data is generated and processed
Also: data can come from internal and external sources
Data Analysis = The process of examining datasets in order to draw conclusions about the
useful information they may contain.
Types:
Descriptive analytics: what has happened?
o Simple descriptive statistics, dashboards, charts, diagrams
Predictive analytics: what could happen?
o More useful
o Segmentation, regressions
Prescriptive analytics: what should we do? (not in this course)
o Complex models for product planning and stock optimization
o Most specific about what to do with the results of the analysis
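The difference between the first two types can be sketched in a few lines of Python. This is only an illustration: the monthly sales figures and the function names `describe` and `fit_line` are made up for this example and do not come from the lecture. The descriptive part summarizes the past; the predictive part fits a straight line (ordinary least squares) and extrapolates one month ahead.

```python
from statistics import mean

# Hypothetical monthly sales figures (illustrative data, not from the lecture).
months = [1, 2, 3, 4, 5, 6]
sales = [100.0, 110.0, 120.0, 130.0, 140.0, 150.0]


def describe(values):
    """Descriptive analytics: summarize what has happened."""
    return {"mean": mean(values), "min": min(values), "max": max(values)}


def fit_line(xs, ys):
    """Predictive analytics: fit y = slope * x + intercept by least squares."""
    x_bar, y_bar = mean(xs), mean(ys)
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    slope = s_xy / s_xx
    intercept = y_bar - slope * x_bar
    return slope, intercept


summary = describe(sales)                  # what has happened?
slope, intercept = fit_line(months, sales)
forecast_month_7 = slope * 7 + intercept   # what could happen next month?
```

With this toy data the trend is perfectly linear, so the forecast simply continues the line. Real data would be noisier, and the forecast would carry uncertainty.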
Data Science capability as strategic asset
Strategic asset = data, and the capability to extract useful knowledge from data, can be
a strategic asset
Delta model:
Data (clean, accessible and unique) - Enterprise (focus) - Leaders - Targets - Analysts
Goal: improve decision making and stay a step ahead of your competitors
From business problems to data mining tasks
A collaborative problem-solving process between business stakeholders and data scientists:
Decomposing a business problem into (solvable) subtasks
Matching the subtasks with known tasks for which tools are available
Solving the remaining non-matched subtasks (creativity)
Putting the subtasks together to solve the overall problem
Typology of methods:
o Classification
o Regression
o Similarity matching
o Profiling
o Link prediction
o Clustering
o Co-occurrence grouping
o Data reduction
o Causal modeling
The key question = is there a specific target variable (dependent variable, DV)?
Yes – supervised learning (you are looking for something specific)
No – unsupervised learning
Unsupervised learning
Training data provides examples – no specific outcomes
The machine tries to find specific patterns in the data
Algorithm:
o Clusters
o Anomaly detection
o Association discovery
o Topic modeling
Because there is no known outcome, the model cannot be evaluated against one; it is not predicting anything
Independent variables → distance measure → find a pattern
Examples (question → training data):
o Are these customers similar? → customer profiles
o Is this transaction unusual? → previous transactions
o Are these products purchased together? → examples of previous purchases
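A minimal sketch of how a distance measure can answer the first two example questions. The customer profiles, the transaction amounts, and the helper names (`euclidean`, `most_similar`, `is_unusual`) are assumptions made for illustration. The features are left unscaled to keep the example short; in practice they would usually be put on a comparable scale first. The "unusual" rule used here (distance from the mean compared with the average spread) is just one simple choice among many.

```python
import math

# Hypothetical customer profiles: (age, monthly spend). Illustrative only.
customers = {
    "anna": (25, 200.0),
    "ben": (27, 220.0),
    "carla": (60, 900.0),
}


def euclidean(a, b):
    """Straight-line distance between two profiles with the same features."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def most_similar(name, profiles):
    """Return the other customer whose profile is closest to `name`."""
    others = (n for n in profiles if n != name)
    return min(others, key=lambda n: euclidean(profiles[name], profiles[n]))


def is_unusual(value, history, factor=3.0):
    """Flag a value that lies far from the historical mean,
    measured in units of the mean absolute deviation."""
    centre = sum(history) / len(history)
    spread = sum(abs(h - centre) for h in history) / len(history)
    return abs(value - centre) > factor * spread


closest_to_anna = most_similar("anna", customers)
previous_amounts = [20.0, 25.0, 22.0, 30.0, 23.0]
flag_large = is_unusual(500.0, previous_amounts)
flag_normal = is_unusual(26.0, previous_amounts)
```

Note that no target variable appears anywhere: the code only compares observations with each other, which is what makes it unsupervised.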
Supervised learning
Training data has one feature that is the outcome
The goal is to build a model to predict the outcome (the machine learns to predict)
The outcome has a known value, so the model can be evaluated
o Split the data into a training set and a test set
o Fit the model on the training set and predict the test set
o Compare the predictions to the known values
This allows definite conclusions about the quality of the fit (good or bad)
Algorithm
o Model/ensemble
o Logistic regression
o Time series
Examples (question → training data):
o How much is this home worth? → previous home sales
o Will this customer default on a loan? → previous loans that were paid or defaulted
o How many customers will apply for a loan next month? → previous months of
loan applications
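The evaluation steps for supervised learning (split, fit, predict, compare) can be sketched as follows, using the loan-default question as the setting. Everything concrete here is an assumption for illustration: the data is invented, the single feature is a hypothetical debt-to-income ratio, and the model is a 1-nearest-neighbour rule chosen because it fits in a few lines, not because the lecture prescribes it. The split is a plain slice to keep the result reproducible; a real split would normally be randomized.

```python
# Hypothetical past loans: (debt_to_income_ratio, defaulted), 1 = defaulted.
loans = [
    (0.10, 0), (0.80, 1), (0.15, 0), (0.75, 1),
    (0.20, 0), (0.90, 1), (0.12, 0), (0.85, 1),
]


def split(rows, train_fraction=0.75):
    """Split the data into a training set and a test set."""
    cut = int(len(rows) * train_fraction)
    return rows[:cut], rows[cut:]


def predict(train, x):
    """1-nearest-neighbour: copy the outcome of the most similar past loan."""
    nearest = min(train, key=lambda row: abs(row[0] - x))
    return nearest[1]


def accuracy(train, test):
    """Compare the predictions on the test set to the known outcomes."""
    hits = sum(1 for x, y in test if predict(train, x) == y)
    return hits / len(test)


train, test = split(loans)
score = accuracy(train, test)  # share of test loans predicted correctly
```

Because the outcome is known for the test rows, `score` gives a concrete measure of how good the model is, which is exactly what is missing in the unsupervised case.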
Data mining
The results are reused in other circumstances, so it is important to
distinguish the mining phase from the use phase.
1. Use historical data to mine a model
2. Use the results of data mining for predictions
3. Use the model on new data
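The distinction between mining and use can be sketched with a deliberately simple model: a single cut-off value learned from historical data and then applied to new cases. The data (number of support calls and whether the customer churned) and the function names `mine_threshold` and `apply_model` are hypothetical, chosen only to show the two phases separately.

```python
def mine_threshold(history):
    """Mining phase: pick the cut-off that classifies the most
    historical cases correctly (predict 1 when value >= cut-off)."""
    candidates = sorted({x for x, _ in history})

    def correct(t):
        return sum(1 for x, y in history if (x >= t) == bool(y))

    return max(candidates, key=correct)


def apply_model(threshold, new_values):
    """Use phase: apply the mined rule to data the model has not seen."""
    return [int(x >= threshold) for x in new_values]


# Hypothetical history: (support_calls, churned), 1 = churned.
history = [(0, 0), (1, 0), (2, 0), (5, 1), (6, 1), (8, 1)]

threshold = mine_threshold(history)          # steps 1 and 2: learn from the past
predictions = apply_model(threshold, [1, 7])  # step 3: new customers
```

The mined rule is a reusable artifact: once `threshold` is known, the historical data is no longer needed to score new customers.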
Keep in mind that it is a process
Important: it is not a linear process; it requires constant
iteration between business understanding and data
understanding
Data mining focuses on the automatic search for knowledge patterns in data, rather
than on providing technical support for manual search. It should help the company
discover non-obvious knowledge.