Lecture 1
Expect slightly fewer questions from Weeks 1, 6 & 7
Observation in our daily life:
Marketing: online advertising, recommendations for cross-selling, customer relationship management
Finance: credit scoring and trading, fraud detection, workforce management
Retail: marketing, supply chain management
Nowadays data is available about almost everything
This is an opportunity to use data analysis to improve decision making, etc.
Fundamental concepts
Data-driven decision making (DDD) = the practice of basing decisions on the analysis of data, rather than purely on intuition.
Both data and intuition are useful; they are complementary.
1. Data science = involves principles, processes, and techniques for understanding phenomena via the (automated) analysis of data
To address specific questions (business decisions)
Helps us make good decisions by uncovering non-obvious explanations.
Important: (1) the insight should be something not obvious; (2) it should generalize, so that insights can be applied in the future to similar situations.
2. Data mining = the extraction of knowledge from data, via technologies that incorporate these principles.
3. Machine learning =
Data = facts or figures (not the information itself)
Information = data in context (after it has been processed, interpreted, organized, structured and presented, which makes it meaningful and useful)
Big Data = simply, a very large dataset. 3 characteristics:
Volume: the quantity of generated and stored data
Variety: the type and nature of the data
Velocity: the speed at which the data is generated and processed
Also: internal and external
Data Analysis = The process of examining datasets in order to draw conclusions about the
useful information they may contain.
Types:
Descriptive Analytics: What has happened?
o Simple descriptive statistics, dashboards, charts, diagrams
Predictive Analytics: What could happen?
o More useful
o Segmentation, regressions
Prescriptive Analytics: What should we do? (not in this course)
o Complex models for product planning and stock optimization
o Most specific about what to do with the results of the data analysis
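The difference between descriptive and predictive analytics can be illustrated with a minimal sketch. The monthly sales figures below are hypothetical (they are not from the lecture), and the least-squares trend line is just one simple way to make a prediction; the variable names are illustrative.

```python
from statistics import mean, median

# Hypothetical monthly sales (illustrative numbers only, not from the lecture).
months = [1, 2, 3, 4, 5, 6]
sales = [100, 110, 120, 130, 140, 150]

# Descriptive analytics: what has happened?
summary = {"mean": mean(sales), "median": median(sales), "max": max(sales)}

# Predictive analytics: what could happen? Fit a least-squares trend line.
x_bar, y_bar = mean(months), mean(sales)
slope = sum((x - x_bar) * (y - y_bar) for x, y in zip(months, sales)) / sum(
    (x - x_bar) ** 2 for x in months
)
intercept = y_bar - slope * x_bar
forecast_month_7 = intercept + slope * 7  # extrapolate one month ahead

print(summary)
print(forecast_month_7)
```

The summary only describes the past six months, while the forecast uses the fitted trend to say something about a month that has not happened yet.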
Data science capability as a strategic asset
Strategic asset = data, and the capability to extract useful knowledge from data, can be a strategic asset
DELTA model:
Data (clean, accessible and unique) - Enterprise (focus) - Leaders - Targets - Analysts
Goal: improve decision making and stay a step ahead of your competitors
From business problems to data mining tasks
Collaborative problem solving between business stakeholders and data scientists:
Decomposing a business problem into (solvable) subtasks
Matching the subtasks with known tasks for which tools are available
Solving the remaining non-matched subtasks (creativity)
Putting the subtasks together to solve the overall problem
Typology of methods:
o Classification
o Regression
o Similarity matching
o Profiling
o Link prediction
o Clustering
o Co-occurrence grouping
o Data reduction
o Causal modeling
The key question = is there a specific target variable (dependent variable, DV)?
Yes – supervised learning (you are looking for something specific)
No – unsupervised learning
Unsupervised learning
Training data provides examples, but no specific outcomes
The machine tries to find patterns in the data
Algorithms:
o Clustering
o Anomaly detection
o Association discovery
o Topic modeling
Because there is no known outcome, the model cannot be evaluated against true values. It is not predicting anything.
Process: independent variables -> distance measure -> find a pattern
Examples:
Are these customers similar? (training data: customer profiles)
Is this transaction unusual? (training data: previous transactions)
Are these products purchased together? (training data: examples of previous purchases)
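The "distance measure, then find a pattern" idea can be sketched with a small k-means clustering routine. This is a minimal illustration under stated assumptions: the customer profiles (age, monthly spend) are hypothetical, the `kmeans` helper is written here for illustration and is not from the lecture, and the starting centroids are picked by hand to keep the result reproducible.

```python
from math import dist  # Euclidean distance (Python 3.8+)

def kmeans(points, centroids, iterations=10):
    """Plain k-means: assign each point to its nearest centroid, then move the centroids."""
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)), key=lambda i: dist(p, centroids[i]))
            clusters[nearest].append(p)
        # New centroid = mean of its cluster (kept unchanged if the cluster is empty).
        centroids = [
            tuple(sum(coord) / len(c) for coord in zip(*c)) if c else centroids[i]
            for i, c in enumerate(clusters)
        ]
    labels = [min(range(len(centroids)), key=lambda i: dist(p, centroids[i])) for p in points]
    return labels, centroids

# Hypothetical customer profiles: (age, monthly spend). There is no target variable.
customers = [(22, 30), (25, 35), (24, 28), (58, 210), (61, 190), (55, 205)]
labels, centroids = kmeans(customers, centroids=[customers[0], customers[3]])
print(labels)
```

The labels only say which customers resemble each other; there is no known "correct" grouping to score them against, which is why this kind of model is not evaluated the way a predictive model is.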
Supervised learning
Training data has one feature that is the outcome
The goal is to build a model to predict the outcome (the machine learns to predict)
The outcome has a known value, so the model can be evaluated:
o Split the data into a training set and a test set
o Fit the model on the training set and predict the test set
o Compare the predictions to the known values
This gives definite conclusions about the quality of the fit (good or bad)
Algorithms:
o Model/ensemble
o Logistic regression
o Time series
Examples:
How much is this home worth? (training data: previous home sales)
Will this customer default on a loan? (training data: previous loans that were paid or defaulted)
How many customers will apply for a loan next month? (training data: previous months of loan applications)
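The split / fit / predict / compare steps can be sketched on the loan-default example. This is only an illustration: the loan records are hypothetical, the split is fixed by hand, and a 1-nearest-neighbour rule stands in for the model (the lecture lists other algorithms, such as logistic regression). The `predict` helper is an assumption made for this sketch.

```python
from math import dist  # Euclidean distance (Python 3.8+)

# Hypothetical historical loans: (income, debt) in thousands -> outcome (1 = defaulted, 0 = paid).
loans = [
    ((20, 15), 1), ((25, 18), 1), ((22, 20), 1), ((24, 16), 1),
    ((60, 5), 0), ((65, 8), 0), ((70, 4), 0), ((62, 6), 0),
]

# Step 1: split the data into a training set and a test set (fixed split for reproducibility).
train = loans[:3] + loans[4:7]
test = [loans[3], loans[7]]

def predict(features, training_data):
    """1-nearest-neighbour: return the outcome of the closest training example."""
    _, outcome = min(training_data, key=lambda row: dist(row[0], features))
    return outcome

# Step 2: "fit" on the training set (1-NN just stores it) and predict the test set.
predictions = [predict(x, train) for x, _ in test]

# Step 3: compare the predictions to the known values.
actual = [y for _, y in test]
accuracy = sum(p == a for p, a in zip(predictions, actual)) / len(test)
print(predictions, actual, accuracy)
```

A test set of two records is far too small for a real assessment; it is kept tiny here only so the comparison between predicted and known outcomes is easy to follow.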
Data mining
The resulting model can be reused in other circumstances. It is important to distinguish mining the model from using it:
1. You use historical data
2. Use the results of data mining for predictions
3. Use the model on new data
Keep in mind that it is a process
Important: it is not a linear process; it requires constant iteration between business understanding and data understanding
Data mining focuses on the automatic search for knowledge patterns in data, rather than providing technical support for manual search. It should help the company discover non-obvious knowledge.