This summary covers all the lectures and reading material of the course in 2022. Pictures of the slides are not included because of TU/e regulations.
1BVK00 Summary (2021-2022)
Table of Contents
Introduction to Data Analytical Thinking
Business Problems & Data Science Solutions
Predictive Modeling: Fitting Model to Data
Overfitting and its Avoidance
Visualizing Predictive Model Performance
Similarity and Clustering
Evidence and Probabilities
Fuzzy Logic and Decision Making
Fuzzy Cognitive Maps and Decision Making
Interpretability of Decision Models
Introduction to Data Analytical Thinking
Business analytics & decision support provides:
1. Data-driven approach for each decision type
a. Strategic: unstructured, one-time decisions
b. Tactical: semi-structured, reporting decisions
c. Operational: structured, recurrent decisions
2. A structured way of dealing with the decision problems
• Data science: the practice of organizing and analyzing data to gain insights that may prove helpful for
human decision-making
- Interdisciplinary areas:
▪ Artificial intelligence: how computers and machines can demonstrate intelligent behavior
▪ Machine learning: a subcategory of AI that enables computer algorithms to automatically
learn from data
• Data-driven decision-making (DDD): the practice of basing decisions on the analysis of data, rather
than purely on intuition
- The more data-driven a firm is, the more productive it is
- Automatic DDD: automatic decision making done by computer systems
• Data, and the capability to extract useful knowledge from data, should be regarded as key strategic
assets
CRISP-DM methodology: Cross Industry Standard Process for Data Mining
• Iterates on approaches and strategy rather than on software design
• The results of a given step may change the fundamental understanding of the problem
Business understanding:
• Business objectives
• Success criteria (KPI)
• Project plan
• Deliverables
Data understanding:
• Initial data collection
• Data description
• Data exploration
Data preparation:
• Data cleaning
• Sampling
• Normalization
• Feature selection
Modeling:
• Select modeling techniques
• Build/train model
• Prediction
Evaluation:
• Model validation
• Performance metrics
• Visualization
• Review results
Deployment:
• Model in production
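The data preparation steps listed above (data cleaning, sampling, normalization) can be illustrated with a minimal sketch. The records, column names and helper functions below are hypothetical, and min-max scaling is assumed as the normalization method; the summary does not say which method the course uses.

```python
import random

# Hypothetical raw records; None marks a missing value.
raw = [
    {"age": 20, "income": 20000},
    {"age": None, "income": 35000},
    {"age": 40, "income": 60000},
    {"age": 60, "income": 100000},
    {"age": 30, "income": None},
    {"age": 50, "income": 80000},
]

def clean(rows):
    """Data cleaning: drop every record that has a missing value."""
    return [r for r in rows if all(v is not None for v in r.values())]

def sample(rows, n, seed=0):
    """Sampling: draw n records at random (fixed seed for repeatability)."""
    rng = random.Random(seed)
    return rng.sample(rows, min(n, len(rows)))

def min_max_normalize(rows, column):
    """Normalization: rescale one column to the range [0, 1]."""
    values = [r[column] for r in rows]
    lo, hi = min(values), max(values)
    span = hi - lo
    # If all values are equal there is nothing to scale; use 0.0.
    return [{**r, column: (r[column] - lo) / span if span else 0.0} for r in rows]

cleaned = clean(raw)
normalized = min_max_normalize(cleaned, "age")
subset = sample(normalized, 3)
```

Dropping incomplete records is only one way to clean data; filling in missing values is another, and which one is appropriate depends on the business problem.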
Big data: datasets that are too large for traditional data processing systems and therefore require new
processing technologies
• We are in the era of Big Data 1.0: firms are busy building the capabilities to process large data
• Once Big Data 2.0 arrives, firms should begin asking ‘What can I now do that I couldn’t do before,
or do better than I could do before?’
Canonical data mining tasks
• Supervised: when a specific purpose or target is specified, and there is data on the
target
- Classification: determine which discrete category an example belongs to (categorical)
▪ Class probability estimation: estimate the probability that an example belongs to a given class
- Regression: attempts to estimate or predict, for each individual, the numerical value of some
variable for that individual (numerical/probability)
- Causal modeling
• Unsupervised: when no specific purpose or target is specified for the grouping
- Clustering: attempts to group individuals in a population together by their similarity, but not
driven by any specific purpose
- Co-occurrence grouping: attempts to find associations between entities based on transactions
involving them
- Profiling: attempts to characterize the typical behavior of an individual, group, or population
• Either supervised or unsupervised:
- Link prediction: attempts to predict connections between data items, usually by suggesting that a
link should exist, and possibly also estimating the strength of the link
- Similarity matching: attempts to identify similar individuals based on data known about them
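The difference between several of these tasks can be made concrete with a small sketch. It uses k-nearest neighbours purely as an illustration (the list above does not prescribe a technique), and the customer data, labels and function names are hypothetical. The same set of neighbours answers a similarity-matching question, a classification question, a class probability question and a regression question.

```python
from collections import Counter
from math import dist

# Hypothetical customers: (age, income in thousands), churn label, monthly spend.
customers = [
    ((25, 30), "churn", 20.0),
    ((30, 35), "churn", 25.0),
    ((45, 80), "stay", 60.0),
    ((50, 90), "stay", 70.0),
    ((48, 85), "stay", 65.0),
]

def nearest(query, k=3):
    """Similarity matching: the k known customers closest to the query."""
    return sorted(customers, key=lambda c: dist(c[0], query))[:k]

def classify(query, k=3):
    """Classification: majority label among the k nearest customers."""
    labels = [label for _, label, _ in nearest(query, k)]
    return Counter(labels).most_common(1)[0][0]

def class_probability(query, target="churn", k=3):
    """Class probability estimation: share of neighbours with the target label."""
    labels = [label for _, label, _ in nearest(query, k)]
    return labels.count(target) / k

def regress(query, k=3):
    """Regression: average numeric target (monthly spend) of the neighbours."""
    return sum(spend for _, _, spend in nearest(query, k)) / k

new_customer = (28, 32)
label = classify(new_customer)
probability = class_probability(new_customer)
expected_spend = regress(new_customer)
```

Classification, class probability estimation and regression need the known labels or spend values, which is what makes them supervised; the `nearest` function alone uses no target and would work equally well in an unsupervised setting.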
Another important distinction in data mining is between:
1. Mining the data to find patterns and build models
2. Using the results of data mining
Analytical techniques and technologies
• Statistics
- Summary statistics: the computation of particular numeric values of interest from data
- Statistics (the field): provides us with a huge amount of knowledge that underlies analytics and
can be thought of as a component of the larger field of Data Science
• Database querying
- Query: a specific request for a subset of data or for statistics about data, formulated in a
technical language and posed to a database system
• Data warehousing: collects and coalesces data from across an enterprise, often from multiple
transaction-processing systems, each with its own database
• Regression analysis: explanatory modeling and predictive modeling overlap considerably in
the techniques used, but the lessons learned from explanatory modeling do not all apply to predictive
modeling
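A database query and summary statistics, as described in the list above, can be sketched together. The example uses Python's built-in sqlite3 and statistics modules with an in-memory database; the table name, columns and values are made up for illustration.

```python
import sqlite3
import statistics

# In-memory database with a hypothetical orders table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (customer TEXT, region TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [
        ("a", "north", 10.0),
        ("b", "north", 30.0),
        ("c", "south", 20.0),
        ("d", "south", 40.0),
        ("e", "north", 50.0),
    ],
)

# Query for a subset of the data: order amounts in the north region.
cursor = conn.execute(
    "SELECT amount FROM orders WHERE region = ? ORDER BY amount", ("north",)
)
north_amounts = [row[0] for row in cursor]

# Summary statistics computed from that subset.
summary = {
    "count": len(north_amounts),
    "mean": statistics.mean(north_amounts),
    "median": statistics.median(north_amounts),
    "stdev": statistics.stdev(north_amounts),  # sample standard deviation
}

# Query for statistics about the data: average amount per region.
avg_by_region = dict(
    conn.execute("SELECT region, AVG(amount) FROM orders GROUP BY region")
)
conn.close()
```

The first query returns raw rows that are summarized in Python, while the second lets the database compute the statistic itself; both fit the definition of a query given above.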
Business Problems & Data Science Solutions
Building good datasets
Garbage in, garbage out: poor-quality data will result in poor-quality mining results
Issues affecting data quality:
• Uniqueness
• Formats
• Attribute dependencies
• Missing values
• Invalid values
• Misfielded values
• Misspellings
How to detect these issues:
• Visualization: visualizing all the values of each feature, or inspecting a random sample to check that the values look right
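Some of the issues above can also be flagged with simple counts before any plotting. The sketch below is a textual stand-in for the visual inspection, not the course's own method; the records, field names and the valid age range of 0 to 120 are assumptions chosen for illustration.

```python
from collections import Counter

# Hypothetical records containing several of the issues listed above.
records = [
    {"id": 1, "city": "Eindhoven", "age": 34},
    {"id": 2, "city": "Eindhove", "age": 29},   # misspelled city
    {"id": 2, "city": "Utrecht", "age": None},  # duplicate id, missing age
    {"id": 4, "city": "Eindhoven", "age": -5},  # invalid age
    {"id": 5, "city": "Eindhoven", "age": 41},
]

def quality_report(rows):
    """Flag candidate data quality issues; every flag still needs a human check."""
    id_counts = Counter(r["id"] for r in rows)
    city_counts = Counter(r["city"] for r in rows)
    return {
        # Uniqueness: ids that occur more than once.
        "duplicate_ids": sorted(i for i, n in id_counts.items() if n > 1),
        # Missing values: records without an age.
        "missing_age": [r["id"] for r in rows if r["age"] is None],
        # Invalid values: ages outside the assumed range 0..120.
        "invalid_age": [
            r["id"] for r in rows
            if r["age"] is not None and not 0 <= r["age"] <= 120
        ],
        # Possible misspellings: values that occur only once.
        "rare_cities": sorted(c for c, n in city_counts.items() if n == 1),
    }

report = quality_report(records)
```

Rare values are only candidates: here both the misspelled "Eindhove" and the legitimate "Utrecht" are flagged, which is why the result still has to be inspected by a person.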