Summary: Data Mining and its Applications (EBB056B05)

Summary of the lectures of Data Mining and its Applications. All slides from all lectures are included, supplemented with material from the book and explanations from ChatGPT. I scored an 8.5 on the exam using this summary.

Preview: 4 of 96 pages

  • 24 June 2024
  • 96 pages
  • 2023/2024
  • Summary

Seller: donnakartoidjojo
Lecture 1
Lecture 2: Regression
R-squared vs. RMSE
Linear regression:
Polynomial regression:
Regression tree: the algorithm
Bootstrap AGGregating (Bagging): for each tree/model a training set is generated by sampling uniformly with replacement from the standard training set
Generalization
Advantages of 5-Fold Cross-Validation
Lecture 3: Time series analysis
Seasonal effect:
Exponential smoothing
Stationarity
A seasonal difference is the difference between an observation and the corresponding observation from the previous (seasonal) cycle
ARIMA Models:
Sequence segmentation
Characteristics of a time series
Lecture 4: clustering
Hierarchical Clustering (Linkage-Based Clustering)
K-Means Clustering (Model-Based Clustering)
Density-Based Clustering (DBScan)
Example:
Importance of MinPts:
Clustering Evaluation
Attribute Weighting
Prototype & model-based (k-means,... clustering)
Partitioning; goal: a (disjoint) partitioning into k clusters with minimal costs
K-means
Outliers: k-means vs. k-medoids
Density-based clustering
Clustering evaluation
Lecture 5: Classifiers; Decision Trees, Model validation
Decision Trees
Evaluation measures - Shannon Entropy
Gain Ratio
Gini Index
χ² measure
Decision Trees - Missing Values
Pruning
Reduced Error Pruning
Pessimistic Pruning
Model Validation
Lecture 6: Additional topics on Data Mining
Lecture 7: overview
ChatGPT
Example Usage
Row Splitter Node
Partitioning Node
Practical Example
How Gain Ratio is Calculated:
Example Use:
How Gini Index is Calculated:
Purpose of the Gini Index:
Example Use:
Characteristics of String Variables
Use in Data Mining
Handling String Variables
Example




Lecture 1
What is data mining?
→ The extraction of interesting information or patterns from large data sets that may originally have been collected for other purposes.

Data states:
● Data at rest
● Data on the move
● Data in use

From data to knowledge: (figure omitted)

Data mining project understanding
- What is the primary objective?
- What are the criteria for success?
- These are difficult to define
- Stakeholders involved in the data analysis/mining process speak different languages




Data Mining Stakeholders
● Business User: business understanding
○ Has a sound understanding of the business domain targeted by the data mining project. This person can offer insight into the project context and the business value to be extracted via data mining, and can advise on how results can be operationalized.
● Project Sponsor: project driver
○ The initiator or driver of the data mining project. Concerned with the potential ROI; sets priorities and desired outputs. This person champions the project and motivates key personnel to engage with the business problem.
● Project Manager: end-to-end project delivery
○ In charge of implementing the data mining project; concerned with meeting the quality, time and budget targets.
● Business Intelligence Analyst: data understanding
○ Bridge between the data and the business view of the targeted problem. Maintaining a sound understanding of the relevant data, the Business Intelligence Analyst drives activities related to Key Performance Indicators (KPIs) and extracts relevant data for reporting and dashboarding purposes. Understands the sources and 'consumers' of data, as well as the need for changes in data management processes.
● Data Administrator & Integrator: data preparation & solution delivery
○ Provides action support for implementing the key data access and processing activities needed by the stakeholders of the data mining project. A technical person with sound data management competences, including awareness of security and/or privacy concerns, would be appropriate.
● Data Scientist/Engineer: data modeling & evaluation
○ This person combines data management skills with a sound understanding of data analysis methods and tools, and drives the ingestion of data into the overall data analytics process. The data scientist is able to communicate the analytics methods to the other stakeholders.
→ The data engineer and the data administrator & integrator work closely together on the technical side of data mining and share relevant code and documentation.

Data Mining Project Workflow
1. Inception and discovery
a. Tool to sketch beliefs, experiences, known factors
b. How often will a certain product be found in a basket?
2. Data preparation



