Summary: Guide to Intelligent Data Science - Data Mining and its Applications (EBB056B05)


All required chapters and all lectures are covered in this summary.

Preview: 3 of 22 pages

  • 15 June 2023
  • 22 pages
  • 2022/2023
  • Summary
ayebdrenth
Summary: Data Mining and its Applications
Week 1
- Lecture 1
Data mining is the extraction of interesting information or patterns from large data sources,
which may originally have been developed for other purposes, employing machine and
statistical learning and possibly high-end computational power, in order to serve business
purposes.
Data mining examples: Risk assessment, demand forecasting, fraud detection, anomaly
detection.
From data → knowledge




Data can be at rest, on the move or in use.




There are several data mining stakeholders:
● Business user: business understanding
● Project sponsor: project driver
● Project manager: end-to-end project delivery
● Business intelligence analyst: data understanding
● Data administrator and integrator: data preparation and solution delivery
● Data scientist/engineer: data modeling and evaluation

Data mining project workflow:
Inception and discovery → Data preparation → Model planning → Model building →
Communicate results → Operationalise
ETL: Extraction, Transformation, Loading
The goal of the data understanding phase is to gain general insights about the data that will
potentially be helpful for further steps in the data analysis process. Never trust data until you
have carried out some simple plausibility checks.
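A plausibility check of this kind can be as small as a range test per attribute. The sketch below is only an illustration in Python: the attribute name `age`, the bounds 0 to 120, and the sample records are assumptions made up for the example, not part of the course material.

```python
# Hypothetical records; None marks a missing value.
records = [
    {"id": 1, "age": 34},
    {"id": 2, "age": -5},    # implausible: negative age
    {"id": 3, "age": None},  # missing value
    {"id": 4, "age": 230},   # implausible: far above any human age
]


def plausibility_report(rows, attribute, low, high):
    """Return the ids of rows whose attribute is missing or outside [low, high]."""
    missing = [r["id"] for r in rows if r[attribute] is None]
    out_of_range = [
        r["id"]
        for r in rows
        if r[attribute] is not None and not (low <= r[attribute] <= high)
    ]
    return {"missing": missing, "out_of_range": out_of_range}


report = plausibility_report(records, "age", low=0, high=120)
print(report)
```

Rows flagged this way are candidates for closer inspection, not automatically errors; the bounds encode an assumption about the domain.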
Attributes: Features, variables
Instances: Records, data objects, entries
Data can usually be described in terms of tables or matrices.
Attributes differ in their scale type, according to the type of values that they can assume.
Three scale types: categorical/nominal, ordinal, numeric.
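Which summary is meaningful depends on the scale type. As a rough Python sketch (the sample values and the ordering of the ordinal levels are invented for illustration): the mode works for nominal data, the median additionally needs an order, and the mean needs numeric values.

```python
from statistics import mean, median_low, mode

# Nominal: only equality is meaningful, so the mode is the natural summary.
colours = ["red", "blue", "red", "green"]
nominal_summary = mode(colours)

# Ordinal: values can be ordered, so a median exists, but differences
# between levels are not meaningful.
levels = ["low", "medium", "high"]  # assumed order, lowest first
ratings = ["low", "high", "medium", "medium", "high"]
ranks = [levels.index(r) for r in ratings]
ordinal_summary = levels[median_low(ranks)]

# Numeric: arithmetic is meaningful, so the mean can be computed.
temperatures = [18.0, 21.0, 19.5]
numeric_summary = mean(temperatures)

print(nominal_summary, ordinal_summary, numeric_summary)
```

`median_low` is used for the ordinal case because it always returns one of the observed ranks, so the result maps back to an existing level.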
Granularity, the state of existing in grains or granules, refers to the extent to which a
material or system is composed of distinguishable pieces.
Some attributes have a fixed domain (months), some change over time (products in a catalog).
Data quality issues: Availability, usability, reliability, relevance, presentation quality.
Accuracy is defined as the closeness between the value in the data and the true value.
→ Syntactic: the value might not be correct, but it at least belongs to the domain of the corresponding
attribute.
→ Semantic: the value might be in the domain of the corresponding attribute, but it is not
correct.
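The difference between the two kinds of accuracy can be sketched in Python. Here a syntactic error is a value outside the attribute's domain, and a semantic error is a value inside the domain that is nonetheless wrong. Everything in the example is illustrative: the month domain, the recorded values, and the "true" values are invented, and in practice the true values are usually unknown, which is why semantic errors are much harder to detect.

```python
MONTHS = {
    "January", "February", "March", "April", "May", "June",
    "July", "August", "September", "October", "November", "December",
}


def syntactic_errors(values, domain):
    """Indices of values that lie outside the attribute's domain."""
    return [i for i, v in enumerate(values) if v not in domain]


def semantic_errors(values, true_values, domain):
    """Indices of values that are in the domain but differ from the truth."""
    return [
        i
        for i, (v, t) in enumerate(zip(values, true_values))
        if v in domain and v != t
    ]


recorded = ["March", "Jnauary", "May", "July"]   # hypothetical data
truth = ["March", "January", "June", "July"]     # hypothetical ground truth

syntactic = syntactic_errors(recorded, MONTHS)
semantic = semantic_errors(recorded, truth, MONTHS)
print(syntactic, semantic)
```

The misspelled "Jnauary" is caught by the domain check alone, while "May" recorded instead of "June" passes the domain check and is only found by comparing against the true value.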
Data quality issues: completeness
Visualisation charts: Comparison, time series, correlation, value distribution

Chapter 1 - Motivation
Data refer to single instances, describe individual properties, are often available in large
amounts, are easy to collect or obtain, and do not allow us to make predictions.
Knowledge refers to classes of instances, describes general patterns, structures, laws, etc.,
consists of as few statements as possible, is often difficult and time-consuming to find or to
obtain, and allows us to make predictions and forecasts.
Criteria to assess knowledge:
- Correctness
- Generality
- Usefulness
- Comprehensibility
- Novelty
Descriptive statistics summarises data without making specific assumptions about the data.
Inferential statistics provides more rigorous methods than descriptive statistics; these methods are based on
certain assumptions about the random process that generates the data.
In an experimental study one can control and manipulate the data generating process.
In an observational study one cannot control the data generating process.
Exploratory data analysis is concerned with generating hypotheses from the collected data.
Data science arose from the opportunity to analyse large real-world data repositories that were initially
collected for different purposes. This opportunity came with the availability of powerful tools and technologies
that can process and analyse massive amounts of data.
CRISP-DM:

Problem categories:
- Classification, predict the outcome of an experiment with a finite number of possible
results.
- Regression, a prediction task with a numerical value of interest.
- Clustering, summarise the data to get a better overview by forming groups of similar
cases.
- Association analysis, find any correlations or associations to better understand or
describe the interdependencies of all attributes.
- Deviation analysis, knowing already the major trends or structures, find any exceptional
subgroup that behaves differently with respect to some target attribute.
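
To make the first two categories concrete, here is a minimal Python sketch. The techniques (a 1-nearest-neighbour classifier and a least-squares line) and all data are chosen for illustration only; the summary itself does not prescribe them.

```python
def nearest_neighbour_class(train, x):
    """Classification: return the label of the training point closest to x."""
    _, label = min(train, key=lambda pair: abs(pair[0] - x))
    return label


def fit_line(xs, ys):
    """Regression: least-squares slope and intercept for y = slope * x + intercept."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Classification: a finite set of outcomes ("pass" / "fail"), hypothetical data.
train = [(2.0, "fail"), (4.0, "fail"), (6.0, "pass"), (8.0, "pass")]
predicted_label = nearest_neighbour_class(train, 6.5)

# Regression: a numeric target, hypothetical data lying exactly on y = 2x + 1.
slope, intercept = fit_line([1.0, 2.0, 3.0], [3.0, 5.0, 7.0])
predicted_value = slope * 4.0 + intercept

print(predicted_label, predicted_value)
```

The only structural difference between the two tasks is the type of the target: a label from a finite set in the first case, a number in the second.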

Chapter 2 - Practical data science: an example
An example is described using both a naive and a sound approach.

Chapter 3 - Project understanding
Determine the project objective: objective, deliverable, success criteria
Assess the situation: assess resources, clarify access, evaluate assumptions and risks,
and verify the suitability of the data for the project, to avoid wasting resources on potentially
unsuccessful endeavours.
Determine analysis goals: It is crucial to carefully consider the limitations and practical
implications of the chosen architecture to ensure that the developed model aligns with the
intended use and produces valuable results.
Desirable properties: Interpretability, reproducibility, model flexibility, runtime, interestingness

Chapter 4 - Data understanding
Domain is the set of possible values for an attribute.
Scale type: nominal, ordinal, numeric
Granularity is the level of refinement chosen.
Data quality refers to how well the data fit their intended use.
- Accuracy is defined as the closeness between the value in the data and the true value.
- Syntactic accuracy means that a considered value might not be correct, but it belongs at
least to the domain of the corresponding attribute.
