Samenvatting Data Mining and its Applications
Week 1
- Lecture 1
Data mining is the extraction of interesting information or patterns from large data sources,
which may originally have been developed for other purposes, employing machine and
statistical learning and possibly high-end computational power, in order to serve business
purposes.
Data mining examples: Risk assessment, demand forecasting, fraud detection, anomaly
detection.
From data → knowledge
Data can be at rest, on the move or in use.
There are several data mining stakeholders:
● Business user: business understanding
● Project sponsor: project driver
● Project manager: end-to-end project delivery
● Business intelligence analyst: data understanding
● Data administrator & integrator: data preparation & solution delivery
● Data scientist / engineer: data modelling and evaluation
Data mining project workflow:
Inception and discovery → Data preparation → Model planning → Model building→
Communicate results → Operationalise
ETL: Extraction, Transformation, Loading
The goal of the data understanding phase is to gain general insights about the data that will
potentially be helpful for further steps in the data analysis process. Never trust data until you
have carried out some simple plausibility checks.
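A minimal sketch of such plausibility checks in Python with pandas, assuming a hypothetical file sales.csv with columns "age" and "price":

import pandas as pd

df = pd.read_csv("sales.csv")

print(df.shape)               # number of instances and attributes
print(df.dtypes)              # inferred attribute types
print(df.describe())          # ranges and means, to spot impossible values
print(df.isna().sum())        # missing values per attribute
print((df["age"] < 0).sum())  # values outside the plausible domain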
Attributes: features, variables
Instances: records, data objects, entries
Data can usually be described in terms of tables or matrices
Attributes differ in their scale type, according to the type of values they can assume.
Three scale types: • Categorical / Nominal • Ordinal • Numeric
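A small sketch of how the three scale types can be represented in a pandas table, using made-up example data:

import pandas as pd

df = pd.DataFrame({
    "colour": ["red", "blue", "red"],                # categorical / nominal
    "size": pd.Categorical(["S", "M", "L"],
                           categories=["S", "M", "L"],
                           ordered=True),            # ordinal
    "weight": [1.2, 3.4, 2.8],                       # numeric
})
print(df.dtypes)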
Granularity, the state of being composed of grains or granules, refers to the extent to which a
material or system is made up of distinguishable pieces.
Some attributes have a fixed domain (months), some change over time (products in a catalog)
Data quality issues: Availability, usability, reliability, relevance, presentation quality.
Accuracy is defined as the closeness between the value in the data and the true value
→ Syntactic: the value might not be correct, but it at least belongs to the domain of the
corresponding attribute.
→ Semantic: the value is in the domain of the corresponding attribute, but it is not correct.
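A small illustration of the two notions of accuracy, using a made-up "month" attribute whose true value is assumed to be "Mar":

VALID_MONTHS = {"Jan", "Feb", "Mar", "Apr", "May", "Jun",
                "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"}

value_a = "Jann"  # syntactically inaccurate: not in the domain at all
value_b = "Feb"   # semantically inaccurate: valid month, but not the true value

def syntactically_accurate(value):
    return value in VALID_MONTHS

print(syntactically_accurate(value_a))  # False
print(syntactically_accurate(value_b))  # True, although the value is still wrong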
Data quality issues: completeness
Visualisation charts: Comparison, time series, correlation, value distribution
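A minimal sketch that maps these four chart categories onto matplotlib plots, using randomly generated placeholder data:

import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
x = rng.normal(size=200)
y = 0.8 * x + rng.normal(scale=0.5, size=200)

fig, axes = plt.subplots(2, 2, figsize=(8, 6))
axes[0, 0].bar(["A", "B", "C"], [3, 7, 5])        # comparison
axes[0, 0].set_title("Comparison")
axes[0, 1].plot(np.cumsum(rng.normal(size=100)))  # time series
axes[0, 1].set_title("Time series")
axes[1, 0].scatter(x, y, s=10)                    # correlation
axes[1, 0].set_title("Correlation")
axes[1, 1].hist(x, bins=20)                       # value distribution
axes[1, 1].set_title("Value distribution")
plt.tight_layout()
plt.show()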
Chapter 1 - Motivation
Data refer to single instances, describe individual properties, are often available in large
amounts, are usually easy to collect or obtain, and do not allow us to make predictions.
Knowledge refers to classes of instances, describes general patterns, structures, laws, etc.,
consists of as few statements as possible, is often difficult and time-consuming to find or
obtain, and allows us to make predictions and forecasts.
Criteria to assess knowledge:
- Correctness
- Generality
- Usefulness
- Comprehensibility
- Novelty
Descriptive statistics summarises data without making specific assumptions about the data.
Inferential statistics provides more rigorous methods than descriptive statistics; these methods
are based on certain assumptions about the random process that generates the data.
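A short sketch contrasting the two, with made-up sample data and SciPy:

import numpy as np
from scipy import stats

sample = np.array([4.8, 5.1, 5.4, 4.9, 5.2, 5.0])

# Descriptive: summarise the sample itself, without assumptions about its origin.
print(sample.mean(), sample.std(ddof=1))

# Inferential: assume the data come from a normal distribution and test
# whether the underlying mean could plausibly be 5.0.
t_stat, p_value = stats.ttest_1samp(sample, popmean=5.0)
print(t_stat, p_value)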
In an experimental study one can control and manipulate the data generating process.
In an observational study one cannot control the data generating process.
Exploratory data analysis is concerned with generating hypotheses from the collected data.
Data science, the opportunity of analysing large real world data repositories that were initially
collected for different purposes that came with the availability of powerful tools and technologies
that can process and analyse massive amounts of data.
CRISP-DM (Cross-Industry Standard Process for Data Mining): Business understanding → Data
understanding → Data preparation → Modelling → Evaluation → Deployment
Problem categories (illustrated in the sketch after this list):
- Classification, predict the outcome of an experiment with a finite number of possible
results.
- Regression, a prediction task with a numerical value of interest.
- Clustering, summarise the data to get a better overview by forming groups of similar
cases.
- Association analysis, find any correlations or associations to better understand or
describe the interdependencies of all attributes.
- Deviation analysis, knowing already the major trends or structures, find any exceptional
subgroup that behaves differently with respect to some target attribute.
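A minimal sketch of how the first three problem categories map onto scikit-learn estimators, using its bundled iris data set (the feature and target choices are only for illustration):

from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier
from sklearn.linear_model import LinearRegression
from sklearn.cluster import KMeans

X, y = load_iris(return_X_y=True)

# Classification: predict one of a finite number of classes (the iris species).
clf = DecisionTreeClassifier().fit(X, y)

# Regression: predict a numerical value (here petal width from the other features).
reg = LinearRegression().fit(X[:, :3], X[:, 3])

# Clustering: group similar cases without using the class labels.
labels = KMeans(n_clusters=3, n_init=10).fit_predict(X)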
Chapter 2 - Practical data science: an example
An example is described using both a naive and a sound approach.
Chapter 3 - Project understanding
Determine the project objective: objective, deliverable, success criteria
Assess the situation: assess resources, clarify access, evaluate assumptions and risks, and
verify the suitability of the data for the project, to avoid wasting resources on potentially
unsuccessful endeavours.
Determine analysis goals: It is crucial to carefully consider the limitations and practical
implications of the chosen architecture to ensure that the developed model aligns with the
intended use and produces valuable results.
Desirable properties: Interpretability, reproducibility, model flexibility, runtime, interestingness
Chapter 4 - Data understanding
Domain is the set of possible values for an attribute.
Scale type: nominal, ordinal, numeric
Granularity is the level of refinement chosen.
Data quality refers to how well the data fit their intended use.
- Accuracy is defined as the closeness between the value in the data and the true value.
- Syntactic accuracy means that a considered value might not be correct, but it belongs at
least to the domain of the corresponding attribute.