Summary data mining and its applications (including book and lectures)

Summary of the course Data Mining and its Applications at Rijksuniversiteit Groningen, year 2 of Bedrijfskunde / pre-master. It covers the relevant chapters of the book and the lecture slides. You can also bring this summary to the exam.

  • No
  • Chapters 1, 3, 4, 5 (5.1, 5.2, 5.4, 5.5, 5.6), 7, 8 (8.1, 8.3, 8.4, 8.5), 9 (9.1, 9.6.1)
  • 9 June 2023
  • 10 June 2023
  • 48 pages
  • 2022/2023
  • Summary

Seller: karlijnheikens54

Data Mining and its Applications

Week 1 - Chapters 1, 2, 3 and 4

Chapter 1 - Introduction

Data science: The goal of this field is to develop tools that help humans find potentially useful patterns in their data and solve the problems they face by making better use of the data they have.

Data
- Refer to single instances (single objects, people, events, points in time, etc.)
- Describe individual properties
- Are often available in large amounts (databases, archives)
- Are often easy to collect or to obtain (e.g., scanner cashiers in supermarkets, Internet)
- Do not allow us to make predictions or forecasts

Data states:
- Data at rest
- Data on the move
- Data in use

Knowledge
- Refers to classes of instances (sets of objects, people, events, points in time, etc.)
- Describes general patterns, structures, laws, principles, etc.
- Consists of as few statements as possible (this is actually an explicit goal, see below)
- Is often difficult and time-consuming to find or to obtain (e.g., natural laws, education)
- Allows us to make predictions and forecasts

These characterizations make it very clear that knowledge is generally much more valuable than (raw) data.

Enriching the value of data: Data → Information → Knowledge → Context

Criteria to assess knowledge
- Correctness (probability, success in tests)
- Generality (domain and conditions of validity)
- Usefulness (relevance, predictive power)
- Comprehensibility (simplicity, clarity, parsimony)
- Novelty (previously unknown, unexpected)

Statistics has a long history and originated from collecting and analyzing data about the population and the state in general. Statistics can be divided into descriptive and inferential statistics.
- Descriptive statistics summarizes data without making specific assumptions about the data, often by characteristic values like the (empirical) mean or by diagrams like histograms.
- Inferential statistics provides more rigorous methods than descriptive statistics, based on certain assumptions about the random process that generates the data. The conclusions drawn in inferential statistics are only valid if these assumptions are satisfied.
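A minimal Python sketch of the difference, using made-up numbers: the descriptive part only summarizes the sample (empirical mean and a text histogram), while the inferential part makes a statement about the underlying population and is only valid under its stated assumption (independent, roughly normal observations). The data, the bin width of 5 and the use of a normal approximation instead of a t-distribution are illustrative choices, not taken from the book.

```python
import math
from collections import Counter
from statistics import NormalDist, mean, stdev

# Hypothetical sample: number of customers per day in a shop (made-up data).
sample = [52, 48, 61, 55, 47, 58, 50, 53, 49, 57]

# --- Descriptive statistics: summarize the sample, no assumptions needed ---
sample_mean = mean(sample)
print(f"empirical mean = {sample_mean}")

# Text histogram with bins of width 5 (45-49, 50-54, ...).
bins = Counter((x // 5) * 5 for x in sample)
for start in sorted(bins):
    print(f"{start}-{start + 4}: {'#' * bins[start]}")

# --- Inferential statistics: a statement about the whole population ---
# ASSUMPTION: the days are independent draws from a roughly normal process.
# A z-value (normal approximation) is used for brevity; with only 10
# observations a t-value would be the more exact choice.
z = NormalDist().inv_cdf(0.975)
half_width = z * stdev(sample) / math.sqrt(len(sample))
ci = (sample_mean - half_width, sample_mean + half_width)
print(f"95% confidence interval for the mean: ({ci[0]:.1f}, {ci[1]:.1f})")
```

If the normality or independence assumption is violated, the histogram and mean are still valid summaries, but the confidence interval is not.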

We distinguish between experimental and observational studies:
- In an experimental study, one can control and manipulate the data-generating process.
- In an observational study, one cannot control the data-generating process.

Hypothesis testing: based on the collected data, we want to either confirm or reject some hypothesis about the considered domain.
Exploratory data analysis is concerned with generating hypotheses from the collected data.
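A permutation test is one way to test a hypothesis with few assumptions; this is a hedged sketch with hypothetical data, not a method prescribed in the course. The null hypothesis is that both groups come from the same distribution, so randomly reshuffling the group labels should often produce a difference in means at least as large as the observed one. The function name `permutation_test`, the data, the 10,000 shuffles and the 5% significance level are assumptions for illustration.

```python
import random
from statistics import fmean

def permutation_test(group_a, group_b, n_permutations=10_000, seed=0):
    """Two-sided permutation test for a difference in means.

    H0: both groups come from the same distribution, so the group labels
    are interchangeable. Returns (observed difference, estimated p-value).
    """
    rng = random.Random(seed)
    observed = fmean(group_a) - fmean(group_b)
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)  # randomly re-assign the group labels
        diff = fmean(pooled[:n_a]) - fmean(pooled[n_a:])
        # Small tolerance so floating-point rounding does not hide exact ties.
        if abs(diff) >= abs(observed) - 1e-12:
            extreme += 1
    # Counting the observed split itself (+1) avoids reporting p = 0.
    p_value = (extreme + 1) / (n_permutations + 1)
    return observed, p_value

# Hypothetical data: minutes spent on a website with two page layouts.
layout_a = [12.1, 9.8, 11.4, 13.0, 10.7, 12.5]
layout_b = [9.2, 8.7, 10.1, 9.9, 8.4, 9.5]

observed, p = permutation_test(layout_a, layout_b)
print(f"observed difference = {observed:.2f} minutes, p is about {p:.4f}")
# Reject H0 at the 5% significance level if p < 0.05.
```

In exploratory data analysis the order is reversed: a pattern spotted in the data suggests a hypothesis, which should then be tested on new data rather than on the data that suggested it.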

Data science: Powerful tools and technologies that can process and analyze massive amounts
of data.

Problem categories
- Classification
Predict the outcome of an experiment with a finite number of possible results (like yes/no or unacceptable/acceptable/good/very good). We may be interested in a prediction because the true result will only emerge in the future or because it is expensive, difficult, or cumbersome to determine.
- Regression
Regression is, just like classification, a prediction task, but this time the value of interest is numerical.
- Clustering, segmentation
Summarize the data to get a better overview by forming groups of similar cases (called clusters or segments). Instead of examining a large number of similar records, we only need to inspect the group summary. We may also obtain some insight into the structure of the whole data set. Cases that do not belong to any group may be considered abnormal or outliers.
- Association analysis
Find correlations or associations to better understand or describe the interdependencies of all the attributes. The focus is on relationships between all attributes rather than on a single target variable or on the cases (full records).
- Deviation analysis
Already knowing the major trends or structures, find any exceptional subgroup that behaves differently with respect to some target attribute.
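Classification and regression differ only in the type of the target attribute. A minimal sketch using k-nearest neighbours (chosen here for brevity, not necessarily a method covered in these chapters): the same neighbour search predicts a categorical target by majority vote and a numeric target by averaging. The customer records, `k = 3` and the function names are hypothetical; the features are not scaled, which a real analysis would normally do.

```python
import math
from collections import Counter

def nearest_neighbours(train, query, k):
    """Return the k (features, target) records closest to `query` (Euclidean distance)."""
    return sorted(train, key=lambda record: math.dist(record[0], query))[:k]

def knn_classify(train, query, k=3):
    # Classification: the target is categorical, so take a majority vote.
    labels = [label for _, label in nearest_neighbours(train, query, k)]
    return Counter(labels).most_common(1)[0][0]

def knn_regress(train, query, k=3):
    # Regression: the target is numeric, so average the neighbours' values.
    values = [value for _, value in nearest_neighbours(train, query, k)]
    return sum(values) / len(values)

# Hypothetical customer records: (age, visits per year).
features = [(25, 4), (32, 10), (47, 2), (51, 12), (23, 8), (60, 1)]
churned = ["yes", "no", "yes", "no", "no", "yes"]        # categorical target
yearly_spend = [120.0, 480.0, 90.0, 610.0, 300.0, 40.0]  # numeric target

new_customer = (30, 9)
predicted_label = knn_classify(list(zip(features, churned)), new_customer)
predicted_spend = knn_regress(list(zip(features, yearly_spend)), new_customer)
print(predicted_label, predicted_spend)
```

Here the three nearest customers are (32, 10), (25, 4) and (23, 8), so the vote gives "no" (two of three) and the predicted yearly spend is their average, 300.0.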

Catalog of Methods
- Finding patterns
If the domain (and therefore the data) is new to us or if we expect to find interesting relationships, we explore the data for new, previously unknown patterns. We want to get a full picture and do not yet concentrate on a single target attribute. We may apply methods from, for instance, segmentation, clustering, association analysis, or deviation analysis.
- Finding explanations
We have a special interest in some target variable and wonder why and how it varies from case to case. The primary goal is to gain new insights (knowledge) that may influence our decision making, but we do not necessarily intend automation. We may apply methods from, for instance, classification, regression, association analysis, or deviation analysis.
- Finding predictors
We have a special interest in the prediction of some target variable, but it (possibly) represents only one building block of our full problem, so we do not really care about the how and why but are just interested in the best possible prediction. We may apply methods from, for instance, classification or regression.
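When finding patterns, no target attribute is singled out; clustering, for example, lets groups emerge from the data itself. A minimal k-means sketch on one numeric attribute, assuming hypothetical basket values, k = 3 and hand-picked starting centres (all illustrative choices, not from the book):

```python
def assign(values, centres):
    """Assignment step: put each value in the cluster of its nearest centre."""
    clusters = [[] for _ in centres]
    for v in values:
        nearest = min(range(len(centres)), key=lambda i: abs(v - centres[i]))
        clusters[nearest].append(v)
    return clusters

def kmeans_1d(values, initial_centres, max_iterations=100):
    """Minimal k-means on a single numeric attribute; no target attribute is used."""
    centres = list(initial_centres)
    for _ in range(max_iterations):
        clusters = assign(values, centres)
        # Update step: move each centre to the mean of its cluster
        # (an empty cluster keeps its old centre).
        new_centres = [sum(c) / len(c) if c else centres[i]
                       for i, c in enumerate(clusters)]
        if new_centres == centres:  # converged: centres no longer move
            break
        centres = new_centres
    return centres, assign(values, centres)

# Hypothetical basket values (in euros) of eight customers.
basket_values = [12, 15, 14, 40, 42, 80, 85, 78]
# ASSUMPTION: k = 3 and the starting centres are picked by hand.
centres, clusters = kmeans_1d(basket_values, [10, 50, 100])
for centre, members in zip(centres, clusters):
    print(f"centre {centre:.1f}: {members}")
```

With these numbers the three groups are the low, medium and high basket values; a different k or different starting centres can give a different grouping.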

Chapter 3: Project understanding

Project understanding: In this initial phase of the data analysis project, we have to map a problem onto one or many data analysis tasks. The project understanding phase should be carried out with care to keep the project on the right track.

Problem sources, seen from the project owner's and the analyst's perspective:
- Communication
Project owner: does not understand the technical terms of the analyst.
Analyst: does not understand the terms of the project owner's domain.
- Lack of understanding
Project owner: was not sure what the analyst could do or achieve; the analyst's models were different from what the project owner envisioned.
Analyst: found it hard to understand how to help the project owner.
- Organization
Project owner: requirements had to be adapted in later stages as problems with the data became evident.
Analyst: the project owner was an unpredictable group (not so concerned with the project).

Data mining stakeholders
- Business User: business understanding
Has a sound understanding of the business domain targeted by the data mining project. This person can offer insight into the project context and the business value to be extracted via data mining, and can advise on how results can be operationalized. A Business Analyst and/or a Line Manager might be suitable for this role.
- Project Sponsor: project driver
In most cases the initiator or driver of the data mining project. Concerned with the potential Return On Investment (ROI); sets priorities and desired outputs. This person champions the project, motivating key personnel to engage with the business problem.
- Project Manager: end-to-end project delivery
This person is in charge of implementing the data mining project and is concerned with meeting the quality, time, and budget targets.
- Business Intelligence Analyst: data understanding
This person acts as the bridge between the data and the business view of the targeted problem. Maintaining a sound understanding of the relevant data, the Business Intelligence Analyst drives activities related to Key Performance Indicators (KPIs) and extracts relevant data for reporting and dashboarding purposes. Understands the sources and ‘consumers’ of data, as well as the need for changes in data management processes.
- Data Administrator & Integrator: data preparation & solution delivery
Provides action support for implementing key data access and processing activities needed by stakeholders of the data mining project. A technical person with sound data management competences, including awareness of security and/or privacy concerns, would be appropriate.