Extensive summary: Data Mining and its Applications


Extensive summary of the course Data Mining and Its Applications. Also very useful for the assignment that accompanies the course!


29 April 2023, 67 pages, academic year 2021/2022
Author: zoehenzen
Data mining

Lecture 1

Why data mining?

Data becomes meaningful when it is considered within the context of a particular problem domain. The examples discussed here come from two specific domains: railway infrastructure networks and the production industry. The first data set comes from railway infrastructure networks covering thousands of kilometres of track. What could be of interest there? As a user of transport networks, you want to travel safely, without delays or inconveniences. As an operator of trains, you want your trains to use the network safely and efficiently, without causing damage to the trains or harm to passengers. As an operator of the network, you want to deliver your infrastructure to its users and the wider public in such a way that they simply do not have to pay much attention to it, because you, as the infrastructure provider, pay sufficient attention to keeping the network up to standard and thus deliver a safe, high-quality travel experience. The data shown come from railway track inspections. The problem here is how to get from the data to a set of prioritized actions: which track to block because it is not safe, which track needs inspection and which sections to schedule for next month.
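A minimal Python sketch of that last step, turning inspection records into prioritized actions. The record fields, the severity score in [0, 1] and both thresholds are hypothetical assumptions for illustration; in practice the thresholds would follow from the operator's safety and maintenance standards, not from the data alone.

```python
from dataclasses import dataclass


@dataclass
class TrackInspection:
    # Hypothetical record: one inspected track section
    section_id: str
    defect_severity: float  # assumed score in [0, 1], higher = worse


# Hypothetical thresholds, chosen only for illustration
BLOCK_THRESHOLD = 0.8
INSPECT_THRESHOLD = 0.5


def recommend_action(record: TrackInspection) -> str:
    """Map a single inspection record to an action category."""
    if record.defect_severity >= BLOCK_THRESHOLD:
        return "block"
    if record.defect_severity >= INSPECT_THRESHOLD:
        return "inspect"
    return "schedule_next_month"


def prioritise(records: list[TrackInspection]) -> list[tuple[str, str]]:
    """Return (section_id, action) pairs, most severe sections first."""
    ranked = sorted(records, key=lambda r: r.defect_severity, reverse=True)
    return [(r.section_id, recommend_action(r)) for r in ranked]


if __name__ == "__main__":
    inspections = [
        TrackInspection("A12", 0.30),
        TrackInspection("B07", 0.92),
        TrackInspection("C03", 0.55),
    ]
    for section, action in prioritise(inspections):
        print(section, action)
```

Sorting by severity before mapping to actions keeps the most urgent sections at the top of the list, which is what a maintenance planner would act on first.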

In production, you can use data to try to predict disruptions. That is not easy, because there are many things that can go wrong, and converting data into action recommendations is not easy at all. Data mining is about finding a needle in a haystack.

Other examples of data mining:

- Risk assessment: is a person going to repay a loan?
- Demand prediction: how many taxis do I need in NYC today at noon, how many kW will be required tomorrow at 6 am in London, how many customers will come to my restaurant tonight?
- Fraud detection: is this transaction legitimate or fraudulent?
- Anomaly detection: in industrial applications it is quite common to monitor production or production assets in order to identify events of interest and trigger predictive maintenance actions
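For the anomaly-detection case, one simple approach (by no means the only one) is to flag readings that lie far from the mean in units of standard deviation, the z-score. This sketch assumes a single numeric sensor series; the readings and the threshold are made up for illustration.

```python
import statistics


def zscore_anomalies(values: list[float], threshold: float = 3.0) -> list[int]:
    """Return indices of values whose absolute z-score exceeds `threshold`.

    Uses the population standard deviation of the whole series.
    """
    mean = statistics.mean(values)
    std = statistics.pstdev(values)
    if std == 0:
        return []  # all values identical: nothing stands out
    return [i for i, v in enumerate(values) if abs(v - mean) / std > threshold]


if __name__ == "__main__":
    # Hypothetical temperature readings from one production asset
    readings = [20.1, 20.3, 19.9, 20.0, 20.2, 35.0, 20.1]
    print(zscore_anomalies(readings, threshold=2.0))
```

With the population standard deviation, a single value among n readings can reach at most a z-score of sqrt(n - 1). For 7 readings that is about 2.45, so a threshold of 3 could never fire here; that is why the example uses 2.0.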

Data is a highly valuable asset in itself and can be exploited to drive meaningful decisions. But data can also be a very misleading asset if you ignore the actual problem context. For example, a bank discovered a cluster of customers who had left the bank. Compared with the average customer, these customers were:

- Older
- Less likely to have a mortgage
- Less likely to have a credit card

Without the problem context, it would be wrong to conclude that these characteristics explain why the customers left.

The data mining process starts with data from the original sources, which is then passed through a filter to obtain filtered data. The narrow view of data mining focuses only on the pattern-identification step of this process.
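A toy Python sketch of the two stages just described: a filter step that drops incomplete records, followed by the pattern-identification step that the narrow view focuses on (here simply counting frequent events). The record layout and the `min_count` parameter are hypothetical.

```python
from collections import Counter
from typing import Optional


def filter_records(
    raw: list[tuple[Optional[str], Optional[str]]]
) -> list[tuple[str, str]]:
    """Filter step: keep only complete (machine_id, event) records."""
    return [(machine, event) for machine, event in raw if machine and event]


def frequent_events(records: list[tuple[str, str]], min_count: int = 2) -> dict[str, int]:
    """Pattern step: events that occur at least `min_count` times."""
    counts = Counter(event for _, event in records)
    return {event: n for event, n in counts.items() if n >= min_count}


if __name__ == "__main__":
    # Hypothetical raw records from the original sources; some fields are missing
    raw = [
        ("M1", "overheat"), ("M2", None), ("M1", "overheat"),
        (None, "stop"), ("M3", "stop"), ("M2", "overheat"),
    ]
    filtered = filter_records(raw)
    print(len(filtered), frequent_events(filtered))
```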

Another view shows how the value of data is enhanced at the different stages of a data workflow. At the lowest level you see individual data records. Knowing that the data was taken at a specific location in a production environment and that it is linked to a specific production throughput is information of higher value. If we identify patterns, for example that throughput dropped below average or that production quality deteriorates faster than expected, that in itself can drive actions, such as scheduling a non-routine inspection within a week. But how should identified patterns be interpreted? If the drop in quality can be associated with engaging a different supplier for certain parts (to meet increased demand and avoid disruptions), we are looking at the same data but attributing specific context to it. This brings action recommendations closer and makes them more relevant to the identified context. This is how value is added to data.
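The step from pattern to action can be sketched in Python as follows. Throughput readings are keyed by location (the context), a location is flagged when its recent mean falls clearly below its own historical mean, and each flag becomes an inspection recommendation. The data layout, the `recent` window and the 5% tolerance are assumptions for illustration only.

```python
import statistics


def flag_below_average(
    throughput_by_location: dict[str, list[float]],
    recent: int = 3,
    tolerance: float = 0.05,
) -> list[str]:
    """Flag locations whose mean over the last `recent` readings is more than
    `tolerance` (relative) below the mean of their earlier readings."""
    flagged = []
    for location, readings in throughput_by_location.items():
        if len(readings) <= recent:
            continue  # not enough history to compare against
        history_mean = statistics.mean(readings[:-recent])
        recent_mean = statistics.mean(readings[-recent:])
        if recent_mean < (1 - tolerance) * history_mean:
            flagged.append(location)
    return flagged


def recommend_actions(flagged: list[str]) -> list[str]:
    """Turn each flagged location into a (hypothetical) action recommendation."""
    return [f"{loc}: schedule non-routine inspection within a week" for loc in flagged]


if __name__ == "__main__":
    data = {
        "line_1": [100, 102, 98, 101, 85, 84, 83],
        "line_2": [90, 91, 89, 92, 93, 90, 91],
    }
    for action in recommend_actions(flag_below_average(data)):
        print(action)
```

The tolerance keeps small, noisy fluctuations from triggering inspections; attaching further context (such as a supplier change) would be the next step towards a more targeted recommendation.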

Data states

Data can be stored (at rest), on the move (in transit) or in use. When people interact with applications, the data is in use.

In summary, data mining brings together many data-related activities: data exploration, data analysis, accessing the data, identifying patterns or knowledge, and evaluating the models. In most cases these activities have to deal with the analysis of large, heterogeneous data sets, that is, data in different formats from different sources.
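To make "heterogeneous data from different sources" concrete, here is a small Python sketch that integrates a CSV source and a JSON source describing the same machines. The sources, field names and merge-by-key strategy are hypothetical; note that the two sources even name the key differently (`machine_id` vs `machine`), a typical integration issue.

```python
import csv
import io
import json

# Hypothetical sources: complementary information about the same machines
CSV_SOURCE = """machine_id,throughput
M1,120
M2,95
"""

JSON_SOURCE = '[{"machine": "M1", "quality": 0.98}, {"machine": "M2", "quality": 0.91}]'


def load_csv(text: str) -> dict[str, dict]:
    """Parse the CSV source into {machine_id: attributes}."""
    reader = csv.DictReader(io.StringIO(text))
    return {row["machine_id"]: {"throughput": float(row["throughput"])} for row in reader}


def load_json(text: str) -> dict[str, dict]:
    """Parse the JSON source into {machine_id: attributes}."""
    return {item["machine"]: {"quality": float(item["quality"])} for item in json.loads(text)}


def integrate(*sources: dict[str, dict]) -> dict[str, dict]:
    """Merge per-machine attributes from several sources into one record per machine."""
    merged: dict[str, dict] = {}
    for source in sources:
        for key, attrs in source.items():
            merged.setdefault(key, {}).update(attrs)
    return merged


if __name__ == "__main__":
    print(integrate(load_csv(CSV_SOURCE), load_json(JSON_SOURCE)))
```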

The standard data mining process model always starts from the business perspective. What the primary objective of data mining is, and what the criteria for success are, can only be answered in an application-domain-specific way; there is no generic answer. Data mining involves workflows of different subprocesses and different stakeholders, so it is necessary to obtain the stakeholders' view and to engage the relevant stakeholders in the process.

The criteria for success are difficult to define, partly because the stakeholders involved in the data mining process speak different languages.

Typical problems, seen from the project owner's and the analyst's perspective:

- Communication
Project owner: does not understand the data science concepts and jargon.
Analyst: does not understand the domain-specific concerns and concepts of the project owner.
- Lack of understanding
Project owner: does not know what the analyst can do or achieve; the analyst's data models differ from those envisioned by the project owner.
Analyst: finds it hard to understand how to help the project owner.
- Organization
Project owner: requirements change or are adapted in later stages as problems with the data become evident.
Analyst: the project owner was not really concerned with the data project and was hard to work with regarding the real requirements.


Data mining stakeholders

- Business user: business understanding
Has a sound understanding of the business domain targeted by the data mining project. This person can offer insight into the project context and the business value to be extracted through data mining, and can advise on how the results can be operationalized. A business analyst and/or a line manager might be suitable for this role
- Project sponsor: project driver
In most cases the initiator or driver for the data mining project. Concerned with the potential
return on investment (ROI) and sets priorities and desired outputs. This person is championing
the project, motivating engagement of key personnel around the business problem
- Project manager: end-to-end project delivery
This person is in charge of implementing the data mining project, is concerned with driving and delivering it, and is responsible for meeting the quality, time and budget targets
- Business intelligence analyst: data understanding
This person acts as the bridge between the data and the business view of the targeted problem. Maintaining a sound understanding of the relevant data, the business intelligence analyst drives activities related to key performance indicators (KPIs) and extracts relevant data for reporting and dashboarding purposes. This person understands the sources and consumers of the data, as well as the need for changes in data management processes
- Data administrator and integrator: data preparation and solution delivery
Provides active support for implementing the key data access and processing activities needed by the stakeholders of the data mining project. A technical person with sound data management competences, including awareness of security and/or privacy concerns, would be appropriate
- Data scientist or engineer: data modelling and evaluation
This person combines data management skills with a sound understanding of data analysis methods and tools, and drives the ingestion of data into the overall data analytics process. The data scientist is able to communicate the analytics methods to the other stakeholders
The data engineer and the data administrator/integrator work closely together on the technical side of data mining and share relevant code and documentation.

Data mining project workflow

1. Phase 1: inception and discovery
- The project team establishes the project baseline. This includes a shared understanding of the business context, history and current practice, as well as the overall framework in terms of resources, technology enablers, data and available time. An initial solution hypothesis is put forward and posed as a challenge for data analytics
2. Phase 2: data preparation
- In this phase the data is brought into an actionable form. This may involve data extraction, transformation and delivery into a data sandbox. The team then familiarises itself with the data and its underlying semantic/physical meaning
3. Phase 3: model planning
- Here the team determines the methods, techniques and process flow for moving from actionable data to processed data using appropriate models. This may include studying data relationships and selecting data/variables
4. Phase 4: model building
- The prepared data is now brought into a form ready for model building, testing and validation. The models and methods defined in the model planning phase are implemented and executed, and the right hardware and software are made available to this end
5. Phase 5: communicate results
- Results are communicated to the key stakeholders involved and are assessed for success, further work or failure. Key findings are summarised, and the business value is assessed and communicated to the stakeholders
6. Phase 6: operationalize
- This is the final phase of the data mining process and often concludes with running a pilot implementation, but only if the case requires it

Clarifying the objectives

Is the goal precise enough? Actionable?

- Objective: increase revenues (per campaign and/or per customer) in direct mailing campaigns through personalized offers and individual customer selection
- Deliverable: software that automatically selects a specified number of customers from the database to whom the mailing shall be sent, with a maximum runtime of half a day for a database of the current size
- Success criteria: improve the order rate by 5% or total revenues by 5%, measured within 4 weeks after the mailing was sent and compared to the rate of the last 3 mailings
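The deliverable and the success criterion can be made concrete in a short Python sketch. It assumes a hypothetical response score per customer (produced by some model) and reads "improve the order rate by 5%" as a relative improvement over the average of the last 3 mailings. The objective does not say whether a relative increase or 5 percentage points is meant, so that interpretation is an assumption.

```python
def select_customers(scores: dict[str, float], n: int) -> list[str]:
    """Deliverable sketch: the n customers with the highest (hypothetical) response score."""
    return sorted(scores, key=lambda c: scores[c], reverse=True)[:n]


def meets_success_criterion(
    new_order_rate: float,
    previous_rates: list[float],
    min_relative_improvement: float = 0.05,
) -> bool:
    """Assumed reading of the criterion: the new order rate must be at least 5%
    higher (relative) than the average rate of the previous mailings."""
    baseline = sum(previous_rates) / len(previous_rates)
    return new_order_rate >= baseline * (1 + min_relative_improvement)


if __name__ == "__main__":
    scores = {"c1": 0.10, "c2": 0.45, "c3": 0.30, "c4": 0.05}  # hypothetical scores
    print(select_customers(scores, 2))
    last_three = [0.020, 0.022, 0.021]  # hypothetical order rates of the last 3 mailings
    print(meets_success_criterion(0.0225, last_three))
    print(meets_success_criterion(0.0215, last_three))
```

With a baseline of 0.021, the target is 0.021 × 1.05 = 0.02205, so an order rate of 2.25% passes and 2.15% does not. The same check could be applied to total revenues, the alternative criterion.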

Once a solution is identified, explore its advantages and disadvantages.
