Data can be meaningful when considered within the context of a particular problem domain.
In these cases the data come from two specific domains: railway infrastructure networks and the
production industry. The railway data set covers thousands of kilometres of railway track. What could
be of interest there?
As a user of transport networks, you want to travel safely, without delays and inconveniences. As an
operator of railway trains, you want your trains to use the network safely, efficiently and without
causing damage to your trains or harm to your passengers. As an operator of the network, you want to
deliver your infrastructure to the users and the wider public in such a way that they simply do not
have to pay much attention to it, because you as the infrastructure provider pay sufficient attention
to keeping the network up to standard, so that you deliver a safe and high-quality travel experience.
The data you see are from railway track inspections. The problem here is how to get from the data to a
set of prioritized actions: which railway to block because it is not safe, which railway needs
inspection and which to schedule for next month.
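As a rough illustration of going from inspection data to prioritized actions, the sketch below maps a
severity score per track segment to one of the three actions named above. The severity scale, the two
thresholds and the segment names are assumptions made up for this example; real limits would come from
the safety standards of the infrastructure provider.

```python
from dataclasses import dataclass

# Hypothetical thresholds on an assumed 0-1 severity scale.
BLOCK_THRESHOLD = 0.9
INSPECT_THRESHOLD = 0.6


@dataclass
class TrackSegment:
    segment_id: str
    severity: float  # assumed: 0.0 (no defect) to 1.0 (critical defect)


def recommend_action(segment: TrackSegment) -> str:
    """Map a severity score to one of the three actions named in the text."""
    if segment.severity >= BLOCK_THRESHOLD:
        return "block"
    if segment.severity >= INSPECT_THRESHOLD:
        return "inspect"
    return "schedule next month"


def prioritize(segments):
    """Return (segment_id, action) pairs, most severe segment first."""
    ordered = sorted(segments, key=lambda s: s.severity, reverse=True)
    return [(s.segment_id, recommend_action(s)) for s in ordered]


# Made-up inspection results for three segments.
segments = [
    TrackSegment("A-12", 0.35),
    TrackSegment("B-07", 0.95),
    TrackSegment("C-03", 0.70),
]
for segment_id, action in prioritize(segments):
    print(segment_id, action)
```

In practice the hard part is obtaining a trustworthy severity score from the raw inspection data; the
rule-based mapping shown here is only the last, simplest step.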
In production you can use data to try to predict disruptions. That is not easy because there are many
things that can go wrong. Converting data to action recommendations is not easy at all. Data mining is
about finding a needle in a haystack.
Other examples of data mining:
- Risk assessment: is a person going to repay a loan?
- Demand prediction: how many taxis do I need in NYC today at noon, how many kW will be
required tomorrow at 6 am in London, how many customers will come to my restaurant tonight?
- Fraud detection: is this transaction legitimate or fraudulent?
- Anomaly detection: in industrial applications it is quite common to monitor production or
production assets in order to identify events of interest and trigger predictive maintenance actions
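To make the anomaly detection example a bit more concrete, here is a minimal sketch that flags readings
lying far from the mean, measured in standard deviations (a z-score). The z-score method, the threshold
of 2.0 and the throughput values are assumptions chosen for illustration; the notes do not prescribe a
specific technique.

```python
from statistics import mean, stdev


def find_anomalies(readings, threshold=2.0):
    """Return (index, value) pairs whose z-score magnitude exceeds threshold."""
    mu = mean(readings)
    sigma = stdev(readings)  # sample standard deviation
    if sigma == 0:
        return []  # all readings identical, nothing stands out
    return [
        (i, x) for i, x in enumerate(readings)
        if abs(x - mu) / sigma > threshold
    ]


# Made-up hourly throughput of a production line; the value 61 is planted.
throughput = [100, 102, 98, 101, 99, 100, 61, 103, 97, 100]
print(find_anomalies(throughput))
```

On short series a single extreme value also inflates the standard deviation, so the threshold has to be
chosen with the series length in mind.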
Data is a highly valuable asset in itself and can be exploited to drive meaningful decisions. But data
can also be a very misleading asset if you ignore the actual problem context. For example, a bank
discovered a cluster of customers who had left the bank:
- Older than the average customer
- Less likely to have a mortgage
- Less likely to have a credit card
It would be wrong to associate this cluster with this case
The data mining process starts with data from the original sources, which are then passed through a
filter to obtain filtered data. The narrow view of data mining focuses only on the pattern
identification step of this process.
Another view shows how data value is enhanced at different stages of a data workflow. At the lowest
level you see individual data records. Identifying that the data were taken at a specific location in a
production environment and that they are linked with a specific production throughput is information
of higher value. Suppose we identify patterns such as throughput dropping below average or production
quality deteriorating faster than expected. That in itself can drive actions, for example scheduling a
non-routine inspection within a week. But how should the identified patterns be interpreted? If this
drop in quality can be associated with engaging a different supplier for certain parts to meet
increased demand and avoid disruptions, we are looking at the same data but attribute a specific
context to them, which brings action recommendations closer and makes them more relevant to the
identified context. This is how value is added to data.
Data states
Data can be stored, on the move or in use. When people interact with applications, this is data in use.
In summary, data mining brings together many data-related activities: data exploration, data analysis,
accessing the data, identifying patterns or knowledge, and evaluating the models. In most cases these
activities have to deal with the analysis of large, heterogeneous data sets, that is, data in different
formats from different sources.
The data mining standard process model states that the start is always from the business perspective.
What the primary objective of data mining is and what the criteria for success are can only be answered
in application-domain-specific terms; there is no generic answer. Data mining involves workflows of
different subprocesses and different stakeholders. This makes it necessary to obtain the stakeholders'
view of the data mining project and to engage the relevant stakeholders in the process.
The criteria for success are difficult to define. Stakeholders involved in the data mining process speak
different languages.
Problem source: Communication
- Project owner's perspective: the project owner does not understand the data science concepts and
jargon
- Analyst's perspective: the data analyst does not understand the domain-specific concerns and concepts
of the project owner
Problem source: Lack of understanding
- Project owner's perspective: does not know what the analyst can do or achieve. The data models of the
analyst differ from those envisioned by the project owner
- Analyst's perspective: hard to understand how to help the project owner
Problem source: Organization
- Project owner's perspective: requirements changed or were adapted in later stages as problems with
the data became evident
- Analyst's perspective: the project owner was not really concerned with the data project and was hard
to work with regarding the real requirements
Data mining stakeholders
- Business user: business understanding
Has a sound understanding of the business domain targeted by the data mining project. The
person can offer insight into the project context, the business value sought to be extracted via
data mining and advise on how results can be operationalized. A business analyst and/or a line
manager might be suitable for such a role
- Project sponsor: project driver
In most cases the initiator or driver for the data mining project. Concerned with the potential
return on investment (ROI) and sets priorities and desired outputs. This person is championing
the project, motivating engagement of key personnel around the business problem
- Project manager: end-to-end project delivery, concerned not with driving but with delivering the
project
This person is in charge of the data mining project implementation and is concerned with
meeting the goals for quality, time and budget
- Business intelligence analyst: data understanding
This person acts as the bridge between the data and the business view of the targeted problem.
Maintaining a sound understanding of relevant data, the business intelligence analyst drives
activities related to key performance indicators (KPIs) and extracts relevant data for reporting
and dashboarding purposes. Understands sources and consumers of data, as well as the need for
changes in data management processes
- Data administrator and integrator: data preparation and solution delivery
Provides action support for implementing key data access and processing activities, needed by
stakeholders of the data mining project. A technical person with sound data management
competences, including awareness of security and/or privacy concerns, would be appropriate
- Data scientist or engineer: data modelling and evaluation
This person combines data management skills with a sound understanding of data analysis
methods and tools and is driving the ingestion of data into the overall data analytics process.
The data scientist is able to communicate the analytics methods to the other stakeholders
The data engineer and the administrator + integrator are working closely on the technical side of data
mining and share relevant code and documentation.
Data mining project workflow
1. Phase 1: inception and discovery
- The project team establishes the project baseline. This includes a shared understanding of
business context, history and current practice, as well as the overall framework in terms of
resources, technology enablers, data and available time. An initial solution hypothesis is put
forward and is posed as a challenge for data analytics
2. Phase 2: data preparation
- Data is brought into actionable form in this phase. It may involve data extraction, transformation
and delivery into a data sandbox. The team then familiarises itself with the data and the underlying
semantic / physical meaning of the data
3. Phase 3: model planning
- The methods, techniques and process flow for moving from actionable data to processed data,
through appropriate methods for models, are determined here. The process may include the study of
data relationships and the selection of data/variables
4. Phase 4: model building
- The prepared data are now brought into form ready for model building, testing and validation.
Models and methods defined in the model planning phase are now implemented and
executed. The right hardware and software are made available to this end
5. Phase 5: communicate results
- Results are communicated to the key stakeholders involved and are assessed for success, further
work or failure. Key findings are summarised and the business value is assessed and communicated
to stakeholders
6. Phase 6: operationalize
- This is the final phase of the data mining process and is often concluded by running a pilot
project implementation, but only if the case requires it
Clarifying the objectives
Is the goal precise enough? Actionable?
Objective: Increase revenues (per campaign and/or per customer) in direct mailing campaigns by
personalized offers and individual customer selection
Deliverable: Software that automatically selects a specified number of customers from the database to
whom the mailing shall be sent; maximum runtime half a day for a database of the current size
Success criteria: Improve order rate by 5% or total revenues by 5%, measured within 4 weeks after the
mailing was sent, compared to the rate of the last 3 mailings
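A success criterion like the one above is precise enough to be checked mechanically. The sketch below
shows one possible reading of it, with assumptions that the notes do not settle: the 5% is taken as a
relative improvement, the baseline is the average over the last 3 mailings, and all figures are made
up and assumed to be measured within the 4-week window.

```python
def relative_change(new, baseline):
    """Relative change of new versus baseline, e.g. 0.05 means +5%."""
    return (new - baseline) / baseline


def campaign_succeeded(order_rate, past_order_rates,
                       revenue, past_revenues, target=0.05):
    """True if order rate OR total revenue beats the baseline by the target."""
    base_rate = sum(past_order_rates) / len(past_order_rates)
    base_revenue = sum(past_revenues) / len(past_revenues)
    return (relative_change(order_rate, base_rate) >= target
            or relative_change(revenue, base_revenue) >= target)


# Made-up figures for the last 3 mailings and the new campaign.
past_rates = [0.020, 0.022, 0.021]
past_revenues = [50_000, 52_000, 51_000]
print(campaign_succeeded(0.0225, past_rates, 51_500, past_revenues))
```

Writing the criterion down this way also exposes open questions to settle with the project owner, such
as whether 5% means relative improvement or percentage points.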
Once the solution is identified, explore advantages and disadvantages.