Methodology for DSS summary


Summary of Methodology for DSS. Materials of the course are summarised in this document.

Summary (57 pages), academic year 2021/2022, dated 10 July 2022
By liekebuuron
Methodology for premasters DSS
Lecture 2

Data science is the art of turning data into actions. The terminology has existed since the 1960s. The term data science, in the sense used in this course, was introduced in the 1990s in the statistics and data mining communities, and it was first named as an independent discipline in 2001.

Data can be divided into three types:

Structured data: data stored in a traditional relational database management system (RDBMS).
Unstructured data: data that does not fit into a database table, such as free text.
Semi-structured data: data that is not fully structured but contains tags or markers that create order and hierarchy.
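To make the three types concrete, here is a small Python sketch. The table, the JSON record and the sentence are made-up examples (assumptions, not course material): an in-memory SQLite table stands in for a relational DBMS, a JSON document stands in for semi-structured data, and a plain string stands in for unstructured text.

```python
import json
import sqlite3

# Structured: rows with a fixed schema in a relational database (in-memory SQLite here).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, city TEXT)")
conn.executemany(
    "INSERT INTO customers (id, name, city) VALUES (?, ?, ?)",
    [(1, "Ann", "Tilburg"), (2, "Bob", "Utrecht")],
)
structured_rows = conn.execute("SELECT name, city FROM customers ORDER BY id").fetchall()
conn.close()

# Semi-structured: no fixed table schema, but keys and nesting give order and hierarchy.
semi_structured = json.loads(
    '{"id": 3, "name": "Cem", "orders": [{"item": "book", "qty": 2}]}'
)

# Unstructured: free text; any structure must first be extracted by processing it.
unstructured = "Cem called to say the second book arrived damaged."

print(structured_rows)                       # [('Ann', 'Tilburg'), ('Bob', 'Utrecht')]
print(semi_structured["orders"][0]["item"])  # book
```

The structured rows can be queried directly with SQL, the JSON record has to be navigated by key, and the sentence offers no fields at all until it is processed.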




Impact of data science in business: data science is one of the driving forces in maintaining
competitiveness in the business world. Organizations can achieve:
- a 17-49% increase in productivity when they increase data usability by 10%
- an 11-42% return on assets when they increase data access by 10%
- a 5-6% performance improvement through data-driven decisions

The key activities of data science in an organization can be classified into four groups:
- Acquire: obtaining the needed data.
- Prepare: preprocessing operations on the data.
- Analyze: the data is analyzed and the results are interpreted.
- Act: actions are taken in accordance with the results obtained in the previous step.
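The four activities can be sketched as a chain of Python functions, each feeding the next. This is a toy illustration under assumed data: the sales records, the function names and the sales target of 100 are all hypothetical.

```python
from statistics import mean

def acquire():
    """Acquire: obtain the needed data (hard-coded, hypothetical sales records)."""
    return [
        {"store": "A", "sales": "120"},
        {"store": "A", "sales": None},  # missing value
        {"store": "B", "sales": "80"},
        {"store": "B", "sales": "95"},
    ]

def prepare(records):
    """Prepare: drop records with missing sales and convert sales to numbers."""
    return [
        {"store": r["store"], "sales": float(r["sales"])}
        for r in records
        if r["sales"] is not None
    ]

def analyze(records):
    """Analyze: compute the average sales per store."""
    stores = sorted({r["store"] for r in records})
    return {s: mean(r["sales"] for r in records if r["store"] == s) for s in stores}

def act(averages, target=100.0):
    """Act: flag stores whose average sales fall below a (hypothetical) target."""
    return [store for store, avg in averages.items() if avg < target]

averages = analyze(prepare(acquire()))
stores_to_review = act(averages)
print(averages)          # {'A': 120.0, 'B': 87.5}
print(stores_to_review)  # ['B']
```

Keeping each activity in its own function makes it easy to rerun a single step, for example repeating Prepare after new data quality issues are found.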

In general, the acquire step takes less effort than the prepare step, the analyze step takes the most effort, and the act step takes the least.

Data mining aims to reveal patterns in data using machine learning, statistics and database systems.
A more extensive definition: data mining is the process of extracting previously unknown and potentially useful information from data using mathematical, statistical and machine learning methods.
Data science has a broader goal, such as making data-driven business decisions.
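As one illustration of revealing patterns in data, the sketch below counts how often pairs of items are bought together and keeps the pairs that reach a minimum support. This support counting is the first step of association rule mining (as in the Apriori algorithm). It is chosen here only as an example and is not a technique named in the course; the baskets and the 0.4 threshold are hypothetical.

```python
from collections import Counter
from itertools import combinations

# Hypothetical shopping baskets; each basket is one transaction.
transactions = [
    {"bread", "milk"},
    {"bread", "butter", "milk"},
    {"beer", "chips"},
    {"bread", "butter"},
    {"bread", "milk", "chips"},
]

def frequent_pairs(baskets, min_support=0.4):
    """Return item pairs that occur in at least `min_support` of all baskets."""
    counts = Counter()
    for basket in baskets:
        # sorted() gives each pair a canonical order, so a pair is never counted twice.
        counts.update(combinations(sorted(basket), 2))
    n = len(baskets)
    return {pair: c / n for pair, c in counts.items() if c / n >= min_support}

print(frequent_pairs(transactions))
```

Here bread and milk appear together in 3 of the 5 baskets (support 0.6), which is the kind of previously unknown, potentially useful pattern the definition refers to.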

Cross Industry Standard Process for Data Mining (CRISP-DM): considering the variety of industries and the variety of data that is generated, there is a need for a structured approach to executing a data mining project. CRISP-DM can be thought of as such a guideline. It was the combined effort of different institutions in an EU project in the late 1990s, and it breaks the whole data mining process into six phases.

Business understanding:
- Understanding the business goal
- Situation assessment
- Translating the business goal to a data mining objective
- Development of a project plan
Data understanding:
- Considering data requirements
- Initial data collection, exploration and quality assessment

Data preparation:
- Selection of required data
- Data cleaning → this is very important for the modeling phase
- Data transformation and enrichment

Modeling:
- Selection of the appropriate modeling technique
- Training and test set creation for evaluation
- Development and examination of alternative modeling algorithms
- Fine-tuning of the model parameters

Evaluation:
- Evaluation of the model in the context of the business success criteria
- Model approval

Deployment:
- Reporting of the findings
- Planning and development of deployment procedure
- Deployment of the model
- Development of a maintenance or update plan
- Review of the project and planning of the next steps
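The Modeling task "training and test set creation for evaluation" can be sketched in plain Python: shuffle the data, hold out part of it, fit on the rest and measure the error on the held-out part. The dataset, the 25% test fraction and the mean-predictor baseline are assumptions for illustration. In CRISP-DM a technical score like this feeds into, but does not replace, the evaluation against the business success criteria.

```python
import random

def train_test_split(rows, test_fraction=0.25, seed=42):
    """Shuffle a copy of the rows and split them into a training and a test set."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)  # fixed seed keeps the split reproducible
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]

def fit_mean_model(train):
    """A deliberately simple baseline 'model': always predict the mean training target."""
    return sum(y for _, y in train) / len(train)

def mean_absolute_error(prediction, test):
    """Average absolute difference between the prediction and the true test targets."""
    return sum(abs(y - prediction) for _, y in test) / len(test)

# Hypothetical (feature, target) pairs.
data = [(x, 2 * x + 1) for x in range(20)]
train, test = train_test_split(data)
model = fit_mean_model(train)
print(len(train), len(test))  # 15 5
print(round(mean_absolute_error(model, test), 2))
```

Any alternative algorithm developed during Modeling can be fitted on the same `train` set and compared on the same `test` set, so the baseline gives a reference score to beat.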

Approaches that followed CRISP-DM: CRISP-DM was followed by approaches from different companies:
- IBM introduced ASUM-DM
- SAS introduced SEMMA
- Microsoft introduced TDSP (Team Data Science Process)
They can be considered updated versions of CRISP-DM that respond to new demands.

Team Data Science Process (TDSP): Microsoft's newer version of CRISP-DM.
Key components:
- A data science lifecycle definition
- A standardized project structure
- Infrastructure and resources recommended for data science projects
- Tools and utilities recommended for project execution
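A standardized project structure means that every project starts from the same skeleton. A minimal sketch, assuming a hypothetical folder layout (the names below are illustrative and are not taken from Microsoft's actual TDSP template):

```python
from pathlib import Path
import tempfile

# Hypothetical folder layout; the real TDSP template may use different names.
PROJECT_LAYOUT = [
    "code/data_acquisition",
    "code/modeling",
    "code/deployment",
    "data/raw",
    "data/processed",
    "docs/project",
    "docs/data_reports",
    "docs/model_reports",
]

def create_project(root: Path, layout=PROJECT_LAYOUT) -> list[Path]:
    """Create the same directory skeleton for every new project under `root`."""
    created = []
    for rel in layout:
        path = root / rel
        path.mkdir(parents=True, exist_ok=True)  # also creates missing parent folders
        created.append(path)
    return created

with tempfile.TemporaryDirectory() as tmp:
    dirs = create_project(Path(tmp) / "churn_project")
    print(all(d.is_dir() for d in dirs))  # True
```

Because every project shares the layout, team members know where to find the code, data and reports of any project without asking.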




TDSP has many similarities with CRISP-DM but goes into more detail.

TDSP provides recommendations for managing shared analytics and storage infrastructure, such as:
- Cloud file systems for storing datasets
- Databases
- Big Data (SQL or Spark) clusters
- Machine learning services
TDSP also provides recommendations for R and Python.




Capture data from the real world, process it and clean it; then exploratory data analysis is done. If more data is needed, go back to step one; otherwise, the modeling phase starts.
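The iterative loop described above (collect, process, clean, explore, and go back to collection if more data is needed) can be sketched as follows. All functions are hypothetical stand-ins; in particular, `needs_more_data` replaces real exploratory analysis with a simple row-count check.

```python
def collect(batch_no):
    """Capture data from the 'real world' (here: hypothetical raw text readings)."""
    return [f" {v} " for v in range(batch_no * 5, batch_no * 5 + 5)] + ["n/a"]

def process_and_clean(raw):
    """Strip whitespace and drop entries that cannot be parsed as whole numbers."""
    cleaned = []
    for value in raw:
        value = value.strip()
        if value.isdigit():
            cleaned.append(int(value))
    return cleaned

def needs_more_data(dataset, min_rows=12):
    """Stand-in for exploratory data analysis: decide whether the data suffices."""
    return len(dataset) < min_rows

dataset, batch_no = [], 0
while True:
    dataset += process_and_clean(collect(batch_no))
    batch_no += 1
    if not needs_more_data(dataset):
        break  # enough data: move on to the modeling phase
print(batch_no, len(dataset))  # 3 15
```

Each pass adds five usable readings and discards the unparseable "n/a", so three collection rounds are needed before the check passes and modeling can start.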




One of the most important steps is asking what data needs to be collected or reported. The data can cover different topics and come from different sources. The data is processed and cleaned, and along the way the data scientist formulates a hypothesis, which is crucial: it tells us which direction to take.
