Summary Methodology for Premasters DSS, 800884-B-6

Detailed summary of all lectures, with additional notes, explanations and examples, of the course "Methodology for Premasters DSS" at Tilburg University, which is part of the Pre-Master Data Science and Society. The course was given by B. Nicenboim and G. Saygili in the first semester of the academic yea...

  • 21 June 2022
  • 46 pages
  • 2021/2022
  • Summary
hannahgruber
Tilburg University
Study Program: Pre-Master Data Science and Society
Academic Year 2021/2022, Semester 1 (August to December 2021)


Course: Methodology for Premasters DSS, 800884-B-6
Lecturers: B. Nicenboim and G. Saygili

2. The data science process


Data Science Introduction
• Data Science is the art of turning data into actions
• the term has existed since the 1960s
• the term "data science" in its current meaning was introduced in the 1990s in the
statistics and data mining communities
• it was first named as an independent discipline in 2001
• data comes in three types:
o 1) structured data → traditional relational database management systems
o 2) unstructured data
o 3) semi-structured data
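To make the three types concrete, here is a small illustration in Python. The records, field names and text are invented for this example: the CSV rows stand for structured data (fixed columns, as in a relational table), the JSON documents for semi-structured data (self-describing keys but no fixed schema), and the sentence for unstructured data (no predefined fields at all).

```python
import csv
import io
import json

# Hypothetical structured data: every record has the same fixed columns,
# like a table in a relational database.
structured_csv = "customer_id,age,country\n1,34,NL\n2,27,BE\n"
rows = list(csv.DictReader(io.StringIO(structured_csv)))

# Hypothetical semi-structured data: keys describe the values,
# but records can differ in shape (no fixed schema).
semi_structured_json = '[{"id": 1, "tags": ["sale"]}, {"id": 2, "review": {"stars": 4}}]'
documents = json.loads(semi_structured_json)

# Hypothetical unstructured data: free text; any structure must be extracted first.
unstructured_text = "Delivery was late, but the support team was very helpful."
words = unstructured_text.lower().replace(",", "").replace(".", "").split()

print(rows[0]["country"])            # NL
print(sorted(documents[1].keys()))   # ['id', 'review']
print(len(words))                    # 10
```

Note how only the structured rows can be queried by column right away; the JSON needs its keys inspected per record, and the text needs processing before it yields any fields.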


Data Science in Organizations
• organizations use data to maintain competitiveness
o a 10% increase in data usability leads to
→ a 17-49% increase in productivity
→ an 11-42% return on assets
→ a 5-6% performance improvement via data-driven decision making
• Big Data Opportunities
o Top 3: increasing operational efficiency (51%), informing strategic direction,
better customer service (27%)
• Four key activities of data science in organizations
o acquire: obtaining needed data
o prepare: preprocessing operations on the data
o analyze: analyzing the data and interpreting the results
o act: taking actions based on results
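The four activities can be read as a pipeline in which each step feeds the next. The sketch below is a hypothetical illustration, not course material: the sales figures, the function names and the decision rule in `act` (flag stores whose average is below an assumed threshold of 100) are all made up.

```python
from statistics import mean


def acquire():
    # Hypothetical raw data: daily sales per store, with one missing value (None).
    return {"store_a": [120, 135, None, 128], "store_b": [80, 76, 90, 85]}


def prepare(raw):
    # Preprocessing: drop missing values so later steps only see numbers.
    return {store: [x for x in sales if x is not None] for store, sales in raw.items()}


def analyze(clean):
    # Analysis: summarize each store by its average daily sales.
    return {store: mean(sales) for store, sales in clean.items()}


def act(results, threshold=100):
    # Action: an assumed business rule, flag stores below the threshold for review.
    return [store for store, avg in results.items() if avg < threshold]


results = analyze(prepare(acquire()))
print(results)
print(act(results))  # ['store_b']
```

Keeping each activity in its own function makes it easy to rerun a single step, for example to acquire fresh data and repeat the analysis.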


Data Mining Process
• Data Mining is the process of extracting previously unknown and potentially useful
information from the data using mathematical, statistical and machine learning
methods
• CRISP-DM: Cross-Industry Standard Process for Data Mining (late 1990s)
o guideline for a structured approach to execute a data mining process
o six phases (the whole cycle can be repeated several times)

o updated versions of CRISP-DM based on new demands
▪ IBM: ASUM-DM (Analytics Solutions Unified Method for Data Mining)
▪ SAS: SEMMA (Sampling, Exploring, Modifying, Modeling, Assessing)
▪ Microsoft: TDSP (Team Data Science Process)
• Phases of CRISP-DM
o 1) Business Understanding
▪ Understanding the business goal
▪ Situation assessment
▪ Translating the business goal to a data mining objective
▪ Development of a project plan
o 2) Data Understanding
▪ Considering data requirements
▪ Initial data collection, exploration, and quality assessment
o 3) Data Preparation
▪ Selection of required data
▪ Data cleaning
▪ Data transformation and enrichment
o 4) Modeling
▪ Selection of the appropriate modeling technique
▪ Training and test set creation for evaluation
▪ Development and examination of alternative modeling algorithms
▪ Fine tuning the model parameters
o 5) Model Evaluation
▪ Evaluation of the model in the context of the business success criteria
▪ Model approval
o 6) Deployment
▪ Reporting of the findings
▪ Planning and development of deployment procedure
▪ Deployment of the model
▪ Development of a maintenance or update plan
▪ Review of the project and planning the next steps
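Phase 4 (Modeling) can be illustrated with a small, self-contained sketch. Everything in it is an assumption made for illustration: the synthetic data, the 75/25 train/test split, the two candidate "algorithms" (a majority-class baseline and a one-feature threshold rule) and the threshold grid used for fine-tuning.

```python
import random

random.seed(0)  # fixed seed so the sketch is reproducible

# Hypothetical labelled data: one numeric feature x, label 1 when x > 6,
# with about 10% of labels flipped as noise so no model is perfect.
data = []
for _ in range(200):
    x = random.uniform(0, 10)
    y = int(x > 6)
    if random.random() < 0.1:
        y = 1 - y
    data.append((x, y))

# Training and test set creation: shuffle, then hold out 25% for evaluation.
random.shuffle(data)
split = int(0.75 * len(data))
train, test = data[:split], data[split:]


def accuracy(predict, rows):
    """Share of (x, y) rows for which predict(x) equals the true label y."""
    return sum(predict(x) == y for x, y in rows) / len(rows)


# Alternative 1: majority-class baseline (always predicts the most common training label).
majority = int(sum(y for _, y in train) >= len(train) / 2)


def baseline(x):
    return majority


# Alternative 2: threshold rule, with the threshold fine-tuned on the training set only.
def threshold_rule(t):
    return lambda x: int(x > t)


candidates = [i / 2 for i in range(21)]  # 0.0, 0.5, ..., 10.0
best_t = max(candidates, key=lambda t: accuracy(threshold_rule(t), train))
tuned_model = threshold_rule(best_t)

# Evaluation on held-out data that the tuning never saw.
print("baseline test accuracy:", accuracy(baseline, test))
print("threshold", best_t, "test accuracy:", accuracy(tuned_model, test))
```

Tuning the threshold on the training set and touching the test set only once keeps the final accuracy an honest estimate. In practice a library such as scikit-learn would typically supply the splitting, the models and the tuning.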

• Team Data Science Process (TDSP)
o TDSP is Microsoft’s new version of CRISP-DM
o Key components:
▪ A data science lifecycle definition
▪ A standardized project structure
▪ Infrastructure and resources recommended for data science projects
▪ Tools and utilities recommended for project execution.




o TDSP Infrastructure and Resources for Data Science Projects: TDSP provides
recommendations for managing shared analytics and storage infrastructure
such as:
▪ Cloud file systems for storing datasets
▪ Databases
▪ Big Data (SQL or Spark) clusters
▪ Machine learning service
o TDSP provides recommendations for R and Python
• Data Science Trajectory / Process
o Raw Data → Process Data → Clean Data (suppress noise, fill in missing data)
o from exploratory data analysis you can go back to get more data or proceed
o the resulting data product can be used in the real world

• A Data Scientist’s Role in This Process
o Initial step: What data is needed?
o Data can come from different fields
o raw data can be of different types
o process and clean data to suppress noise and discard outliers
o data scientists formulate a hypothesis and a research question that should be
answered within a study → we need to know where we are going
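Both the trajectory (raw data → process data → clean data) and the data scientist's role mention handling missing data and suppressing noise or outliers. A minimal sketch of such a cleaning step, under stated assumptions: missing values are filled with the median of the observed values, and outliers are discarded with the 1.5 * IQR rule. Both are common choices picked for illustration, not rules prescribed in the lecture, and the readings are invented.

```python
from statistics import median, quantiles

# Hypothetical raw sensor readings: None marks a missing value, 250.0 is a suspicious spike.
raw = [21.5, 22.0, None, 21.8, 250.0, 22.3, None, 21.9, 22.1]

# Process: fill in missing values with the median of the observed values.
observed = [x for x in raw if x is not None]
fill = median(observed)
processed = [fill if x is None else x for x in raw]

# Clean: discard outliers outside [Q1 - 1.5*IQR, Q3 + 1.5*IQR].
q1, _, q3 = quantiles(processed, n=4)
iqr = q3 - q1
low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
clean = [x for x in processed if low <= x <= high]

print(fill)   # 22.0
print(clean)
```

The median is used for filling because, unlike the mean, it is barely affected by the 250.0 spike.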




Big Data
• big data combines different sources of data (e.g., social media, transactions,
enterprise data…)
• Big data can be defined as a collection of diverse and large amounts of data that is
hard to process with conventional data processing platforms.
→ Doug Laney’s explanation: big data = 3 V’s (volume, velocity, variety) plus value
→ big data = high volume, high velocity, and high variety of information
o “Data are becoming the new raw material of business: Economic input is
almost equivalent to capital and labor” (The Economist, 2010)
o “Information will be the ‘21st Century Oil’” (Gartner, 2019)
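One reason high-volume, high-velocity data is hard for conventional platforms is that it may not fit in memory at once. A common workaround, sketched here as an illustration that goes beyond the lecture notes, is to process records one at a time from a stream and keep only running statistics. The generator is a hypothetical stand-in for a real data source; the running statistics use Welford's online algorithm.

```python
import random
from typing import Iterable, Iterator, Tuple


def transaction_stream(n: int, seed: int = 42) -> Iterator[float]:
    """Hypothetical high-velocity source: yields n transaction amounts one at a time."""
    rng = random.Random(seed)
    for _ in range(n):
        yield rng.uniform(1.0, 100.0)


def running_stats(stream: Iterable[float]) -> Tuple[int, float, float]:
    """Welford's online algorithm: count, mean and sample variance in one pass.

    Only three numbers are kept in memory, however many records arrive.
    """
    count, mean, m2 = 0, 0.0, 0.0
    for x in stream:
        count += 1
        delta = x - mean
        mean += delta / count
        m2 += delta * (x - mean)
    variance = m2 / (count - 1) if count > 1 else 0.0
    return count, mean, variance


count, mean, variance = running_stats(transaction_stream(100_000))
print(count, round(mean, 2), round(variance, 2))
```

Because the stream is consumed lazily, the same function works whether it receives a thousand records or an unbounded feed.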
