
Lecture notes Data science for Business


This is a 42-page summary of the lecture notes for the course Data Science for Business (MSc Digital Business, UvA), including visuals.


  • 3 April 2020
  • 42 pages
  • 2019/2020
  • Lecture notes
  • Chintan Amrit
  • All lectures
sabinedejong96
Data Science for Business – Lecture notes

DSB: why? 3 challenges:
1. • Business and IT people come from different cultures, have different interests, speak “different”
languages, etc.
• We need professionals who understand both worlds and can bridge the gap between them
• Managers need to understand the fundamentals of data science to effectively leverage a data
science team for making better decisions.

2. We have data... lots of data

3. Organisations are increasingly complex because they operate in complex supply chains and
business networks; they change fast and often, and their environments change as well.
→ One way to manage this complexity is to use data analysis and machine learning to beat
the competition


Data science: combination of statistics, machine learning and databases

Knowledge discovery process (KDP)




→ data analytics method

ETL = extract, transform and load
Usually about 70% of the time is spent on transforming the data
Once transformed, you can look for patterns in the data → done with RapidMiner in this course
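
To make the three ETL steps concrete, here is a minimal sketch in Python. The course itself uses RapidMiner, so this is only an illustration: the CSV content, the column names and the `results` table are made up for the example.

```python
import csv
import io
import sqlite3

# Hypothetical raw export; in practice this would come from a source system.
RAW_CSV = """student,score,passed
 Anna ,7.5,yes
Ben,,no
 Carla ,6.0,YES
"""

def extract(text):
    """Extract: read raw rows from a CSV source."""
    return list(csv.DictReader(io.StringIO(text)))

def transform(rows):
    """Transform: trim names, drop rows without a score, normalise types."""
    clean = []
    for row in rows:
        if not row["score"].strip():
            continue  # skip rows with a missing score
        clean.append((
            row["student"].strip(),
            float(row["score"]),
            1 if row["passed"].strip().lower() == "yes" else 0,
        ))
    return clean

def load(rows):
    """Load: write the cleaned rows into an in-memory SQLite table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE results (student TEXT, score REAL, passed INTEGER)")
    conn.executemany("INSERT INTO results VALUES (?, ?, ?)", rows)
    conn.commit()
    return conn

conn = load(transform(extract(RAW_CSV)))
loaded = conn.execute(
    "SELECT student, score, passed FROM results ORDER BY student"
).fetchall()
```

Most of the code sits in `transform`, which matches the point above that transforming usually takes the largest share of the time.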

Statistics, machine learning and data mining

Statistics:

• More theory based
• More model based
• More focused on testing hypotheses
• Top-down approach
• Explanatory model → built to explain relationships, not primarily to predict

Machine learning:
• More heuristic
• Focused on improving performance of a learning agent

• Also looks at real-time learning and robotics – areas not part of data mining
→ born in computer science: start from the data and try to analyze it
• Bottom-up approach: look at the data, try to see patterns, etc., and then come up with a
model
• Predictive model: predicts the future

Data mining and knowledge discovery
• Integrated theory and heuristics
• Focus on the entire process of knowledge discovery, including data cleaning, learning, and
integration and visualization of results

• The distinctions between the three are fuzzy

Data mining versus...
• Data warehousing/storage
o Data warehouses coalesce data from across an enterprise, often from multiple
transaction-processing systems
▪ Database: current data for transactions etc. (e.g. for a certain month)
→ Excel is not an important skill here – you need skills in SQL
Excel gives only a few dimensions per sheet → you need to build a data
warehouse in order to have all features.
• Querying / Reporting (SQL, Excel, QBE, other GUI-based querying)
o Very flexible interface to ask factual questions about data
o No modeling or sophisticated pattern finding
o Most of the cool visualizations
• OLAP – On-line Analytical Processing
o OLAP provides an easy-to-use GUI to explore large data collections
o Exploration is manual; no modeling
o Dimensions of analysis preprogrammed into OLAP system
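
A small sketch of the difference between a factual query and an OLAP-style roll-up, using Python's built-in `sqlite3` module. The `sales` table and its values are invented for illustration. Neither query builds a model: both only report what is already in the data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, product TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)",
    [("North", "A", 100.0), ("North", "B", 50.0),
     ("South", "A", 70.0), ("South", "B", 30.0)],
)

# Querying/reporting: a factual question answered directly from the data.
north_total = conn.execute(
    "SELECT SUM(amount) FROM sales WHERE region = ?", ("North",)
).fetchone()[0]

# OLAP-style exploration: aggregate along a chosen dimension (here: product).
by_product = conn.execute(
    "SELECT product, SUM(amount) FROM sales GROUP BY product ORDER BY product"
).fetchall()
```

Here `north_total` answers "how much was sold in the North?", while `by_product` slices the same data along the product dimension.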

Data warehouse vs database:
data warehouse = OLAP. It holds historical data, which you process for analysis → a database is just a
technical system which handles current customer transactions

o This course: jump straight to data mining rather than studying data warehouses,
as there are people in companies taking care of this → too complex to understand quickly

Types of machine learning
1. Supervised learning:
think of the algorithm as a baby or a dog: you teach the baby things, and you expect the baby to learn and remember
them → you train the algorithm and hopefully it learns from the data
2. Unsupervised learning: you give the data to the algorithm and it finds its own way through it and
discovers patterns by itself
3. Reinforcement learning: the algorithm learns in a loop, repeatedly acting, getting feedback on the result and adjusting

Two types of supervised learning:

1. Classification: predicting whether a student passes or fails
2. Regression: predicting the actual score a student gets (not simply a
pass/fail → you want an actual number)
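
The difference between the two can be sketched on the same toy data. The hours and scores below are invented, the pass mark of 5.5 is an assumption, and least squares and nearest neighbour are just two simple stand-ins for a regression and a classification method.

```python
hours = [1.0, 2.0, 3.0, 4.0, 5.0]
scores = [3.0, 4.5, 6.0, 7.5, 9.0]    # numeric target -> regression
passed = [s >= 5.5 for s in scores]   # class target -> classification

def fit_line(xs, ys):
    """Ordinary least squares for a single feature: y = slope * x + intercept."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    return slope, mean_y - slope * mean_x

def predict_score(h):
    """Regression: return a number."""
    slope, intercept = fit_line(hours, scores)
    return slope * h + intercept

def predict_pass(h):
    """Classification: return the label of the nearest training example."""
    nearest = min(range(len(hours)), key=lambda i: abs(hours[i] - h))
    return passed[nearest]
```

`predict_score(3.5)` returns a number (6.75 on this data), while `predict_pass(4.6)` returns a class label (`True`).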

Terminology
attributes/features (called variables in statistics)
target attribute: the attribute you would like to predict




dimensionality = number of dimensions (features/attributes) of the dataset
→ the more of these you have, the higher the dimensionality of your dataset –
more dimensions = harder to analyze, so you want to reduce this to make it easier
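
A minimal sketch of counting dimensions and reducing them. The rows and feature names are hypothetical, and dropping constant features is only one very simple reduction step, chosen here because it needs no extra libraries.

```python
rows = [
    {"hours": 2.0, "attendance": 5, "campus": "UvA", "passed": False},
    {"hours": 6.0, "attendance": 9, "campus": "UvA", "passed": True},
    {"hours": 4.0, "attendance": 7, "campus": "UvA", "passed": True},
]
target = "passed"

# Dimensionality = number of features, not counting the target attribute.
features = [name for name in rows[0] if name != target]
dimensionality = len(features)

# A feature with the same value in every row carries no information.
constant = [f for f in features if len({row[f] for row in rows}) == 1]
reduced = [f for f in features if f not in constant]
```

On this data the dimensionality drops from 3 to 2, because `campus` has the same value in every row.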

Data mining
Data Mining Tasks: Classification
Learn a method for predicting the instance class from pre-labeled (classified) instances
Many approaches:
- Statistics
- Decision Trees
- Neural Networks
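
To show what "learning from pre-labeled instances" means, the sketch below trains a decision stump, which is a decision tree with a single split. The attendance numbers and labels are invented for the example, and a real decision tree learner would use a more refined split criterion than the plain error count used here.

```python
def learn_stump(xs, labels):
    """Pick the threshold t so that the rule 'x >= t -> True'
    makes the fewest errors on the labeled training instances."""
    best_t, best_errors = None, len(xs) + 1
    for t in sorted(set(xs)):
        errors = sum((x >= t) != label for x, label in zip(xs, labels))
        if errors < best_errors:
            best_t, best_errors = t, errors
    return best_t, best_errors

# Pre-labeled instances: lectures attended -> passed the course?
attended = [2, 4, 5, 7, 8, 10]
passed = [False, False, False, True, True, True]

threshold, training_errors = learn_stump(attended, passed)

def classify(x):
    """Apply the learned rule to a new, unlabeled instance."""
    return x >= threshold
```

The learned rule here is "attended at least 7 lectures → pass", which classifies all six training instances correctly.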

data in data mining
Need to know the types of data before analyzing:

• Categorical – e.g. binomial data (pass/fail)
o Nominal data: categories without an order, e.g. will it rain or not
o Ordinal: you know that one class is better than another → so it becomes
more of a ranking in your data
• Numerical:
o Interval: the 0 point is not fixed (you can only add or subtract) – e.g. temperature in °C
o Ratio: the 0 point is fixed (you can also divide) – height, weight, etc.
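
The interval/ratio distinction can be checked with a little arithmetic. The sketch below uses made-up temperatures and heights: differences survive a change of unit on both scales, but ratios only make sense when the zero point is fixed.

```python
def celsius_to_kelvin(c):
    return c + 273.15

# Interval scale: temperature in degrees Celsius (zero point is arbitrary).
c1, c2 = 10.0, 20.0
diff_c = c2 - c1                                         # 10 degrees
diff_k = celsius_to_kelvin(c2) - celsius_to_kelvin(c1)   # still 10 degrees
ratio_c = c2 / c1                                        # 2.0, but not meaningful
ratio_k = celsius_to_kelvin(c2) / celsius_to_kelvin(c1)  # about 1.035

# Ratio scale: height (zero really means "no height").
ratio_cm = 180.0 / 90.0   # 2.0
ratio_m = 1.8 / 0.9       # also 2.0: the ratio does not depend on the unit
```

20 °C is not "twice as warm" as 10 °C: the ratio changes as soon as the same temperatures are expressed in kelvin, whereas 180 cm is twice 90 cm in any unit.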
How does data mining/ML work?
DM extracts patterns from data
Pattern = A mathematical (numeric and/or symbolic) relationship among data items

Types of patterns
• Association
• Prediction
• Cluster (segmentation)
• Sequential (or time series) relationships
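
As an example of the first pattern type, an association can be expressed as a numeric relationship between items. The sketch below uses support and confidence, two standard measures for association rules that are not named in these notes; the shopping baskets are invented.

```python
baskets = [
    {"bread", "butter"},
    {"bread", "butter", "milk"},
    {"bread"},
    {"milk"},
]

def support(itemset):
    """Fraction of baskets that contain every item in the itemset."""
    return sum(itemset <= basket for basket in baskets) / len(baskets)

def confidence(lhs, rhs):
    """Of the baskets containing lhs, the fraction that also contain rhs."""
    return support(lhs | rhs) / support(lhs)

bread_butter_support = support({"bread", "butter"})
bread_to_butter = confidence({"bread"}, {"butter"})
```

On these four baskets, bread and butter appear together in half of them, and two out of three baskets with bread also contain butter.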

Common data mining tasks




→ most of these can be supervised or unsupervised – you need to know more about the method in
order to say which.




Knowledge discovery process flow, according to CRISP-DM
• Business Understanding + Data Understanding + Data Preparation: about 80% of the time
