Lecture Summary for Data Engineering for MADS


The best summary of ALL LECTURES for Data Engineering for MADS (EBM213A05). Enhanced with a dynamic table of contents and meticulous organization for readability and easy studying. 100% of profit from this summary is donated to local Groningen NGOs, as well as national ones.

Preview: 10 of 78 pages

  • No
  • Chapter 1, chapter 2.1-2.5, chapter 3.4, 3.9, chapter 4 and chapter 5
  • 5 February 2023
  • 78 pages
  • 2022/2023
  • Summary
madsmadlad
SUMMARY OF ALL LECTURES
Bonus: Week 7 lecture substituted for more extensive material.

Enhanced with a dynamic table of contents.

, MADS MADLAD |2



Note from MADS MADLAD:

Thank you for buying my summary. I sincerely hope it helps you excel and learn
from this course. When I was writing these I sometimes struggled with this
program, but there were no summaries available.
This is why I decided to write something that is truly complete with a lot of
effort put into it. It helped me and my friends get good grades, but I also
always had you in mind, the future reader. When necessary, I always went the
extra mile to make this summary more readable, organized and complete.
If you feel like it, leave me a review of how the course is going using this
summary. It will make my day to hear your opinion, good or bad!


Check out my other extensive summaries for other MADS courses:




Available on:



Contact info:
If you need help or have an inquiry, contact me: https://www.georgedreemer.com
Connect with me on LinkedIn: https://www.linkedin.com/in/georgedreemer/
Donations:
By no means am I looking for fellow students to send me money! But if you feel like sending
me some ETH or BTC, you can do so here:
--> ETH: 0x123e086c6808459e7fC6Ac7F64a77dBA1dDe0149
--> BTC: bc1qgwzc82vph5v8rmzef4ywechjf85772n7m2e22g




MADS MADLAD wishes you good luck & perseverance.




Grades Testimony:


Table of Contents
Week 1
  Lecture 0 (Intro) & 1.1 (MD->MQ->RQ) & HBR article
    Management Dilemma and Questions, Research Questions
      Management Dilemma
      From Management Dilemma to Management Questions:
      Management Question defined:
      From Management Question to Research Question
      Management Question vs. Research Question:
      From Research Question to Analysis Questions:
      7 steps of the opportunity tree
  Lecture 1.2 - supplement (Wehkamp lecture 2022)
    7 Elements of a Data Strategy
    Data Warehouse vs. Data Lake
    4 Ways to work with Data
    Data Science – mix of 3 fields of knowledge
      Domain Expertise (Business)
      Technical Data Engineering
      Math & Statistical Knowledge
Week 2
  Lecture 2 – From ‘raw’ data to your own table: conceptual introduction
    Our first choice: level of analysis (aggregation level)
    Four Types of Information
    Joining tables, aggregation, adding, external data
      INNER_JOIN
      LEFT_JOIN / RIGHT_JOIN
      FULL_OUTER_JOIN
Week 3
  Lecture – Working with SQL, Azure Data, Creating our own table: From ‘raw’ data to your own table with SQL
    Every SQL query to retrieve data consists of maximum 6 SQL instructions
Week 4
  Lecture – Combining Wehkamp data with external data in RStudio
    Two dimensions: Data source & Data type
    Sources for external data
    Integrating external data sources
    Useful commands in R Studio
Week 5
  Lecture 5 – Data cleaning, outliers, missing data (1/2)
    Data Quality
    Data Cleaning
      Steps in Cleansing (lecture)
        Parsing (1)
        Correcting (2)
        Standardizing (3)
        Matching (4)
        Consolidating (5)
    Formatting the data (Tidy Data)
      Packages for Tidying Data
    After Data Cleaning: Sanity Check
    Outliers
      What are outliers?
      Why do we address outliers?
      Detecting Outliers
      Treating Outliers
    Missing Data
      MCAR, MAR, MNAR – Graphical representation (lecture)
      MCAR, MAR, MNAR: Academic & Simple explanation (ext. source)
  Lecture 5 – Detecting outliers/Mahalanobis distance, Multiple imputation (2/2)
    Detecting outliers/Mahalanobis Distance
      How to do it in R:
    Imputation
      How to do it in R
Week 6
  Lecture 6 – The Power of First Insights
    DESCRIPTIVE ANALYSES: REPORTING
      Typical Questions of Descriptive Analyses
      Using min, max, mean, median, totals is very useful
      Useful graphics: histograms & boxplots
    DESCRIPTIVE ANALYSES: 1-TO-1 RELATIONSHIPS
      Chi-Square Test: Categorical KPI x Categorical drivers
        Overview of tests:
        Decision Tree for Choosing a Test: ANOVA vs. t-test
      Numerical KPI x Categorical levels
    DYNAMIC ANALYTICS
      Trend Analysis
      Migration Analysis
      Like-4-Like Analysis
Week 7
  Reading – Book: Chapter 10
  Chapter 10: Creating Impact with Storytelling and Visualization
    10.1 INTRODUCTION
    10.2 FAILURE FACTORS FOR CREATING IMPACT
    10.3 STORYTELLING
      10.3.1 CHECKLIST FOR A CLEAR STORYLINE
    10.4 VISUALIZATION
      10.4.1 CHOOSING THE CHART TYPE
        10.4.1.1 SHOWING RELATIONSHIP BETWEEN DATA POINTS
        10.4.1.2. COMPARING DATA POINTS
        10.4.1.3 COMPOSITION
        10.4.1.4 DISTRIBUTION
    DECISION PROCESS FOR CHARTS (ABELA, 2008)
    10.4 MISLEADING GRAPHS
        10.4.3.1 TRUNCATED GRAPHS
        10.4.3.2 ADJUSTED AXIS
        10.4.3.3 INCORRECT SCALING
        10.4.3.4 LOGARITHMIC SCALING
        10.4.3.5 OMITTING DATA
        10.4.3.6 SIMULATED TRENDS
        10.4.3.7 REDUNDANT 3D PERSPECTIVE
    10.5 TRENDS IN VISUALIZATION
    10.6 CONCLUSIONS
  Reading – Berinato, S (2016)
    Conceptual or Data Driven
    Declarative or Exploratory
    The 4 Types of Visual Communication
      Idea Illustration
      Idea Generation
      Visual Discovery
      Everyday Datawiz
  Reading – Cleveland et al. (1984)
  Reading – Swamy, P.R. (2013): Building Logic Into Communication Using the Minto Pyramid Principle
    Storing and Retrieving Information
    Barbara Minto’s Pyramid Principle
      Structure of the Pyramid Principle
        Introductory Statement
        Advantages of using the Minto pyramid principle


Week 1
Lecture 0 (Intro) & 1.1 (MD->MQ->RQ) & HBR article




The 5 phases of the Analytical Cycle are to:
(1) define and structure the business challenge
(2) collect and manipulate the data
(3) perform the analysis
(4) present opportunities and solutions
(5) implement the results
Management Dilemma and Questions, Research Questions
Management Dilemma: a symptom of an underlying problem.
– Profits are decreasing
– Target of improving shareholder value by 5% should be met
– Marketing efforts lack effectiveness
– Increasing handle time at the Customer Service Desk
– Underperformance of several sales force teams
– Unanticipated sudden increases in demand


From Management Dilemma to Management Questions:
- Discussion with relevant stakeholders
- Interviews with industry experts
- Secondary data analysis
Goal: restate the Management Dilemma (MD) in terms of the underlying problem.
Usually starting with:
- Should we…? (choice of purpose)
- How can we…?
- Why do we…?


Management Question defined:
- Management dilemma restated in question form
- Defined in terms of the underlying problem
- Preferably linked to a Key Performance Indicator (KPI)
- Does not specify the research that needs to be done
- Questions are still broad
Example questions:
- What should be done to increase conversion?
- Should we use new and promising advertising channels to
improve marketing ROI?
- Why is the number of new contracts closed by several sales
force teams lower than expected?


From Management Question to Research Question (example):
Let’s assume the Management Question is:
- How can we increase sales in the Northern region this year?
Possible Research Questions:
- What causes the decrease in sales in the Northern region?
- What is the effect of the company-wide pricing strategy on the
Northern region?


Management Question vs. Research Question:




From Research Question to Analysis Questions:
Using the 5 W’s of the opportunity-finding method helps you to come
up with a good set of analysis questions.
- Who?
- What?
- Where?
- When?
- Why?


7 steps of the opportunity tree
1. Business challenge: the starting point of the tree, defined in
measurable objectives.
2. Sub-questions: translate the business challenge into
sub-questions.
3. Factors: define which levers you can influence or use.
4. Hypotheses: make a ‘braindump’ of all possible hypotheses.

----- exhaustive opportunity tree includes these -----

5. Insights: determine the analysis questions to check the
hypotheses and to identify areas with high potential.
6. Initiatives: come up with potential initiatives to realize the
targets/objectives.
7. Impact: calculate the monetary impact (+ or -) of initiatives and
identify the most promising ones.
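
Step 7 can be illustrated with a small sketch. This is not from the lecture: the `Initiative` class, the `rank_initiatives` helper, and all initiative names and euro amounts below are hypothetical, and "monetary impact" is assumed here to mean expected gain minus expected cost.

```python
from dataclasses import dataclass


@dataclass
class Initiative:
    """One candidate initiative from step 6 (all figures are made up)."""
    name: str
    expected_gain: float  # assumed yearly gain in EUR
    expected_cost: float  # assumed yearly cost in EUR

    @property
    def impact(self) -> float:
        # Net monetary impact; a negative value means the initiative loses money.
        return self.expected_gain - self.expected_cost


def rank_initiatives(initiatives):
    """Return the initiatives sorted by net impact, most promising first."""
    return sorted(initiatives, key=lambda i: i.impact, reverse=True)


initiatives = [
    Initiative("Regional price promotion", 120_000, 90_000),
    Initiative("Extra sales training", 60_000, 15_000),
    Initiative("New advertising channel", 40_000, 55_000),
]

ranked = rank_initiatives(initiatives)
for item in ranked:
    # The "+" in the format spec prints the sign, matching the (+ or -) in step 7.
    print(f"{item.name}: {item.impact:+,.0f} EUR")
```

In practice the gain and cost estimates would come from the analyses in step 5; the ranking itself is only as reliable as those estimates.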
