The best summary of ALL LECTURES for Data Engineering for MADS (EBM213A05). Enhanced with a dynamic table of contents and meticulous organization for readability and easy studying. 100% of profit from this summary is donated to local Groningen NGOs, as well as national ones.
Written for
Rijksuniversiteit Groningen (RuG)
Marketing Analytics & Data Science
Data Engineering for MADS (EBM213A05)
SUMMARY OF ALL LECTURES
Bonus: Week 7 lecture substituted for more extensive material.
Enhanced with a dynamic table of contents.
, MADS MADLAD |2
Note from MADS MADLAD:
Thank you for buying my summary. I sincerely hope it helps you excel and learn
from this course. When I was writing these I sometimes struggled with this
program, but there were no summaries available.
This is why I decided to write something that is truly complete with a lot of
effort put into it. It helped me and my friends get good grades, but I also
always had you in mind, the future reader. When necessary, I went the
extra mile to make this summary more readable, organized and complete.
If you feel like it, leave me a review of how the course is going with this
summary. It will make my day to hear your opinion, good or bad!
Contact info:
If you need help or have an inquiry, contact me: https://www.georgedreemer.com
Connect with me on LinkedIn: https://www.linkedin.com/in/georgedreemer/
Donations:
By no means am I looking for fellow students to send me money! But if you feel like sending
me some ETH or BTC, you can do so here:
--> ETH: 0x123e086c6808459e7fC6Ac7F64a77dBA1dDe0149
--> BTC: bc1qgwzc82vph5v8rmzef4ywechjf85772n7m2e22g
MADS MADLAD wishes you good luck & perseverance.
Table of Contents
Week 1
  Lecture 0 (Intro) & 1.1 (MD->MQ->RQ) & HBR article
    Management Dilemma and Questions, Research Questions
      Management Dilemma
      From Management Dilemma to Management Questions:
      Management Question defined:
      From Management Question to Research Question
      Management Question vs. Research Question:
      From Research Question to Analysis Questions:
      7 steps of the opportunity tree
  Lecture 1.2 - supplement (Wehkamp lecture 2022)
    7 Elements of a Data Strategy
    Data Warehouse vs. Data Lake
    4 Ways to work with Data
    Data Science – mix of 3 fields of knowledge
      Domain Expertise (Business)
      Technical Data Engineering
      Math & Statistical Knowledge
Week 2
  Lecture 2 – From ‘raw’ data to your own table: conceptual introduction
    Our first choice: level of analysis (aggregation level)
    Four Types of Information
    Joining tables, aggregation, adding, external data
      INNER_JOIN
      LEFT_JOIN / RIGHT_JOIN
      FULL_OUTER_JOIN
Week 3
  Lecture – Working with SQL, Azure Data, Creating our own table: From ‘raw’ data to your own table with SQL
    Every SQL query to retrieve data consists of maximum 6 SQL instructions
Week 4
  Lecture – Combining Wehkamp data with external data in RStudio
    Two dimensions: Data source & Data type
    Sources for external data
    Integrating external data sources
    Useful commands in R Studio
Week 5
  Lecture 5 – Data cleaning, outliers, missing data (1/2)
    Data Quality
    Data Cleaning
      Steps in Cleansing (lecture)
        Parsing (1)
        Correcting (2)
        Standardizing (3)
        Matching (4)
        Consolidating (5)
    Formatting the data (Tidy Data)
      Packages for Tidying Data
    After Data Cleaning: Sanity Check
    Outliers
      What are outliers?
      Why do we address outliers?
      Detecting Outliers
      Treating Outliers
    Missing Data
      MCAR, MAR, MNAR – Graphical representation (lecture)
      MCAR, MAR, MNAR: Academic & Simple explanation (ext. source)
  Lecture 5 – Detecting outliers/Mahalanobis distance, Multiple imputation (2/2)
    Detecting outliers/Mahalanobis Distance
      How to do it in R:
    Imputation
      How to do it in R
Week 6
  Lecture 6 – The Power of First Insights
    DESCRIPTIVE ANALYSES: REPORTING
      Typical Questions of Descriptive Analyses
      Using min, max, mean, median, totals is very useful
      Useful graphics: histograms & boxplots
    DESCRIPTIVE ANALYSES: 1-TO-1 RELATIONSHIPS
      Chi-Square Test: Categorical KPI x Categorical drivers
        Overview of tests:
        Decision Tree for Choosing a Test: ANOVA vs. t-test
      Numerical KPI x Categorical levels
    DYNAMIC ANALYTICS
      Trend Analysis
      Migration Analysis
      Like-4-Like Analysis
Week 7
  Reading – Book: Chapter 10
  Chapter 10: Creating Impact with Storytelling and Visualization
    10.1 INTRODUCTION
    10.2 FAILURE FACTORS FOR CREATING IMPACT
    10.3 STORYTELLING
      10.3.1 CHECKLIST FOR A CLEAR STORYLINE
    10.4 VISUALIZATION
      10.4.1 CHOOSING THE CHART TYPE
        10.4.1.1 SHOWING RELATIONSHIP BETWEEN DATA POINTS
        10.4.1.2 COMPARING DATA POINTS
        10.4.1.3 COMPOSITION
        10.4.1.4 DISTRIBUTION
    DECISION PROCESS FOR CHARTS (ABELA, 2008)
    10.4 MISLEADING GRAPHS
        10.4.3.1 TRUNCATED GRAPHS
        10.4.3.2 ADJUSTED AXIS
        10.4.3.3 INCORRECT SCALING
        10.4.3.4 LOGARITHMIC SCALING
        10.4.3.5 OMITTING DATA
        10.4.3.6 SIMULATED TRENDS
        10.4.3.7 REDUNDANT 3D PERSPECTIVE
    10.5 TRENDS IN VISUALIZATION
    10.6 CONCLUSIONS
  Reading – Berinato, S (2016)
    Conceptual or Data Driven
    Declarative or Exploratory
    The 4 Types of Visual Communication
      Idea Illustration
      Idea Generation
      Visual Discovery
      Everyday Datawiz
  Reading – Cleveland et al. (1984)
  Reading – Swamy, P.R. (2013): Building Logic Into Communication Using the Minto Pyramid Principle
    Storing and Retrieving Information
    Barbara Minto’s Pyramid Principle
      Structure of the Pyramid Principle
        Introductory Statement
        Advantages of using the Minto pyramid principle
The 5 phases of the analytical cycle are:
(1) define and structure the business challenge
(2) collect and manipulate the data
(3) perform the analysis
(4) present opportunities and solutions
(5) implement the results
Management Dilemma and Questions, Research Questions
Management Dilemma: a symptom of an underlying problem.
– Profits are decreasing
– Target of improving shareholder value by 5% should be met
– Marketing efforts lack effectiveness
– Increasing handle time at the Customer Service Desk
– Underperformance of several sales force teams
– Unanticipated sudden increases in demand
From Management Dilemma to Management Questions:
- Discussion with relevant stakeholders
- Interviews with industry experts
- Secondary data analysis
Goal: restate the MD in terms of the underlying problem.
Usually starting with:
- Should we…? (choice of purpose)
- How can we…?
- Why do we…?
Management Question defined:
- Management dilemma restated in question form
- Defined in terms of the underlying problem
- Preferably linked to a Key Performance Indicator (KPI)
- Does not specify the research that needs to be done
- Questions are still broad
Example questions:
- What should be done to increase conversion?
- Should we use new and promising advertising channels to
improve marketing ROI?
- Why is the number of new contracts closed by several sales
force teams lower than expected?
From Management Question to Research Question (example):
Let’s assume the Management Question is:
- How can we increase sales in the Northern region this year?
Possible Research Questions:
- What causes the decrease in sales in the Northern region?
- What is the effect of the company-wide pricing strategy on the
Northern region?
Management Question vs. Research Question:
From Research Question to Analysis Questions:
Using the 5 W’s of the opportunity-finding method helps you to come
up with a good set of analysis questions.
- Who?
- What?
- Where?
- When?
- Why?
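To make the step from W-questions to analysis more concrete, here is a minimal Python sketch. It is not from the lecture: the sales records, the column names and the mapping from each W to a column are all made up for illustration. The idea is that Who, What, Where and When can often be answered with a simple aggregation over one column, while Why usually needs a follow-up analysis on top of those results.

```python
from collections import defaultdict

# Hypothetical sales records; every value is invented for illustration.
SALES = [
    {"segment": "new", "product": "shoes", "region": "North", "month": "2023-01", "revenue": 100.0},
    {"segment": "existing", "product": "shoes", "region": "North", "month": "2023-02", "revenue": 80.0},
    {"segment": "new", "product": "jackets", "region": "South", "month": "2023-01", "revenue": 150.0},
    {"segment": "existing", "product": "jackets", "region": "North", "month": "2023-02", "revenue": 60.0},
]

# Assumed mapping from a W-question to the column that could answer it.
# "why" is left out on purpose: it cannot be answered by one aggregation.
W_TO_COLUMN = {
    "who": "segment",
    "what": "product",
    "where": "region",
    "when": "month",
}


def revenue_by(w, records=SALES):
    """Return total revenue per value of the column linked to one W-question."""
    column = W_TO_COLUMN[w]
    totals = defaultdict(float)
    for row in records:
        totals[row[column]] += row["revenue"]
    return dict(totals)


if __name__ == "__main__":
    for w in W_TO_COLUMN:
        print(w, revenue_by(w))
```

Each call to `revenue_by` corresponds to one analysis question, for example "Where is revenue generated?" for `revenue_by("where")`.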
7 steps of the opportunity tree
1. Business challenge: the starting point of the tree, defined in
measurable objectives.
2. Sub-questions: translate business challenge into sub-
questions.
3. Factors: define which levers you can influence or use.
4. Hypotheses: make a ‘braindump’ of all possible hypotheses.
----- exhaustive opportunity tree includes these -----
5. Insights: determine the analysis questions to check the
hypotheses and to identify areas with high potential.
6. Initiatives: come up with potential initiatives to realize the
targets/objectives.
7. Impact: calculate the monetary impact (+ or -) of initiatives and
identify the most promising ones.
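Steps 6 and 7 can be sketched in a few lines of Python. This is only an illustration under my own assumptions: the initiative names and all amounts are hypothetical, and "monetary impact" is simplified here to expected gain minus expected cost, which is not necessarily how the lecture calculates it.

```python
from dataclasses import dataclass


@dataclass
class Initiative:
    """One potential initiative from step 6 (all figures are hypothetical)."""
    name: str
    expected_gain: float  # assumed extra revenue
    expected_cost: float  # assumed cost of running the initiative

    @property
    def impact(self):
        # Simplified monetary impact: can be positive or negative.
        return self.expected_gain - self.expected_cost


def rank_by_impact(initiatives):
    """Step 7: order initiatives from most to least promising."""
    return sorted(initiatives, key=lambda i: i.impact, reverse=True)


# Made-up initiatives for the Northern region example.
INITIATIVES = [
    Initiative("Regional price promotion", 50000.0, 20000.0),
    Initiative("Extra sales training", 30000.0, 35000.0),
    Initiative("New advertising channel", 40000.0, 15000.0),
]

if __name__ == "__main__":
    for item in rank_by_impact(INITIATIVES):
        print(f"{item.name}: {item.impact:+.0f}")
```

An initiative with a negative impact (here the sales training) ends up at the bottom of the ranking, which matches the "+ or -" in step 7.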