SUMMARY Data Preparation & Workflow Management (Dprep)
Course
Data Preparation & Workflow Management
Institution
Tilburg University (UVT)
Demi van de Pol | Summary | Data Preparation & Workflow Management | TISEM | Tilburg University | Spring-2022
SUMMARY DATA PREPARATION & WORKFLOW MANAGEMENT
Demi van de Pol || Master Marketing Analytics || Tilburg University || 2022
CONTENT
This summary is written for the course “Data Preparation & Workflow Management” during the
semester Spring-2022 and is part of the master Marketing Analytics. The input for this summary
consists of lectures, articles and tutorials.
Disclaimer: The course “Data Preparation & Workflow Management” is mainly focused on the practical part of this subject
(i.e. working with data). This summary is by no means a substitute for the lectures and tutorials provided by the lecturer.
This summary merely provides support on the theoretical part of the course.
…
WEEK 1
READING: Professionalize Your Team Work Using Scrum
The entire article can be found via this link:
https://tilburgsciencehub.com/tutorials/scale-up/scrum-for-researchers/use-scrum-in-your-team/
● Scrum is a simple framework for effective team collaboration. It provides structure, which in turn
leads to commitment and motivation.
● Scrum defines three main roles for members of the team: the product owner, the Scrum master
and development team members.
● The product owner is accountable for maximizing the value of the product and for defining a clear
“task list” (called product backlog).
● The Scrum master is accountable for the team’s effectiveness by coaching and helping the team
members to focus, removing obstacles for the team and ensuring that tasks are completed in a
positive, productive and timely manner.
● The development team members are responsible for completing the tasks in the Sprint (a fixed period of work).
● Scrum can be seen as a structured way of working with meetings that are shorter and more
productive, and cooperating in a flexible way in-between meetings.
WEEK 2: Project Management & Version Control
READING: Principles of Project Setup and Workflow Management
The entire article can be found via this link:
https://tilburgsciencehub.com/tutorials/reproducible-research-and-automation/principles-of-project-setup-and-workflow-management/project-setup-overview/
PROJECT SETUP
Two major issues in managing data-intensive projects are:
● Losing sight of the project (= directory and file chaos)
● Difficulty (re)executing the project (= lack of automation)
The primary mission of managing data- and computation-intensive projects is to build a transparent
project infrastructure that allows you to easily (re)execute your code, potentially many times.
PIPELINES AND PROJECT COMPONENTS
It is useful to break down a project into its most basic parts:
● A pipeline refers to the steps that are necessary to build a project (e.g., prepare dataset, run
model, produce tables and figures).
● Components refer to a project’s most elementary building blocks (e.g., data, source code, and
generated temporary and/or output files).
The power of setting up the project in this way lies in:
● Full portability
● Reproducibility and transparency
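To make the idea of components concrete, the sketch below creates one folder per component type. The folder names (data, src, gen/temp, gen/output) and the helper create_project_skeleton are assumptions chosen for illustration; the summary itself only names the component types, not a required layout.

```python
from pathlib import Path
import tempfile

# Hypothetical folder names: raw data, source code, and generated
# temporary/output files each get their own directory.
COMPONENT_DIRS = ["data", "src", "gen/temp", "gen/output"]


def create_project_skeleton(root):
    """Create one directory per project component under root."""
    root = Path(root)
    created = []
    for relative in COMPONENT_DIRS:
        directory = root / relative
        # parents=True also creates the intermediate "gen" folder.
        directory.mkdir(parents=True, exist_ok=True)
        created.append(directory)
    return created


if __name__ == "__main__":
    # Build the skeleton in a throwaway directory and list what was created.
    with tempfile.TemporaryDirectory() as tmp:
        for directory in create_project_skeleton(tmp):
            print(directory.relative_to(tmp))
```

Keeping generated files apart from raw data and source code means they can be deleted and rebuilt at any time, which is one way such a setup supports portability and reproducibility.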
PIPELINES
Benefits of conceiving your project like a pipeline:
● Write clearer source code: Split your project into smaller steps, each in its own source code
file.
● Obtain results faster: Because your project is separated into different pipeline stages and each of
these stages is self-contained, you can easily run “later” stages of your project (called
“downstream”), based on different input files defined earlier in your project (called “upstream”).
● Increase transparency and foster collaboration: With more transparent source code, you allow
others to more easily understand the code you use(d).
● Use multiple software packages: Because the steps are small and separate, you can easily use, for
instance, R to prepare your dataset and Python to build an algorithm based on the cleaned data.
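The pipeline idea can be sketched as two self-contained stages that communicate only through files. Everything in this example is hypothetical (the function names prepare_dataset and run_model, the file names, and the toy sales data); it is a minimal illustration under those assumptions, not the course's own code.

```python
import csv
import tempfile
from pathlib import Path


def prepare_dataset(raw_path, clean_path):
    """Upstream stage: drop rows with a missing 'sales' value."""
    with open(raw_path, newline="") as source:
        rows = [row for row in csv.DictReader(source) if row["sales"] != ""]
    with open(clean_path, "w", newline="") as target:
        writer = csv.DictWriter(target, fieldnames=["store", "sales"])
        writer.writeheader()
        writer.writerows(rows)


def run_model(clean_path, result_path):
    """Downstream stage: compute mean sales from the cleaned file."""
    with open(clean_path, newline="") as source:
        sales = [float(row["sales"]) for row in csv.DictReader(source)]
    mean_sales = sum(sales) / len(sales)
    Path(result_path).write_text(f"mean_sales,{mean_sales}\n")
    return mean_sales


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        raw = Path(tmp) / "raw.csv"
        clean = Path(tmp) / "clean.csv"
        result = Path(tmp) / "result.csv"
        raw.write_text("store,sales\nA,10\nB,\nC,20\n")  # toy data

        # Full run: the upstream output becomes the downstream input.
        prepare_dataset(raw, clean)
        print(run_model(clean, result))

        # Rerun only the downstream stage on the existing cleaned file.
        print(run_model(clean, result))
```

Because the stages share nothing except the cleaned file, the downstream stage can be rerun on its own, and either stage could be rewritten in another language as long as it reads and writes the same file format.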