This summary covers all the theory given in the lectures of the Data Analytics for Engineers course, together with examples to help you understand the subject matter. It provides the necessary basics for the exam.
Lieve Göbbels
Data Analytics (2IAB0)
Semester 2, 2020-2020
Data Analytics for Engineers
Exploratory Data Analysis (EDA)
Types of data
Elementary statistical plots
Summary statistics
Advanced statistical plots
Data Visualization (VIS)
Visualization
Colors and color deficiency
Idioms
Checklist for effective visualization
Data Mining Methods (DMM)
Data mining
Four methods
Linear model quality
K-Means clustering
Distances
Decision tree quality
Support of an item set (association rules)
Data Organization and Queries (ORG)
Data organization
SQL
Data Aggregation and Sampling (DAS)
Important concepts
Measurements and sampling
Data cleaning and filtering
Hypothesis Formulation and Testing (HYP)
Distributions
Metrics
Confidence intervals and hypothesis testing
Exploratory Data Analysis (EDA)
In short:
• Types of data
• Elementary statistical plots
• Summary statistics
• Advanced statistical plots
Types of data
Important concepts
Data = raw, unorganized numbers, facts, etc.
Information = structured, meaningful and useful numbers and facts
Data types
• numerical = data that has intrinsic numerical value
‣ continuous = data that can attain any value on a given measurement scale
- interval = no fixed zero point; only differences have meaning
- ratio = fixed zero point; ratios have meaning
‣ discrete = can only attain a finite number of values
• categorical = no intrinsic numerical value
‣ nominal = two or more outcomes that have no natural order
‣ ordinal = outcomes that have a natural order; sequential, diverging or cyclic
Examples:
• temperature in °C is interval data: 20°C is not twice as warm as 10°C, so 20°C ≠ 2 × 10°C
• length is ratio data since 20 m = 2 × 10 m
• categorical data (e.g. ratings) are sometimes labeled with numbers, but these numbers only serve as labels and carry no numerical meaning, so they are not numerical data
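The interval versus ratio distinction above can be checked with a small calculation. The sketch below is only an illustration: the helper name celsius_to_kelvin and the choice of Kelvin and centimetres as alternative units are assumptions made for this example, not part of the lectures. It shows that a ratio of Celsius readings changes when the zero point moves, while differences, and ratios of lengths, stay the same.

```python
def celsius_to_kelvin(c):
    """Convert degrees Celsius to Kelvin (shifts the zero point only)."""
    return c + 273.15

# Interval data: the ratio depends on where the (arbitrary) zero is placed.
ratio_celsius = 20 / 10                                        # 2.0
ratio_kelvin = celsius_to_kelvin(20) / celsius_to_kelvin(10)   # about 1.035

# Differences are meaningful on an interval scale: same in both units.
diff_celsius = 20 - 10
diff_kelvin = celsius_to_kelvin(20) - celsius_to_kelvin(10)

# Ratio data: the ratio of two lengths is the same in metres and centimetres.
ratio_m = 20 / 10
ratio_cm = (20 * 100) / (10 * 100)

print(f"temperature ratio: {ratio_celsius:.3f} (°C) vs {ratio_kelvin:.3f} (K)")
print(f"temperature difference: {diff_celsius} (°C) vs {diff_kelvin:.1f} (K)")
print(f"length ratio: {ratio_m} (m) vs {ratio_cm} (cm)")
```

Because the two temperature ratios disagree, "twice as warm" has no meaning in °C, whereas "twice as long" holds in any unit of length.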
Tables
Tables are good for reading values and for drawing attention to actual values. There are two kinds of tables:
• reference table = stores all data in a table so that it can be looked up easily;
• demonstration table = a table to illustrate a point.
Key features of EDA
• getting to know the data before further analysis;
• extensively using plots;
• generating questions;
• detecting errors (what are reasonable values? given one value, what could the others be?).
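The error-detection question "what are reasonable values?" can be turned into a simple range check. This is a minimal sketch, not a method from the lectures: the height measurements are made up, and the bounds LOW and HIGH as well as the helper find_suspicious are hypothetical choices for illustration.

```python
# Made-up body heights in metres; two entries are deliberately implausible.
heights_m = [1.72, 1.65, 17.8, 1.80, -1.0, 1.58]

# Assumed plausible range for an adult height (an assumption for this example).
LOW, HIGH = 0.5, 2.5

def find_suspicious(values, low, high):
    """Return (index, value) pairs that fall outside the range [low, high]."""
    return [(i, v) for i, v in enumerate(values) if not (low <= v <= high)]

suspicious = find_suspicious(heights_m, LOW, HIGH)
print(suspicious)  # [(2, 17.8), (4, -1.0)]
```

A flagged value is not automatically wrong; it is a prompt to look closer, for instance 17.8 may be a misplaced decimal point for 1.78.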