EXPLORATORY DATA ANALYSIS
Key features:
  Getting to know the data
  Extensively using graphs
  Generating questions
  Detecting errors in data

DATA TYPES
Categorical/qualitative
  Dichotomous
    Yes/no, male/female
  Nominal (sometimes called categorical)
    Movie genres (no ordering)
  Ordinal (sometimes called ordered)
    Movie ratings, good/better/etc. (ordering)
Numerical/quantitative
  Interval (no true zero point, only differences have meaning)
    World ranking, release date
  Ratio (0 or larger, meaningful to calculate with)
    Height, weight, traffic accidents

VISUALIZATION
Key features:
  Exploration, for you (finding patterns)
  Confirmation (seeing patterns that match your expectations)
  Communication, for others (simple, selected, summarized)

TABLES OR GRAPHS?
Graphs:
  For making comparisons
  To get a quick view
  To discover relations
Tables:
  For reading off values
  To draw attention to sizes
  Reference table: stores all data so values can be looked up
  Demonstration table: illustrates a point, so present just enough data

KEYS AND VALUES
Key attribute:
  Independent
  Categorical
Value attribute:
  Dependent
  Numerical/quantitative

MARKS AND CHANNELS
Marks:
  Points
  Lines
  Areas
  Shapes
Channels (appearance of marks):
  Position
  Color
  Length
  Size
  Orientation

DISTRIBUTION SHAPES
Right skewed: long tail to the right, mean bigger than median, median closer to the 25th percentile (first quartile)

PLOTS AND VISUALIZATIONS

SCATTERPLOT
Data type: 2 numerical
Key or value: no keys, only values
Marks and channels: points, using horizontal and vertical position
Usage:
  Shows structure (not for large datasets)
  Find trends, outliers, distribution, correlation and clusters

BAR CHART
Data type: 1 categorical, 1 quantitative
Key or value: 1 key, 1 value
Marks and channels: lines, using length and spatial regions
Usage:
  Compare and look up values
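
As a rough illustration of the bar chart's "1 key, 1 value" structure, the sketch below counts a categorical key and draws each count as a text bar whose length encodes the value. The genre list and the helper name `text_bar_chart` are made up for this example and are not part of the original notes.

```python
from collections import Counter


def text_bar_chart(categories):
    """Return one text bar per category; the bar length encodes the count."""
    counts = Counter(categories)  # key: category, value: count
    width = max(len(key) for key in counts)
    lines = []
    # Sort by descending count (ties alphabetically) so bars are easy to compare.
    for key, value in sorted(counts.items(), key=lambda kv: (-kv[1], kv[0])):
        lines.append(f"{key.ljust(width)} | {'#' * value} {value}")
    return lines


# Hypothetical sample data: one genre label per movie.
genres = ["drama", "comedy", "drama", "action", "drama", "comedy"]
chart = text_bar_chart(genres)
print("\n".join(chart))
```

Because all bars start at the same position, the values can be compared by length alone, which is what makes the bar chart suitable for comparing and looking up values.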
STACKED BAR CHART
Data type: 2 categorical, 1 quantitative
Key or value: 2 keys, 1 value
Marks and channels: lines, using length, spatial regions and hue
Usage:
  Compare and look up values
  Overview of how each part relates to the whole

KERNEL DENSITY PLOT
Data type: -
Key or value: -
Marks and channels: -
Usage:
  No fixed bins, so it overcomes the drawbacks of histograms
  Explore distribution shape
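
To make the "no fixed bins" point of the kernel density plot concrete, here is a minimal sketch of a Gaussian kernel density estimate in plain Python. The sample values, the bandwidth of 0.5, and the function name `gaussian_kde_at` are assumptions chosen for illustration; in practice a plotting library with a data-driven bandwidth would normally be used.

```python
import math


def gaussian_kde_at(x, data, bandwidth):
    """Estimate the density at x as the average of Gaussian bumps centred on each data point."""
    norm = 1.0 / (len(data) * bandwidth * math.sqrt(2.0 * math.pi))
    return norm * sum(math.exp(-0.5 * ((x - xi) / bandwidth) ** 2) for xi in data)


# Hypothetical sample: a cluster around 2 and a single value at 5.
sample = [1.0, 1.2, 1.9, 2.1, 2.4, 5.0]
bandwidth = 0.5  # assumed smoothing width; larger values give a smoother curve

# Evaluate the estimate on a fine grid from -3 to 8.
step = 0.01
grid = [-3.0 + i * step for i in range(1101)]
density = [gaussian_kde_at(x, sample, bandwidth) for x in grid]

# A density curve should enclose an area of about 1.
area = sum(density) * step
print(round(area, 3))
```

The curve is smooth everywhere and does not depend on where bin edges happen to fall; the only tuning choice left is the bandwidth.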
LINE CHART
Data type: 2 quantitative, never use categorical
Key or value: 1 key, 1 value
Marks and channels: points with line connections, using aligned lengths
Usage:
  Find trends and the relationship between one item and the next

CUMULATIVE HISTOGRAM
Data type: -
Key or value: -
Marks and channels: -
Usage:
  Illustrate thresholds
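
A small sketch of why a cumulative histogram helps with threshold questions: each bin's count is added to all earlier ones, so the running total at a bin edge is the number of observations below that edge. The delivery times, bin edges, and the threshold of 4 days are invented for the example.

```python
from itertools import accumulate


def histogram_counts(data, edges):
    """Count values in each bin [edges[i], edges[i+1]); the last bin also includes its right edge."""
    counts = [0] * (len(edges) - 1)
    for value in data:
        for i in range(len(counts)):
            is_last = i == len(counts) - 1
            if edges[i] <= value < edges[i + 1] or (is_last and value == edges[-1]):
                counts[i] += 1
                break
    return counts


# Hypothetical delivery times in days.
days = [1, 2, 2, 3, 3, 3, 4, 5, 7, 9]
edges = [0, 2, 4, 6, 8, 10]

counts = histogram_counts(days, edges)
cumulative = list(accumulate(counts))  # running total across bins

# Threshold question: what share of deliveries took fewer than 4 days?
share_below_4 = cumulative[1] / len(days)
print(counts, cumulative, share_below_4)
```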
HEATMAP
Data type: 2 categorical, 1 quantitative
Key or value: -
Marks and channels: area, using color
Usage:
  Find clusters and outliers

BOX PLOT
Data type: summary statistics
Key or value: -
Marks and channels: -
Usage:
  Compare groups
  Find distributions (uses median, Q1, Q3, min and max)
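
The five summary statistics that a box plot draws can be computed directly, as sketched below with Python's `statistics` module. The two rating groups and the helper name `five_number_summary` are hypothetical. Several quartile conventions exist; the "inclusive" method is assumed here, and other tools may give slightly different Q1 and Q3 values.

```python
import statistics


def five_number_summary(values):
    """Return (min, Q1, median, Q3, max), the statistics a box plot draws."""
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return min(values), q1, median, q3, max(values)


# Hypothetical ratings for two groups of movies.
group_a = [1, 2, 3, 4, 5, 6, 7, 8, 9]
group_b = [4, 5, 5, 6, 6, 6, 7, 7, 8]

for name, values in (("A", group_a), ("B", group_b)):
    print(name, five_number_summary(values))
```

Placing the summaries side by side is what a grouped box plot does visually: here group B has a higher median and a narrower spread than group A.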
HISTOGRAM
Data type: numerical
Key or value: -
Marks and channels: -
Usage:
  Shows distribution shape

VIOLIN PLOT
Data type: numerical
Key or value: -
Marks and channels: -
Usage:
  Like a box plot, but with a kernel density plot on each side, so it also shows the probability density
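
As a sketch of how a histogram reveals distribution shape, the code below bins a small sample into equal-width bins and prints text bars. It also checks the rule of thumb for right-skewed data, that the mean is bigger than the median. The waiting times, the bin width of 4, and the function name `histogram` are assumptions made for the example.

```python
import statistics


def histogram(values, low, bin_width, n_bins):
    """Count values in n_bins equal-width bins starting at low; values outside the range are ignored."""
    counts = [0] * n_bins
    for v in values:
        index = int((v - low) // bin_width)
        if 0 <= index < n_bins:
            counts[index] += 1
    return counts


# Hypothetical right-skewed sample, e.g. waiting times in minutes.
waits = [1, 1, 2, 2, 2, 3, 3, 4, 5, 6, 8, 12, 19]

counts = histogram(waits, low=0, bin_width=4, n_bins=5)
for i, count in enumerate(counts):
    print(f"{i * 4:2d}-{i * 4 + 4:2d} | {'#' * count}")

# The long tail to the right pulls the mean above the median.
mean = statistics.mean(waits)
median = statistics.median(waits)
print(mean > median)
```

Changing `bin_width` changes the picture, which is the drawback of fixed bins that the kernel density and violin plots avoid.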