Introduction lesson
1.1. Defining multivariate analysis
Multivariate analysis refers to all statistical methods that simultaneously analyze multiple
measurements on each individual or object under investigation.
➔ Almost every real-life marketing problem requires statistical analysis of several variables: you
need these methods in your toolkit!
➔ Crucial for Master Thesis:
o Translate marketing problem
o Collect data
o Analyze using R
1.2. Some basic concepts
1.2.1. Measurement Scales
- Nonmetric: nominal, ordinal
- Metric: interval, ratio
Nominal
- Unique identification/classification, no order implied
- (brand name, gender, favorite ice cream)
- %, mode, chi square tests
Ordinal
- Indicate ‘order’, sequence
- Preference ranking (gold, silver, bronze)
- Percentiles, median, rank correlation, mode, %, chi square tests
Interval
- No absolute 0 point
- 7-point Likert scale
- Arithmetic average, range, standard deviation, product-moment correlation + previous
methods.
Ratio
- Absolute 0 point
- Age, cost, number of customers
- Geometric average, coefficient of variation + all previous methods
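The link between scale type and permissible statistics can be illustrated with a small Python sketch (the course itself uses R; Python is used here only for illustration). All data values below are hypothetical and invented for the example.

```python
import statistics

# Hypothetical answers, invented for illustration only.
favorite_brand = ["A", "B", "A", "C", "A"]   # nominal
medal_rank = [1, 2, 2, 3, 1, 2, 3]           # ordinal (1 = gold, 3 = bronze)
likert_score = [5, 6, 4, 7, 5, 3]            # interval (7-point Likert scale)
customer_age = [20, 30, 40, 50]              # ratio (absolute zero point)

# Nominal: only counting makes sense, so the mode is a valid summary.
mode_brand = statistics.mode(favorite_brand)

# Ordinal: order is meaningful, so the median can be used as well.
median_rank = statistics.median(medal_rank)

# Interval: differences are meaningful, so mean and standard deviation apply.
mean_likert = statistics.mean(likert_score)
sd_likert = statistics.stdev(likert_score)

# Ratio: ratios are meaningful, so the coefficient of variation and the
# geometric mean become interpretable.
cv_age = statistics.stdev(customer_age) / statistics.mean(customer_age)
geo_mean_age = statistics.geometric_mean(customer_age)

print(mode_brand, median_rank, mean_likert, round(sd_likert, 2),
      round(cv_age, 2), round(geo_mean_age, 1))
```

Each higher scale keeps the statistics of the lower ones, which is why the mode could also be computed for the Likert scores but the mean could not be computed for the brand names.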
1.2.2. Errors: reliability and validity
- Reliability = is the measure consistent? = the degree to which multiple measurements give
the same results → test-retest
- Validity = does the measure capture the concept it is supposed to measure? = the degree to
which the scores of a measure represent the variable they are intended to measure
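One common way to quantify test-retest reliability is the correlation between two measurement rounds of the same respondents. The notes do not prescribe this computation, so the Python sketch below is an assumption; the scores and the helper `pearson_r` are hypothetical.

```python
import math

def pearson_r(x, y):
    """Product-moment correlation between two equally long lists."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)

# Hypothetical scores of six respondents on the same item, measured twice.
first_round = [4, 5, 3, 6, 2, 7]
second_round = [4, 6, 3, 5, 2, 7]

test_retest_r = pearson_r(first_round, second_round)
print(round(test_retest_r, 3))
```

A value close to 1 suggests a consistent (reliable) measure. It says nothing about validity: a measure can be consistent and still capture the wrong concept.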
1.2.3. Statistical Significance and Power
Hypothesis testing
= testing whether something is different from 0. For example: “does advertising affect sales?”
The risk is that you decide there is a difference when there is none in reality. How can we
make that risk smaller? → Set a cutoff: you only conclude that there is a difference when
your test measure is higher than the cutoff.
- You conclude that something is different, but in reality it is not = type 1 error (false positive)
- You conclude that there is no difference, but in reality there is = type 2 error (false negative)
- We want to keep both types of error small. In practice:
o Limit the type 1 error rate to 5% (alpha = 0.05)
o Accept that a type 2 error can still occur
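What a 5% type 1 error rate means can be checked with a small simulation. The Python sketch below is only an illustration under stated assumptions: a two-sided z-test with known standard deviation 1, data generated with no true effect, and a hypothetical helper `false_positive_rate`.

```python
import random
from statistics import NormalDist

def false_positive_rate(n_experiments=4000, n=30, alpha=0.05, seed=1):
    """Share of experiments that wrongly find a difference from 0."""
    rng = random.Random(seed)
    # Cutoff for a two-sided z-test at the chosen alpha.
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    rejections = 0
    for _ in range(n_experiments):
        # The true mean is 0, so every rejection is a type 1 error.
        sample = [rng.gauss(0.0, 1.0) for _ in range(n)]
        sample_mean = sum(sample) / n
        z = sample_mean * n ** 0.5  # standard deviation assumed known (= 1)
        if abs(z) > z_crit:
            rejections += 1
    return rejections / n_experiments

rate = false_positive_rate()
print(rate)
```

The printed share should lie close to alpha: the cutoff does not remove false positives, it only caps how often they occur.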
Power
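Power = the probability that the test finds a difference that exists in reality (1 − probability of a type 2 error).
Power depends on:
- Alpha (α) → (+) = if you are willing to accept a higher type 1 error rate, the power will be higher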
- Effect size → (+) = if a difference is bigger in reality, you have a higher chance of finding that
difference in your test
- Sample size (n) → (+) = if you look at bigger sample sizes, your test will have a higher power
Implications:
- Anticipate consequences of alpha, effect size and n
- Assess/incorporate power when interpreting results
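The three drivers of power can be made concrete with a formula for one simple case. The Python sketch below assumes a two-sided one-sample z-test with a standardized effect size (difference divided by the standard deviation); this test choice and the helper `z_test_power` are assumptions for illustration, not taken from the notes.

```python
from statistics import NormalDist

def z_test_power(effect_size, n, alpha=0.05):
    """Power of a two-sided one-sample z-test (normal approximation)."""
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1 - alpha / 2)
    # The expected test statistic grows with effect size and sqrt(n).
    shift = effect_size * n ** 0.5
    # Probability of landing beyond either cutoff.
    return std_normal.cdf(shift - z_crit) + std_normal.cdf(-shift - z_crit)

baseline = z_test_power(effect_size=0.5, n=30, alpha=0.05)
higher_alpha = z_test_power(effect_size=0.5, n=30, alpha=0.10)
bigger_effect = z_test_power(effect_size=0.8, n=30, alpha=0.05)
bigger_sample = z_test_power(effect_size=0.5, n=60, alpha=0.05)

print(round(baseline, 2), round(higher_alpha, 2),
      round(bigger_effect, 2), round(bigger_sample, 2))
```

Each of the three variations should give a higher power than the baseline, in line with the (+) signs in the list above. With an effect size of 0 the function returns alpha itself, because every rejection is then a type 1 error.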
1.3. Types of Multivariate Methods
Dependence Techniques
- 1 or more variables can be identified as dependent variables and the remaining as
independent variables
- Choice of dependence technique depends on the number of dependent variables involved in
analysis
Interdependence Techniques
- Whole set of interdependent relationships is examined
- Further classified by whether the focus is on variables or on objects
Highlights of Chapter 2 – Self-study!
2.1. Conduct preliminary analysis: graphical inspection and simple analyses
Why?
- Get a feel for the data
- Spot possible problems (and remedies) for the next step
How?
- Univariate profiling
- Bivariate analysis
2.2. Detect outliers
How can we detect outliers?
- Univariate
- Bivariate
- Multivariate
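For the univariate case, one frequently used approach is to standardize the variable and flag observations with an extreme standardized score. The notes do not specify a method or cutoff, so the Python sketch below is an assumption: the cutoff of 2.5, the helper `univariate_outliers`, and the sales figures are all hypothetical.

```python
import statistics

def univariate_outliers(values, cutoff=2.5):
    """Return (index, value) pairs whose absolute z-score exceeds the cutoff."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)
    return [(i, v) for i, v in enumerate(values)
            if abs((v - mean) / sd) > cutoff]

# Hypothetical weekly sales of ten stores; one store is clearly different.
sales = [100, 102, 98, 101, 99, 103, 97, 100, 250, 101]

outliers = univariate_outliers(sales)
print(outliers)
```

Bivariate and multivariate detection need other tools (for example scatterplots or a distance measure across several variables) and are not covered by this sketch.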
2.3. Examining missing data
Missing data leads to:
- Reduced sample size (respondents cannot be included in the sample)
- Possibly biased outcomes if the missing data process is not random
➔ 4-step approach for identifying and remedying missing data
1. Determine type of missing data → ignorable or non-ignorable missing data?
2. Determine extent (%) of missing data → by variable, case, overall
3. Diagnose randomness of missing data → systematic, missing at random, missing completely
at random
4. Deal with the missing data problem → remove cases or variables with missing values, or use
imputation (e.g. replace missing observations by the variable's average)
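Steps 2 and 4 of the approach above can be sketched in Python as follows. The tiny data set, the variable names, and the helper `impute_mean` are hypothetical; diagnosing randomness (step 3) is not covered here.

```python
# Hypothetical data: None marks a missing value.
rows = [
    {"sales": 120.0, "traffic": 300.0},
    {"sales": None, "traffic": 280.0},
    {"sales": 95.0, "traffic": None},
    {"sales": 110.0, "traffic": 310.0},
]
variables = ["sales", "traffic"]

# Step 2: extent of missing data by variable and overall.
missing_by_variable = {
    v: sum(r[v] is None for r in rows) / len(rows) for v in variables
}
missing_overall = (
    sum(r[v] is None for r in rows for v in variables)
    / (len(rows) * len(variables))
)

# Step 4, option a: remove cases with any missing value.
complete_cases = [r for r in rows if all(r[v] is not None for v in variables)]

# Step 4, option b: replace missing values by the variable's average.
def impute_mean(data, variable):
    observed = [r[variable] for r in data if r[variable] is not None]
    fill = sum(observed) / len(observed)
    return [{**r, variable: fill if r[variable] is None else r[variable]}
            for r in data]

imputed = impute_mean(rows, "sales")
print(missing_by_variable, missing_overall, len(complete_cases))
print([round(r["sales"], 1) for r in imputed])
```

Removing cases halves this sample, while mean imputation keeps all four stores but reduces the variance of the imputed variable. That trade-off is the reason the extent and randomness of the missing data are checked first.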
Step 3: Diagnose the randomness of missing data
Lecture 2: ANOVA
Step 1: Defining Objectives
Test whether the treatments (categorical variables) lead to different levels of one or more metric
outcome variables, for example:
- Does online ad design, in particular: position of picture and logo, affect the click-through rate
(DV)?
- How do visit frequency (1 or 2 visits a year) and use of samples (yes/no) affect physician
prescriptions (DV)?
- How does promo activity affect store sales and traffic (DV)?
➔ The DV is metric (interval/ratio scale) and the drivers (input variables) are nonmetric.
They have to take on discrete values (nominal/ordinal)
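For a single nonmetric driver and a single metric DV, the comparison of levels boils down to the one-way ANOVA F statistic: variation between group means relative to variation within groups. The Python sketch below is a minimal illustration; the sales figures and the helper `one_way_anova_f` are hypothetical, and no p-value is computed because the standard library has no F distribution.

```python
def one_way_anova_f(groups):
    """F statistic for a one-way ANOVA on a list of groups of values."""
    all_values = [v for g in groups for v in g]
    n_total = len(all_values)
    n_groups = len(groups)
    grand_mean = sum(all_values) / n_total
    group_means = [sum(g) / len(g) for g in groups]

    # Between-group variation: how far group means lie from the grand mean.
    ss_between = sum(len(g) * (m - grand_mean) ** 2
                     for g, m in zip(groups, group_means))
    # Within-group variation: spread of observations around their group mean.
    ss_within = sum((v - m) ** 2
                    for g, m in zip(groups, group_means) for v in g)

    ms_between = ss_between / (n_groups - 1)
    ms_within = ss_within / (n_total - n_groups)
    return ms_between / ms_within

# Hypothetical store sales per promotion intensity level.
sales_high = [10, 12, 14]
sales_medium = [8, 9, 10]
sales_low = [5, 6, 7]

f_stat = one_way_anova_f([sales_high, sales_medium, sales_low])
print(f_stat)
```

A large F value means the group means differ more than the within-group noise would suggest. Whether it is significant depends on the F distribution with (groups − 1) and (observations − groups) degrees of freedom.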
Overview of approaches
➔ Example: analysis of Store Sales and Traffic → How does promo activity affect store sales
and traffic? → 2 drivers: coupon activity (1 = 20 euro/visit, 2 = none) and promotion
intensity (1 = high, 2 = medium, 3 = low)
➔ In the data set, each row is a store (30 in total) and the columns are the different
variables. Rating column = the wealth of the store's region, on a scale from 1 to 10.
When we look at this set there are different questions: