Introduction to Research in Marketing:
Chapter 1:
Multivariate Analysis: all statistical methods that simultaneously analyze multiple
measurements on each individual or object under investigation
→ We study a number of individuals/objects, and for each of them we have multiple
measurements
Basic concepts:
1. Measurement scales:
• The way different variables are measured
• Four types of measurement scales
o Non-metric scales: nominal vs ordinal
o Metric scales: interval vs ratio
Example shampoos:
• Nominal: types of shampoo
• Ordinal: preferred shampoo (ranked no. 1, 2, 3)
• Interval: score shampoo (rate brand between 0 and 100)
• Ratio: price of brand
Nominal scale:
→ Characteristics: unique definition/identification
• Phenomena: e.g., brand name, gender, student ANR
• Appropriate methods of analysis/statistics: e.g., percentages, mode (the most frequent value
in the data), chi-square tests (relationship between nominal variables)
• The scale type is important because it determines how the data can be used
Ordinal Scale:
→ Characteristics: indicate ‘order’, sequence (more informative than nominal scales
because of the sequence)
• Phenomena: e.g., preference ranking, level of education
• Appropriate methods of analysis/statistics: percentiles, median, rank correlation +
previous methods
Interval scale:
→ Characteristics: arbitrary origin
• Phenomena: e.g., attribute scores, price index
• Appropriate methods: arithmetic average, range, standard deviation, product-moment
correlation (Pearson) + previous methods
Ratio scale:
→ Characteristics: unique origin
• Phenomena: e.g., age, costs, number of customers
• Appropriate methods: geometric average, coefficient of variation + all previous methods
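Difference between interval and ratio: a ratio scale has a true zero point, so it is clear what
the number stands for and ratios are meaningful (e.g., a price of 4 is twice a price of 2)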
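As a rough illustration of how the scale type limits the statistics you can compute, the sketch below applies the methods listed for each scale to made-up shampoo data, using Python's standard `statistics` module. All values and variable names are hypothetical.

```python
from statistics import mode, median, mean, stdev, geometric_mean

# Hypothetical shampoo data, one list per measurement scale.
shampoo_type = ["volume", "color", "volume", "anti-dandruff", "volume"]  # nominal
preference_rank = [1, 3, 2, 2, 5]    # ordinal: rank given to one brand by 5 people
brand_score = [60, 70, 80, 90, 100]  # interval: rating between 0 and 100
price = [2.0, 4.0, 8.0]              # ratio: true zero point

# Nominal: only counting is meaningful, so use the mode.
most_common_type = mode(shampoo_type)

# Ordinal: order is meaningful, so the median is allowed as well.
median_rank = median(preference_rank)

# Interval: distances are meaningful, so mean and standard deviation apply.
mean_score = mean(brand_score)
sd_score = stdev(brand_score)

# Ratio: a true zero makes ratios meaningful, so the geometric mean and the
# coefficient of variation (standard deviation / mean) can be used too.
geo_mean_price = geometric_mean(price)
cv_price = stdev(price) / mean(price)

print(most_common_type, median_rank, mean_score, round(sd_score, 2))
print(round(geo_mean_price, 2), round(cv_price, 2))
```

Each scale also allows the statistics of the scales before it, which is why the lists grow from nominal to ratio.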
2. Errors: Reliability and validity
• Reliability: is the measure consistent, correctly registered?
• Validity: does the measure capture the concept it is supposed to measure?
3. Statistical significance and power:
Problem:
1. Errors in our measures
2. We will only look at a sample of observations
Hypothesis testing:
Under H0 you assume that nothing is happening; H1 is what you expect to find (H0: not guilty,
H1: guilty)
Two types of mistakes:
1. Type I error: in reality there is no difference, but the outcome of your test tells you
there is one
2. Type II error: in reality there is a difference, but your test tells you there is none
Type I error (α) = probability of the test showing statistical significance when no effect is
present (‘false positive’)
Power (1-β) = probability of the test showing statistical significance when an effect is
present
Hypothesis testing:
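→ Suppose that the truth is: no difference. What would an error-free population measure
lead to? → A measured difference of exactly zero
→ Suppose that the truth is: no difference. What would sample measures, with error, lead
to? → Measured differences scattered around zero, so a sample can show a difference purely by chance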
Cutoff: only conclude there is a difference if the measured difference is bigger than the
cutoff
• If you move your cutoff to the right, the risk of a Type I error goes down
• If you move your cutoff too far, you reduce the Type I error but increase the Type II
error
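The trade-off can be sketched numerically. The example below assumes the measured difference is normally distributed with a known standard error; the function name `error_rates` and all numbers are hypothetical and only serve to show the direction of the effect.

```python
from statistics import NormalDist

def error_rates(cutoff, true_difference, standard_error):
    """Return (type_1, type_2) for the rule 'conclude there is a difference
    if the measured difference exceeds the cutoff'.
    Assumption: the measured difference is normally distributed."""
    # World where H0 is true: measured differences scatter around 0.
    type_1 = 1 - NormalDist(0, standard_error).cdf(cutoff)
    # World where H1 is true: they scatter around the true difference.
    type_2 = NormalDist(true_difference, standard_error).cdf(cutoff)
    return type_1, type_2

for cutoff in (1.0, 1.645, 2.5):
    t1, t2 = error_rates(cutoff, true_difference=2.0, standard_error=1.0)
    print(f"cutoff={cutoff}: Type I={t1:.3f}, Type II={t2:.3f}")
```

Moving the cutoff to the right shrinks the first number and inflates the second, which is the trade-off described in the two bullets.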
Power (how alpha affects the power):
• Power depends on:
o α (+) → if alpha is bigger, power is bigger
o Effect size (+) → if effect size is bigger, power is bigger
o Sample size n (+) → if sample size is bigger, power is bigger
• Implications:
o Anticipate consequences of α, effect size and sample size
o Assess/incorporate power when interpreting results
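A minimal sketch of these three relationships, assuming a one-sided, one-sample z-test with known variance (a simplification chosen for illustration; the notes do not prescribe a specific test). The function name `power_one_sided_z` is hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def power_one_sided_z(alpha, effect_size, n):
    """Approximate power of a one-sided, one-sample z-test.
    effect_size is the true mean shift in standard-deviation units."""
    z = NormalDist()
    critical = z.inv_cdf(1 - alpha)  # cutoff implied by alpha
    return 1 - z.cdf(critical - effect_size * sqrt(n))

base = power_one_sided_z(alpha=0.05, effect_size=0.3, n=30)
print("baseline:", round(base, 3))
print("bigger alpha:", round(power_one_sided_z(0.10, 0.3, 30), 3))
print("bigger effect:", round(power_one_sided_z(0.05, 0.5, 30), 3))
print("bigger sample:", round(power_one_sided_z(0.05, 0.3, 60), 3))
```

Each of the three variations should print a value above the baseline. With an effect size of zero the function returns alpha itself, which is the Type I error rate.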
Dependence or interdependence techniques:
• Dependence techniques:
o Used when one measure is the outcome variable and we are interested in the other (causal)
variables that drive this outcome
o One or more variables can be identified as dependent variables and the remaining ones as
independent variables
o The choice of dependence technique depends on the number of dependent variables involved
in the analysis
• Interdependence techniques:
o The whole set of interdependent relationships is examined
o Further classified by whether the focus is on variables or on objects
Chapter 2: Preliminary Data analysis and data preparation
SELF STUDY
• Get a feel for data and problems
Outliers: Observations with a unique combination of characteristics identifiable as distinctly
different from the other observations
→ Two basic types:
1. Good: true value (probably) → they tell you something
2. Bad: something is wrong in the data
→ Reasons for outliers:
• Procedural error
• Exceptional circumstances (cause known or unknown)
• Regular levels on each variable, yet unique in combination with other variables (bivariate
and multivariate outliers)
→ Why worry? Bad outliers can completely distort the results!
→ We can detect outliers with univariate, bivariate (and multivariate) methods
Judgement call: keep or delete outliers
• Only observations that truly deviate can be considered outliers
• Removing many ‘outliers’ can jeopardize representativeness
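Univariate detection can be sketched with standardized scores. The threshold of 2.5 used below is an assumed rule of thumb, not a value taken from these notes, and the data are made up; the function only flags candidates, and the keep-or-delete decision remains a judgement call.

```python
from statistics import mean, stdev

def univariate_outliers(values, threshold=2.5):
    """Return the values whose standardized score exceeds the threshold
    in absolute value. The default threshold is an assumption."""
    m, s = mean(values), stdev(values)
    return [v for v in values if abs((v - m) / s) > threshold]

# Hypothetical basket sizes; 95 may be a typing error or a genuine bulk buyer.
basket_sizes = [10, 12, 11, 13, 12, 11, 10, 12, 11, 13, 12, 11, 95]
print(univariate_outliers(basket_sizes))  # → [95]
```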
Examining missing data:
→ Example: survey of 400 people, only 250 people answered a specific question
→ Missing data leads to:
1. Reduced sample size
2. Possibly biased outcomes if the missing data process is not random
Four step approach for identification and remedying:
1. Determine the type of missing data: ignorable vs non-ignorable?
2. Determine the extent (%) of missing data: by variable, by case, overall
3. Diagnose the randomness of the missing data:
a. Systematic: whether a value is missing is linked to the level of the variable itself,
or follows some other pattern
b. Missing At Random (MAR): whether Y is missing depends on the level of X; yet, within
each level of X, Y is missing at random
c. Missing Completely At Random (MCAR): whether Y is missing is truly random
(independent of Y or any other X variable)
4. Deal with the missing data problem: remove cases or variables with missing values, or use
imputation
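Step 2 can be sketched in a few lines of plain Python. The survey rows, variable names and the convention of coding a missing answer as `None` are all assumptions made for this illustration.

```python
def missing_by_variable(rows):
    """Share of missing (None) values per variable."""
    variables = rows[0].keys()
    return {v: sum(r[v] is None for r in rows) / len(rows) for v in variables}

def missing_by_case(rows):
    """Number of missing values per case (row)."""
    return [sum(value is None for value in r.values()) for r in rows]

# Hypothetical survey answers; None marks a missing value.
survey = [
    {"age": 34, "income": 2500, "time_pressure": 3},
    {"age": 51, "income": None, "time_pressure": 5},
    {"age": None, "income": None, "time_pressure": 4},
    {"age": 28, "income": 1900, "time_pressure": 1},
]

by_variable = missing_by_variable(survey)
by_case = missing_by_case(survey)
overall = sum(by_case) / (len(survey) * len(survey[0]))

print(by_variable)  # share missing per variable
print(by_case)      # count missing per respondent
print(overall)      # share missing over all cells
```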
Example: shopper income and time pressure
→ People with more time pressure are less likely to fill in income because it is the last
question → there is a pattern (54.6%)
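One simple way to look for such a pattern is to compare the missing rate of one variable across the levels of another. The sketch below uses made-up shopper data and a hypothetical helper `missing_rate_by_group`; a clear gap between the groups would suggest the data are not MCAR, although a formal test would be needed to confirm it.

```python
def missing_rate_by_group(rows, target, group):
    """Share of cases with a missing `target` value within each level of `group`."""
    counts = {}
    for r in rows:
        missing, total = counts.get(r[group], (0, 0))
        counts[r[group]] = (missing + (r[target] is None), total + 1)
    return {level: missing / total for level, (missing, total) in counts.items()}

# Hypothetical shoppers; None marks an unanswered income question.
shoppers = [
    {"time_pressure": "high", "income": None},
    {"time_pressure": "high", "income": None},
    {"time_pressure": "high", "income": 3100},
    {"time_pressure": "low", "income": 2400},
    {"time_pressure": "low", "income": 2800},
    {"time_pressure": "low", "income": None},
    {"time_pressure": "low", "income": 1900},
]

rates = missing_rate_by_group(shoppers, target="income", group="time_pressure")
print(rates)
```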
ANOVA
• A dependence technique
• Dependent variables: metric variables
• Independent variables: nominal/non-metric variables
Step by step:
Step 1: Defining objectives
→ Test whether treatments (categorical variables) lead to different levels of a (set of) metric
outcome variable(s)
→ Examples:
o Does online ad design, in particular the position of picture and logo, affect the
click-through rate?
o How do visit frequency (once or twice a year) and use of samples (yes/no) affect
physicians’ prescriptions?
o How does promo activity affect store sales and traffic?
1-way: 1 factor
N-way: multiple factors
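For the 1-way case, the test statistic can be sketched by hand. The code below follows the standard textbook definition of the one-way ANOVA F statistic (between-group mean square divided by within-group mean square); it is not taken from these notes, and the click-through numbers are made up.

```python
from statistics import mean

def one_way_f(groups):
    """F statistic for a one-way ANOVA: between-group mean square divided
    by within-group mean square."""
    all_values = [x for g in groups for x in g]
    grand_mean = mean(all_values)
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    return (ss_between / df_between) / (ss_within / df_within)

# Hypothetical click-through rates (%) for three ad designs.
ctr_by_design = [[1.0, 2.0, 3.0], [2.0, 3.0, 4.0], [3.0, 4.0, 5.0]]
print(one_way_f(ctr_by_design))  # → 3.0
```

A large F means the group means differ more than the spread within groups would suggest; judging significance still requires comparing it with the F distribution for the two degrees of freedom.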
Why do we use ANOVA instead of multiple t-tests?
• 1 test, α=.05:
o Probability of decision=‘effect’ while there is none: .05
o Probability of decision=‘no effect’ while there is none: .95
• 3 tests, α=.05 in each test (assuming independent tests):
o Probability of decision=‘no effect’ in each test, while there is none: (.95)^3=.857
o Probability of decision=‘effect’ in at least one of the three tests, while there is
none: (1-.857)=.143>.05
• The probability of erroneously finding an effect increases with the number of tests
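The same arithmetic for any number of tests, as a small sketch. It assumes the tests are independent, and the function name `familywise_error` is hypothetical.

```python
def familywise_error(alpha, n_tests):
    """Probability of at least one false positive across independent tests,
    each run at significance level alpha."""
    return 1 - (1 - alpha) ** n_tests

for k in (1, 3, 10):
    print(k, round(familywise_error(0.05, k), 3))
```

For three tests this reproduces the .143 computed by hand, and the value keeps growing as more pairwise tests are added.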
Step 2: Designing the AN(C)OVA
• Sample size (>20/cell) → minimum of 20 observations per population (can be multiple
cells)
• Treatments and interactions → relevant as soon as you have more than one treatment
variable: do you expect them to affect the outcome independently, or do they interact?
• Do you want to use covariates?
o Continuous
o Pre-measure
o Independent of treatment
o Limited number → < (.1)*#obs - (#populations - 1)
• Dependent variable
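The rule of thumb for the number of covariates is easy to evaluate for a given design. A minimal sketch, with a hypothetical function name and example numbers:

```python
def max_covariates(n_observations, n_groups):
    """Upper bound from the rule of thumb (.1)*#obs - (#populations - 1);
    the number of covariates actually used should stay below this value."""
    return 0.1 * n_observations - (n_groups - 1)

# Hypothetical design: 100 observations spread over 4 treatment groups.
print(max_covariates(n_observations=100, n_groups=4))
```

With 100 observations and 4 groups the bound is 7, so the design leaves room for at most 6 covariates under this rule.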
Step 3: checking assumptions
Three questions:
• Are the observations independent? (If not: repeated-measures ANOVA)
• Are variances equal across treatment groups (homoscedasticity)?
• Is the dependent variable normally distributed?
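As a quick first look at the equal-variance question, one can compare the largest and smallest group variance. This is only a rough screening heuristic and not a formal test (formal options such as Levene's test exist); the function name and sales figures below are hypothetical.

```python
from statistics import variance

def variance_ratio(groups):
    """Largest sample variance across groups divided by the smallest one.
    A ratio far above 1 hints at unequal variances."""
    variances = [variance(g) for g in groups]
    return max(variances) / min(variances)

# Hypothetical store sales under three promo treatments.
sales_by_promo = [[20, 22, 19, 21], [30, 35, 25, 40], [24, 26, 25, 23]]
print(round(variance_ratio(sales_by_promo), 1))
```

Here the second group is far more spread out than the other two, which would be a reason to run a formal homoscedasticity test before trusting the ANOVA result.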
Testing for Interaction:
• Does the impact of a change in one treatment variable on the dependent variable depend
on the level of the other treatment variable?
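For a 2x2 design this question boils down to a difference of differences between cell means. The sketch below uses the physician example with made-up prescription counts; it only computes the descriptive contrast and does not test whether it is significant.

```python
from statistics import mean

# Hypothetical prescriptions per physician, by (visit frequency, samples used).
cells = {
    ("once", "no"): [10.0, 12.0, 11.0],
    ("once", "yes"): [13.0, 15.0, 14.0],
    ("twice", "no"): [14.0, 16.0, 15.0],
    ("twice", "yes"): [23.0, 25.0, 24.0],
}
cell_means = {key: mean(values) for key, values in cells.items()}

# Effect of handing out samples at each level of visit frequency.
effect_once = cell_means[("once", "yes")] - cell_means[("once", "no")]
effect_twice = cell_means[("twice", "yes")] - cell_means[("twice", "no")]

# If the two effects differ, the impact of samples depends on visit frequency.
interaction_contrast = effect_twice - effect_once
print(effect_once, effect_twice, interaction_contrast)  # → 3.0 9.0 6.0
```

In this made-up data, samples add 3 prescriptions with one visit per year but 9 with two visits, so the two treatments do not act independently.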