Intro to Research in Marketing Spring 2022
Index
Introduction Lecture
ANOVA Lecture
Linear Regression Lecture
Factor Analysis Lecture
Cluster Analysis Lecture
Logistic Regression Lecture
Conjoint Analysis Lecture
IRM Wrap-Up Lecture
Introduction Lecture
HBBA: Chapter 1
1.1 Defining multivariate analysis
HBBA: ‘Broadly speaking, it refers to all statistical methods that simultaneously analyze
multiple measurements on each individual or object under investigation’
Why bother?
→ Almost every real-life marketing problem requires the statistical analysis of several variables: you need these methods in your toolkit!
→ Crucial for Master Thesis:
• ‘Translate’ marketing problem
• Collect data
• Analyze using R
1.2. Some basic concepts
• Measurement scales
• Errors: reliability and validity
• Statistical significance and power
Measurement scales
Nonmetric scales: Nominal & Ordinal
Metric scales: Interval & Ratio
Nominal scale:
• Characteristics: unique definition/identification, classification
• Phenomena: e.g., brand name, gender, student ANR
• Appropriate methods of analysis/statistics: e.g., percentages, mode, chi-square tests
Ordinal scale:
• Characteristics: indicate ‘order’, sequence
• Phenomena: e.g., preference ranking, level of education
• Appropriate methods of analysis/statistics: percentiles, median, rank correlation + all previous statistics
Interval scale:
• Characteristics: arbitrary origin
• Phenomena: e.g., attribute scores, price index
• Appropriate methods of analysis: arithmetic average, range, standard deviation, product-moment correlation + all previous methods
Ratio scale:
• Characteristics: unique origin
• Phenomena: e.g., age, cost, number of customers
• Appropriate methods of analysis: geometric average, coefficient of variation + all previous methods
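As an illustration of how the scale restricts the statistics, the minimal sketch below computes one appropriate statistic per scale on hypothetical toy data with Python's standard `statistics` module (the course itself works in R). All variable names and values are made up for illustration. Note that the geometric mean and coefficient of variation are only applied to the ratio variable, because they rely on a unique (true zero) origin.

```python
import statistics

# Hypothetical toy data, one variable per measurement scale
brand = ["A", "B", "A", "C", "A", "B"]        # nominal
education = [1, 3, 2, 2, 3, 1, 2]             # ordinal codes: 1 = low ... 3 = high
attitude = [4.0, 5.5, 6.0, 3.5, 5.0]          # interval: attribute scores
customers = [120, 80, 150, 100, 200]          # ratio: true zero point

# Nominal: counting only -> percentages and mode
brand_pct = {b: 100 * brand.count(b) / len(brand) for b in sorted(set(brand))}
brand_mode = statistics.mode(brand)

# Ordinal: order is meaningful, distances are not -> median
education_median = statistics.median(education)

# Interval: differences are meaningful -> arithmetic mean and standard deviation
attitude_mean = statistics.mean(attitude)
attitude_sd = statistics.stdev(attitude)

# Ratio: a unique origin makes ratios meaningful
# -> geometric mean and coefficient of variation (SD / mean)
customers_gm = statistics.geometric_mean(customers)
customers_cv = statistics.stdev(customers) / statistics.mean(customers)

print(brand_pct, brand_mode, education_median)
print(round(attitude_mean, 2), round(attitude_sd, 2))
print(round(customers_gm, 1), round(customers_cv, 3))
```

Computing, say, the mean of the ordinal education codes would run without error, which is exactly why the choice of statistic has to follow from the scale rather than from what the software accepts.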
1.2.2. Errors: Reliability and Validity
Reliability = Is the measure consistently and correctly registered (i.e., would repeated measurement give the same result)?
Validity = Does the measure capture the concept it is supposed to measure? (example: income)
1.2.3 Statistical significance and power
Hypothesis testing
Suppose that the truth is “No difference”:
• What would an error-free population measure lead to? → No difference is found (the correct conclusion)
• What would sample measures, with error, lead to? → Usually no difference, but sampling error can occasionally produce a ‘significant’ difference (a wrong conclusion)
Type I error (α) = probability of the test showing statistical significance when it is not present (‘false positive’)
Type II error (β) = probability of the test failing to show statistical significance when it is present (‘false negative’); power = 1 − β
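To make the Type I error concrete, the simulation below draws many samples from a population in which the truth is “no difference” and counts how often a two-sided z-test still reports significance. It is a minimal Python sketch under assumed conditions: normally distributed data, a known standard deviation of 1, n = 30, and a hypothetical function name `type_i_error_rate`. Under these assumptions the share of false positives should land close to the chosen α.

```python
import random
from statistics import NormalDist

def type_i_error_rate(n=30, alpha=0.05, trials=20_000, seed=1):
    """Share of samples in which a two-sided z-test rejects H0 (mu = 0)
    although H0 is true: population mean 0, known standard deviation 1."""
    rng = random.Random(seed)                      # fixed seed for reproducibility
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)   # critical value, about 1.96 for alpha = 0.05
    false_positives = 0
    for _ in range(trials):
        sample = [rng.gauss(0, 1) for _ in range(n)]   # truth: "no difference"
        z = (sum(sample) / n) / (1 / n ** 0.5)         # sample mean / standard error
        if abs(z) > z_crit:
            false_positives += 1                       # significant, yet no true effect
    return false_positives / trials

for a in (0.01, 0.05, 0.10):
    print(a, type_i_error_rate(alpha=a))
```

Lowering α reduces these false positives, but, as the power factors show, it also makes real effects harder to detect.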
Power
Power depends on:
• α (+)
• Effect size (+)
• Sample size (+)
Implications:
• Anticipate the consequences of α, effect size and n
• Assess/incorporate power when interpreting results
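The three (+) relationships can be checked with a closed-form power calculation. This minimal sketch assumes a two-sided one-sample z-test with known σ, where the effect size d is the true mean difference divided by σ; under that assumption, power = Φ(d√n − z₁₋α/₂) + Φ(−d√n − z₁₋α/₂). The function name `z_test_power` is hypothetical, and other tests (t-tests, ANOVA) use different power formulas.

```python
from statistics import NormalDist

def z_test_power(effect_size, n, alpha=0.05):
    """Power (1 - beta) of a two-sided one-sample z-test with known sigma.

    effect_size -- (true mean - hypothesized mean) / sigma
    n           -- sample size
    alpha       -- Type I error probability
    """
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1 - alpha / 2)
    shift = effect_size * n ** 0.5     # expected value of the z statistic
    # Probability that |z| exceeds the critical value in either tail
    return std_normal.cdf(shift - z_crit) + std_normal.cdf(-shift - z_crit)

base = z_test_power(effect_size=0.3, n=50, alpha=0.05)
print("baseline:     ", round(base, 3))
print("larger alpha: ", round(z_test_power(0.3, 50, alpha=0.10), 3))
print("larger effect:", round(z_test_power(0.5, 50, alpha=0.05), 3))
print("larger n:     ", round(z_test_power(0.3, 100, alpha=0.05), 3))
```

With an effect size of 0 the function returns α itself, since every rejection is then a Type I error.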
1.3. Types of Multivariate methods
Dependence or Interdependence techniques
→ Dependence techniques
• One or more variables can be identified as dependent variables and the remaining as independent variables
• Choice of dependence technique depends on the number of dependent variables involved in the analysis
→ Interdependence techniques
• Whole set of interdependent relationships is examined
• Further classified as focusing on variables (e.g., factor analysis) or on objects (e.g., cluster analysis)
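The classification can be read as a small decision rule. The function below is a deliberately simplified, hypothetical sketch: its parameter names are assumptions, and the mapping covers only a few of the methods in this course (for example, conjoint and discriminant analysis are left out). Treat it as a study aid, not as a complete decision tree.

```python
def suggest_technique(dependence, n_dependent=1, metric_dv=True, metric_iv=True,
                      focus="variables"):
    """Simplified, illustrative mapping from research setup to a multivariate method.

    dependence  -- True if some variables can be identified as dependent
    n_dependent -- number of dependent variables (dependence techniques only)
    metric_dv   -- True if the single dependent variable is metric
    metric_iv   -- True if the independent variables are metric
    focus       -- "variables" or "objects" (interdependence techniques only)
    """
    if dependence:
        if n_dependent == 1:
            if not metric_dv:
                return "logistic regression"   # nonmetric (e.g., binary) DV
            return "multiple regression" if metric_iv else "ANOVA"
        # several dependent variables analysed simultaneously
        return "canonical correlation" if metric_iv else "MANOVA"
    if focus == "variables":
        return "factor analysis"    # groups variables
    if focus == "objects":
        return "cluster analysis"   # groups objects, e.g., respondents
    raise ValueError("focus must be 'variables' or 'objects'")

print(suggest_technique(True, n_dependent=1, metric_dv=False))  # binary DV such as buy / not buy
print(suggest_technique(False, focus="objects"))                # e.g., segmenting customers
```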
HBBA Chapter 2: Preliminary data analysis and data preparation (SELF STUDY)
2.1. Conduct preliminary analysis:
Why?
• Get a feel for the data
• Reveal possible problems (and remedies) for the next steps
How?
• Univariate profiling
• Bivariate analysis
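A minimal sketch of univariate profiling and a simple bivariate analysis, using hypothetical ad-spend and sales data and hypothetical helper names (`univariate_profile`, `pearson_r`). In practice the numbers would be complemented by plots such as histograms and scatterplots.

```python
import statistics

def univariate_profile(values):
    """Univariate profiling of one metric variable."""
    return {
        "n": len(values),
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "sd": statistics.stdev(values),
        "min": min(values),
        "max": max(values),
    }

def pearson_r(x, y):
    """Bivariate analysis: product-moment correlation of two metric variables."""
    mx, my = statistics.mean(x), statistics.mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

# Hypothetical data: weekly ad spend (x 1,000 EUR) and sales (x 1,000 units)
ad_spend = [1.0, 2.0, 2.5, 3.0, 4.0, 5.0]
sales = [10.0, 12.0, 13.0, 15.0, 16.0, 19.0]

print(univariate_profile(sales))
print(round(pearson_r(ad_spend, sales), 3))
```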
2.2. Detect outliers
What are outliers? → “Observations with a unique combination of characteristics identifiable
as distinctly different from other observations” (HBBA)
There are two basic types of outliers:
• ‘good’ = true value (probably)
• ‘bad’ = something is wrong?
→ To distinguish these types, one should investigate the causes:
− Procedural error
− Exceptional circumstances (cause known or unknown)
− ‘Regular’ levels, yet unique in combination with other variables (bivariate and multivariate outliers)
Why worry? → Bad outliers can severely distort the results
How can we detect outliers?
• Univariate (Histograms, TS plots, Frequency Tables, Mean ± 3 SD, Box Plots)
• Bivariate (Scatterplot, Multiple Histograms)
• Multivariate (Mahalanobis D²)
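The numeric detection rules listed above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: the helper names and data are hypothetical, the box-plot rule uses the common Tukey convention of fences at 1.5 × IQR beyond the quartiles, and Mahalanobis D² is written out for two variables only, using the explicit inverse of the 2 × 2 sample covariance matrix.

```python
import statistics

def flag_mean_3sd(values, k=3.0):
    """Univariate: values more than k standard deviations from the mean."""
    m, sd = statistics.mean(values), statistics.stdev(values)
    return [v for v in values if abs(v - m) > k * sd]

def flag_boxplot(values, whisker=1.5):
    """Univariate: values beyond the box-plot fences Q1 - 1.5 IQR and Q3 + 1.5 IQR."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    return [v for v in values if v < q1 - whisker * iqr or v > q3 + whisker * iqr]

def mahalanobis_d2(x, y):
    """Multivariate (here: two variables): D^2 of each case from the centroid."""
    n = len(x)
    mx, my = statistics.mean(x), statistics.mean(y)
    sxx = sum((a - mx) ** 2 for a in x) / (n - 1)
    syy = sum((b - my) ** 2 for b in y) / (n - 1)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y)) / (n - 1)
    det = sxx * syy - sxy ** 2          # determinant of the 2x2 covariance matrix
    d2 = []
    for a, b in zip(x, y):
        dx, dy = a - mx, b - my
        # (dx, dy) S^-1 (dx, dy)' written out with the 2x2 inverse of S
        d2.append((syy * dx ** 2 - 2 * sxy * dx * dy + sxx * dy ** 2) / det)
    return d2

# Hypothetical data: 20 'regular' weekly sales figures plus one extreme week
weekly_sales = [9, 10, 11, 10, 9, 11, 10, 12, 8, 10,
                9, 11, 10, 10, 9, 11, 10, 12, 8, 10, 50]
print(flag_mean_3sd(weekly_sales), flag_boxplot(weekly_sales))

# Hypothetical price and quality ratings: the last case (9, 2) is not extreme on
# either variable alone, but its combination breaks the usual pattern
price = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 9]
quality = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 2]
d2 = mahalanobis_d2(price, quality)
print(flag_mean_3sd(price), flag_mean_3sd(quality))
print("case with largest D^2:", d2.index(max(d2)))
```

Both univariate rules flag the extreme week of 50, while the (9, 2) price-quality case passes the univariate check on each variable and only stands out through its D², which matches the “regular levels, yet unique in combination” type of outlier.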
Keep or delete? → “Judgement Call”
• Only observations that truly deviate can be considered outliers
• Removing many ‘outliers’ can jeopardize representativeness
2.3. Examining missing data
Missing data lead to:
• Reduced sample size
• Possibly biased outcomes if the missing data process is not random → 4-step approach for identification and remedying
Steps in missing data analysis:
1. Determine type of missing data: Ignorable / Non-ignorable missings?
2. Determine extent (%) of missing data: By variable, case, overall
3. Diagnose randomness of missing data: Systematic, Missing at Random (MAR), Missing Completely At Random (MCAR)?
4. Deal with the missing data problem: Remove cases or variables with missing values, use imputation
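Steps 2 and 4 can be sketched in a few lines of Python on a hypothetical five-respondent survey, where `None` marks a missing answer and all names are illustrative. Listwise deletion and mean imputation are shown only as the two simplest remedies: deletion reduces the sample size, mean imputation shrinks the variance of the imputed variable, and neither guarantees unbiased results when the data are not MCAR.

```python
from statistics import mean

# Hypothetical survey data; None marks a missing answer
rows = [
    {"age": 25, "income": 2100, "satisfaction": 4},
    {"age": 31, "income": None, "satisfaction": 5},
    {"age": None, "income": 3400, "satisfaction": None},
    {"age": 45, "income": 4000, "satisfaction": 3},
    {"age": 52, "income": None, "satisfaction": 4},
]
cols = ["age", "income", "satisfaction"]

# Step 2: extent of missing data (%) by variable, by case and overall
def pct_missing_by_variable(rows, cols):
    return {c: 100 * sum(r[c] is None for r in rows) / len(rows) for c in cols}

def pct_missing_by_case(rows, cols):
    return [100 * sum(r[c] is None for c in cols) / len(cols) for r in rows]

def pct_missing_overall(rows, cols):
    missing = sum(r[c] is None for r in rows for c in cols)
    return 100 * missing / (len(rows) * len(cols))

# Step 4, option 1: remove cases with any missing value (listwise deletion)
def listwise_delete(rows, cols):
    return [r for r in rows if all(r[c] is not None for c in cols)]

# Step 4, option 2: replace each missing value by the mean of its variable
def mean_impute(rows, cols):
    means = {c: mean(r[c] for r in rows if r[c] is not None) for c in cols}
    return [{c: means[c] if r[c] is None else r[c] for c in cols} for r in rows]

print(pct_missing_by_variable(rows, cols))
print(pct_missing_by_case(rows, cols))
print(pct_missing_overall(rows, cols))
print(len(listwise_delete(rows, cols)), "of", len(rows), "cases remain")
```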
→ Step 3: Diagnose randomness of missing data
Are non-ignorable missings:
• Systematic = linked to the level of the variable itself, or to another pattern?
• Missing at Random (MAR) = whether Y is missing depends on the level of X; yet, within a given level of X, Y is missing at random