Marketing Research Methods
Lecture 1 Factor Analysis
Questions a marketing manager may face: how do our customers evaluate our
products/services? How to obtain these insights: use surveys.
Often, one-item scales:
Do you like the taste of this brand?
How old are you?
But marketing concepts are often too complicated for one-item scales → use
multiple-item scales. Multi-item scales, in turn, often have too many items for further
analysis → data reduction (factor analysis, reliability analysis).
Scales
Example of a multi-item scale: impulse buying tendency (9 questions about buying).
Developing scales (not needed for the exam)
1. Develop theory
2. Generate relevant set of items
3. Collect pre-test data
4. Analyse
5. Purify
6. Collect test data
7. Evaluate reliability, validity, generalizability
8. Final scale
Data analysis often in 2 stages
Stage 1: Inspection and preparing data for final analysis
- Inspection of data (items)
Which variables/measurement scales/coding scheme
Cleaning your dataset (missing values, outliers…)
- Combining items into new dimensions
Stage 2: Final analysis, testing your hypotheses
Factor analysis
Purpose: Reduction of a large quantity of data by finding common variance, to
- Retrieve underlying dimensions in your dataset (you assume there are
underlying dimensions, but have no idea how many or which ones → you
know nothing), or
- Test if the hypothesized dimensions also exist in your dataset (you are going to
check whether a certain number of hypothesized dimensions are truly there →
you want to check if something you think is there is truly there)
Two central questions
- How to reduce a large set of variables into a smaller set of uncorrelated factors?
Unknown number and structure
Hypothesized number and structure
- How to interpret these factors (= underlying dimensions), and scores on these
factors?
Ultimate goal: use dimensions in further analysis (e.g. position brands on these
dimensions, see where there are still gaps in the market).
Data for factor analysis must be interval- or ratio-scaled variables. Often the data
are ordinal (Likert scales), but assumed to be interval.
Note: no distinction is made between dependent and independent variables! FA is
usually applied to your independent variables. No causal relation between variables.
Data reduction
- Metrical data on n items
- Summarize the items into p < n ‘factors’
Why can we reduce the number of variables? If multiple items look the same, why not
include a combination instead of including all of them separately?
Strong correlations between two or more items
- Same underlying phenomenon
- So combine, to get parsimony (= explain a lot by little) / less multicollinearity in
subsequent analysis
Combine into “factors” or “principal components”.
First check: correlation matrix
When doing FA in SPSS, SPSS will automatically standardize your x variables
(mean = 0, sd = 1).
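A minimal sketch of this first check in Python (pandas), assuming the items sit in a data file with hypothetical name survey.csv and columns x1…x6:

import pandas as pd

# Hypothetical item data: columns x1..x6 hold the survey responses.
items = pd.read_csv("survey.csv")[["x1", "x2", "x3", "x4", "x5", "x6"]]

# First check: look for strong correlations between items.
print(items.corr().round(2))

# SPSS standardizes automatically; in Python you do it explicitly.
z = (items - items.mean()) / items.std()   # mean = 0, sd = 1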
Basic concept
Suppose we have X1, X2, X3, X4, X5, X6. Any set of variables X1…X6 can be expressed as a
linear combination of other variables, called factors F1, F2, F3, F4, F5, F6, based on the
common variance in X1…X6. You choose only the 'strongest' factors (the best number of
factors is unknown upfront, e.g. 2 factors F1 & F2). This will never be perfect; information
gets lost.
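In formula form (a sketch using the weights W_ij from step 3 below; e_i is the part of X_i that the retained factors cannot explain):

X_i = W_{i1} F_1 + W_{i2} F_2 + e_i, \qquad i = 1, \dots, 6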
All variables load more or less on all factors. Look for the high-loading items; you are
interested in the high loadings to label the dimensions (loading > .5). The dimensions
(factors) retrieved are uncorrelated with each other.
Items = variables = survey questions
Dimensions = factors = components
Implementing FA in SPSS
Steps for FA using SPSS
1. Research purpose
Which variables? How many? Sample size?
2. Is FA appropriate?
KMO measure of sampling adequacy
-Sampling adequacy predicts if data are likely to factor well, based on
correlation and partial correlation
-If KMO < .5, drop the variable with the lowest individual KMO statistic & run it
again; if still not good, drop again
Bartlett’s test of sphericity
-H0: variables are uncorrelated, i.e. identity matrix
-If we cannot reject H0, the variables are uncorrelated and FA is not appropriate
NOTE: also check the communalities
-Common rule: >.4
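A sketch of both checks in Python (numpy/scipy), using the standard formulas; z is the standardized item matrix from the sketch above:

import numpy as np
from scipy import stats

def kmo(z):
    # KMO compares correlations with partial correlations.
    # Rule of thumb: KMO should be >= .5.
    R = np.corrcoef(z, rowvar=False)
    Rinv = np.linalg.inv(R)
    d = np.sqrt(np.outer(np.diag(Rinv), np.diag(Rinv)))
    P = -Rinv / d                                  # partial correlations
    off = ~np.eye(R.shape[0], dtype=bool)          # off-diagonal mask
    return (R[off]**2).sum() / ((R[off]**2).sum() + (P[off]**2).sum())

def bartlett(z):
    # H0: the correlation matrix is an identity matrix.
    n, p = np.asarray(z).shape
    R = np.corrcoef(z, rowvar=False)
    chi2 = -(n - 1 - (2 * p + 5) / 6) * np.log(np.linalg.det(R))
    return chi2, stats.chi2.sf(chi2, p * (p - 1) / 2)   # statistic, p-value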
Communalities
The communalities measure the percentage of variance in a given variable explained by
all the extracted factors together. This is < 1, since we have fewer factors than variables.
For example: how much variance in X1 is explained by F1 and F2. You have fewer factors
than you have variables, e.g. you had 6 variables and kept 2 factors → some information
is lost because you leave out 4 factors. So you can never fully explain all the variance in
an X variable, because part of that variance would be explained by F3–F6.
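In formula form, with l_ij the loading of variable X_i on factor F_j and p the number of retained factors:

\text{communality}(X_i) = h_i^2 = \sum_{j=1}^{p} l_{ij}^2 < 1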
3. Select factor model to get the weights Wij
See how the variables combine into factors
Two options:
-PCA (Principal Component Analysis) used most often
-Common factor analysis
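A PCA-extraction sketch (numpy), continuing with z from above; the loadings are the eigenvectors of the correlation matrix scaled by the square roots of the eigenvalues:

import numpy as np

R = np.corrcoef(z, rowvar=False)
eigvals, eigvecs = np.linalg.eigh(R)
order = np.argsort(eigvals)[::-1]                  # largest eigenvalue first
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

k = 2                                              # factors retained (step 4)
loadings = eigvecs[:, :k] * np.sqrt(eigvals[:k])   # unrotated loadings W_ij
communalities = (loadings**2).sum(axis=1)          # common rule: > .4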
4. Select best number of factors
May not be clear up front!
Several criteria
Better one factor too many than one factor too few, because otherwise you are squeezing
together items that do not belong together.
You can ignore the components of lesser significance: focus on the components that
give you the highest explanatory contribution and discard the other ones.
Number of factors
Criteria for factor selection:
- Only those factors for which the eigenvalue > 1
(Eigenvalue = how much variance a factor explains / how much variance is
extracted by that component). Each standardized variable has a variance of
one; you want an eigenvalue larger than one (the factor has to account for
more than just a single x variable)
- Total explained variance > 60%
- Those factors that explain > 5% each
(especially when you have many factors)
- Inspect scree plot (graph of eigenvalues)
- Common sense (interpretability)
- Sometimes the criteria conflict and you need to strike a balance between them
- Better too many than too few (too few: variables are forced into factor)
(Default criterion in SPSS: factors with eigenvalue > 1)
Scree plot = a plot with the number of factors (or: components) on the X-axis and the
eigenvalues on the Y-axis. It plots the eigenvalues of the different components. Look for
the elbow → you are interested in the upper arm (in this case: retain 4).
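The numeric criteria from this list, applied to the eigenvalues from the PCA sketch above (the scree plot is just these values plotted):

explained = eigvals / eigvals.sum()                    # share per factor
print("eigenvalues: ", eigvals.round(2))               # keep those > 1
print("% explained: ", (100 * explained).round(1))     # each > 5%?
print("cumulative %:", (100 * explained.cumsum()).round(1))  # want > 60%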
5. Rotated Factor Matrix in SPSS (usually rotate orthogonally, e.g. VARIMAX)
Unrotated Factor Matrix: hard to interpret
Rotation
-Prevents all variables from loading on 1 factor
-Minimizes the number of variables which have high loadings on each given
factor
Does not change the variance explained
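SPSS does this via the VARIMAX option; below a common numpy implementation of varimax (an assumed implementation, not from the slides), applied to the loadings from the PCA sketch:

import numpy as np

def varimax(L, n_iter=50, tol=1e-6):
    # Orthogonal rotation: maximizes the variance of squared loadings
    # per factor, so each variable loads high on as few factors as possible.
    p, k = L.shape
    R, var = np.eye(k), 0.0
    for _ in range(n_iter):
        Lr = L @ R
        u, s, vt = np.linalg.svd(
            L.T @ (Lr**3 - Lr @ np.diag((Lr**2).sum(axis=0)) / p))
        R = u @ vt
        if s.sum() < var * (1 + tol):
            break
        var = s.sum()
    return L @ R

rotated = varimax(loadings)       # explained variance is unchanged
markers = abs(rotated) > 0.5      # marker items per factor (step 6)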
6. Interpreting and labelling factors (including communalities)
Rotated loadings > 0.5
Interpreting factors
Which values to interpret?
- Rotated loadings (orthogonal/oblique)
Rotation prevents that all variables load on 1 factor
Rotation doesn’t change the variance explained
- High value means factor draws a lot from the variable
(marker item)
- Loadings > 0.5
- Check (high) cross loadings: an item loading high on both factors is probably a
bad question, since it links to two factors (redo the FA, possibly throwing out
that variable)
- Give the dimensions names (labelling) based on marker items
More factors → higher communality.
7. Subsequent use of factors
Create new variables
Use the obtained factors as new variables
- Use factors instead of individual items
Two ways to do this
1. Calculate the factor scores for each respondent
2. Use reliability analysis
Lecture 2 Factor Analysis, Reliability Analysis, Cluster Analysis
Key terms in factor analysis
- Factor
Underlying dimension
Combination of variables
- Loading (of rotated component matrix)
Correlation of variable (X) with factor (F)
- Eigenvalue of factor (F)
How much variance a factor explains (sum
of squared loadings within a factor)
- Communality of variable (X)
How much variance all factors explain for a
variable (sum of squared loadings of a
variable)
- Rotation
Factors should be rotated to facilitate interpretation
- Factor score
Respondent’s score on a factor which is a linear combination of all original
variables based on the loadings and scores on the items
For each saved factor, a new variable appears in SPSS (one per factor)
Always useful to look at the correlations/scatter plots of the different variables.
Reliability analysis (using Cronbach’s Alpha)
Factor analysis vs. reliability analysis
Factor analysis: how to reduce a large(r) set of variables into a smaller set of
uncorrelated, beforehand unknown, factors or dimensions, or to test a theoretically
assumed factor structure in a set of items ("does the factor solution in my data comply
with the assumed/hypothesized factor structure?"). The key point is that beforehand it
is unknown what the structure looks like.
Reliability analysis: when the underlying dimensions are known, e.g. after using factor
analysis or because you use a scale from previous research (e.g., the well-known scale to
measure impulse buying tendency). You want to judge if the dimensions are strong
enough / how reliable your factor is. So either you have done factor analysis, came up
with factors and want to know if these are good factors, or you used an established scale
and test if that scale works in your setting. So you check whether the scale is strong
enough in your research, or whether the factors you found are strong enough when you
only look at the marker items. Marker items are, for example: 2 factors each with 3
high-loading variables → the 3 high-loading variables on each factor are the marker
items. Do not do a reliability analysis for all the variables of the factor, but only for the
marker items.
Cronbach's alpha measures internal consistency = whether the dimensions are strong
enough to proceed with instead of the original items.
How reliable is the factor/scale?
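The standard formula (k = number of items in the scale, \sigma_i^2 = variance of item i, \sigma_X^2 = variance of the summed scale):

\alpha = \frac{k}{k-1} \left( 1 - \frac{\sum_{i=1}^{k} \sigma_i^2}{\sigma_X^2} \right)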
When to use reliability analysis
2 main uses:
1. Reliability analysis of scale as found in theory
“Is the theoretical scale also ‘strong enough’ in my research?”
2. Reliability analysis of factors found in PCA
“Is the factor found ‘strong enough’ if we
only look at the marker items?”
Validity and reliability of scales
You want the random error to be 0: then your scale is reliable (if you measure multiple
times, you get the same outcome). But you also want the systematic error to be 0
(validity) → if both errors are zero, you have a good scale. Reliable: measuring multiple
times gets you the same outcome. Valid: what you are measuring is truly what you want
to measure.
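In the classic true-score notation (a standard decomposition, not from the slides):

X_O = X_T + X_S + X_R

(observed score = true score + systematic error + random error). Reliability requires X_R \approx 0; validity additionally requires X_S \approx 0.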
Subsequent use of factors
- Factor ‘strong’ enough? Internal consistency of marker items
- Use Cronbach’s alpha to inspect this (higher than 0.6)
- Make new variables as a combination of the marker items under that factor (e.g.
summation divided by the number of variables). Take each respondent's average
score on variables 1, 2, 3 as the score on the first factor, and the average on
variables 4, 5, 6 as the score on the second factor.
- For instance: to use subsequently in a regression/to position the brands on the
factors (‘brand mapping’)
Internal consistency
Example: the items don't factor well together (alpha not above 0.6) → have a look at
"alpha if item deleted". Leaving out one of the first two items makes Cronbach's alpha
even worse; leaving out the last one will increase the internal consistency. The first two
really belong together; the last one is linked less and is somewhat different → drop it
and recalculate.
If all of these variables (marker items) were equally linked to each other and to the
factor, the "alpha if item deleted" values would all be about the same (leaving one out
would not change much), and about the same as the overall Cronbach's alpha. The larger
the differences with "Cronbach's Alpha if item deleted", the worse your factor.
Which items in a factor/construct
What makes a question "good" or "bad" in terms of alpha?
- SPSS and SAS will report “alpha if item deleted”, which shows how alpha would
change if that one question was not on the test.
If low: the question is good, because deleting that question would lower
the overall alpha.
If high: the question has low inter-item correlations (deleting it would raise the alpha)
- In a well-validated test, no question will have a large deviation from the overall
alpha.
If a question is “bad”, this means it is not conforming with the rest of the test to measure
the same basic factor. The question is not “internally consistent” with the rest of the test.
With only two items, "Cronbach's alpha if item deleted" no longer makes sense: if you
leave one item out, the remaining item is of course internally consistent with itself.
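A sketch of both statistics in Python (numpy), for the marker items of one factor:

import numpy as np

def cronbach_alpha(items):
    # items: (n respondents x k marker items) array of scores.
    k = items.shape[1]
    item_var = items.var(axis=0, ddof=1).sum()
    total_var = items.sum(axis=1).var(ddof=1)
    return k / (k - 1) * (1 - item_var / total_var)

def alpha_if_deleted(items):
    # Only meaningful for 3+ items (see the note above).
    k = items.shape[1]
    return [cronbach_alpha(np.delete(items, i, axis=1)) for i in range(k)]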
Example: 6 variables → 2 factors. If you save the factor scores in SPSS, you get two new
variables: for each respondent a score on factor 1 and a score on factor 2. Alternatively,
you calculate for each of the households, for each of the brands, the average score on the
first 3 variables and the average score on the last 3 variables. You get different values.
The factor scores provided by SPSS take all x variables into account: the first 3 variables
get a strong weight on the first factor and the last 3 a strong weight on the second factor,
but the other variables still have some impact. Also, not all variables have the same
weight on the factor; variable 1 may have a stronger weight than variable 3. That is what
you get with factor scores. But if you calculate the average score across the marker items
of a factor, the other 3 variables get weight 0 (they are not marker items for that factor)
and you assume the weights of the 3 marker items are exactly the same. The outcomes of
the two methods are therefore somewhat different; the first one stays closest to the
actual data.
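The two scoring methods side by side (numpy sketch, reusing z, R and the rotated loadings from above); the weight formula is an assumption about what SPSS saves, roughly the standard regression method:

import numpy as np

# 1. Factor scores: every item contributes, each with its own weight
#    (regression method: weights = R^-1 @ rotated loadings).
scores = np.asarray(z) @ np.linalg.inv(R) @ rotated

# 2. Averages of marker items only: non-markers get weight 0,
#    markers get exactly equal weight.
f1 = np.asarray(z)[:, :3].mean(axis=1)   # items 1-3 -> factor 1
f2 = np.asarray(z)[:, 3:].mean(axis=1)   # items 4-6 -> factor 2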