Lecture 1 – Factor Analysis
General information about factor analysis:
Factor analysis is a statistical technique used to reduce a number of
aspects to a smaller set, or to check whether the aspects load on an
underlying dimension.
Suppose you have asked ten questions about satisfaction with the dwelling
people live in. You suspect that these ten questions can be traced back to
one total score. That is preferable, because otherwise you have to run
every analysis with ten separate variables. If these ten items can be
reduced to one variable, you only need to include that single variable in
the analysis. So you enter these ten questions into a factor analysis and
get the output shown in the table alongside.
Oops, this analysis turns out to yield two factors! Items 1 through 6
together form one factor, and items 7 through 10 the other. With ten items
it is in fact not unlikely that you find two, three, or even four factors.
The number of factors you obtain depends on the correlations between the
items and on the size of the eigenvalues. Those are aspects you can
influence or configure.
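The two-factor outcome described above can be sketched in Python. This is a hypothetical simulation (not the lecture's actual data): ten items are driven by two latent dimensions, and the eigenvalue-greater-than-1 rule recovers both factors.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200

# Hypothetical illustration: two latent dimensions drive ten items.
# Items 1-6 load on one dimension, items 7-10 on the other.
latent = rng.normal(size=(n, 2))
loadings = np.zeros((10, 2))
loadings[:6, 0] = 0.8   # items 1-6 load on factor 1
loadings[6:, 1] = 0.8   # items 7-10 load on factor 2
items = latent @ loadings.T + 0.4 * rng.normal(size=(n, 10))

# Eigenvalues of the correlation matrix; factors with eigenvalue > 1 "emerge".
corr = np.corrcoef(items, rowvar=False)
eigenvalues = np.sort(np.linalg.eigvalsh(corr))[::-1]
n_factors = int(np.sum(eigenvalues > 1))
print(n_factors)  # two factors emerge, as in the lecture example
```

The noise term (0.4) keeps the items imperfect measures of the latent dimensions, which is what makes this a realistic survey sketch.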
What is the issue?
Some questions a marketing manager may face:
-How do our customers evaluate our products?
-How satisfied are our customers?
How to obtain these insights?
-Surveys are often used to ask customers
-Customers answer on a certain scale
-These scales can take different forms
A one-item scale is too broad and too complicated to use for marketing
concepts. So we use multi-item scales > for attitudes and lifestyles, for
example.
Example for satisfaction: ‘Can you indicate how satisfied you are with the
performance of the firm?’ To measure overall satisfaction, add sub-questions
(underlying items). Manifest questions (the sub-questions) are used to
derive the latent variable (the overall outcome). The latent questions are
not asked literally but are formed by the answers to the manifest questions.
Core questions a marketing manager then faces:
- How can we reduce multi-item scale responses from a survey
into a manageable number of variables?
- Some of the variables will be correlated with each other: separate
variables with overlap, which basically measure the same thing.
Can all the questions be combined to measure a latent variable? >
Look at the factor loadings (common rule of thumb: above 0.6).
Developing multi-item scales
1. Develop theory
2. Generate a relevant set of items
3. Collect pre-test data
4. Analyze
5. Purify
6. Collect test data
7. Evaluate reliability, validity, and generalizability
8. Final scale
Steps 4 to 8 = what we use for factor & reliability analysis
Multi-item scales often have too many items for further analysis, so we
need data reduction:
-Factor analysis
-Reliability analysis
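Reliability analysis in practice usually means computing Cronbach's alpha for the items that are supposed to form one scale. A minimal sketch with simulated data (all numbers here are illustrative, not from the lecture):

```python
import numpy as np

def cronbach_alpha(items: np.ndarray) -> float:
    """Cronbach's alpha for an (n_respondents, k_items) array."""
    k = items.shape[1]
    item_vars = items.var(axis=0, ddof=1)        # variance of each item
    total_var = items.sum(axis=1).var(ddof=1)    # variance of the sum score
    return k / (k - 1) * (1 - item_vars.sum() / total_var)

# Hypothetical data: three items measuring the same underlying construct.
rng = np.random.default_rng(1)
true_score = rng.normal(size=(300, 1))
items = true_score + 0.5 * rng.normal(size=(300, 3))
alpha = cronbach_alpha(items)
print(round(alpha, 2))
```

Because the three items share most of their variance with the latent true score, alpha comes out high; with unrelated items it would be near zero.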
Data analysis in 2 stages
Stage 1: inspecting and preparing the data for the final analysis
-Inspection of the data (which variables, which scales; get a feeling
for the data)
-Cleaning your data (oddities, missing/wrong values, outliers)
-Combining items into new dimensions > factor and reliability analysis
Stage 2: Final analysis, testing your hypotheses
-Regression analysis using the new dimensions instead of original
items
Crap in, Crap out (if the data is crap, the output will be crap)
Purpose of factor analysis: reduction of a large quantity of data by
finding common variance to
-Retrieve underlying dimensions in your dataset, or
-Test if the hypothesized dimensions also exist in your dataset
Two central questions
-How to reduce a larger set of variables into a smaller set of
uncorrelated factors?
(Unknown number and structure, Hypothesized number and
structure)
-How to interpret these factors (= underlying dimensions) and the
scores on these factors?
Ultimate Goal
-Use dimensions in further analysis
-Position brands on these dimensions
Data
-Several interval or ratio scaled variables (often ordinal Likert
scales, but assumed to be interval).
Note
-No distinction is made between dependent (Y) and independent (X)
variables
-FA is usually applied to your independent variables (X).
Data reduction
Strong correlations between two or more items
- Same underlying phenomenon
- So combine, to get
- Parsimony (the most efficient number of variables in the
analysis)
- Less multicollinearity in subsequent analysis
First check: correlation matrix
Basic concept
SPSS invisibly standardizes X-variables (mean = 0, sd = 1)
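Both points can be illustrated together: standardizing the variables does not change the correlation matrix, which is why SPSS can standardize behind the scenes without affecting the factor solution. A sketch with made-up data:

```python
import numpy as np

rng = np.random.default_rng(2)
x = rng.normal(loc=5, scale=2, size=(100, 3))  # hypothetical raw variables

# Standardize each variable: mean 0, sd 1 (what SPSS does invisibly).
z = (x - x.mean(axis=0)) / x.std(axis=0, ddof=1)

# First check before FA: the correlation matrix.
corr = np.corrcoef(x, rowvar=False)
corr_z = np.corrcoef(z, rowvar=False)

# Correlations are unchanged by standardization.
print(np.allclose(corr, corr_z))
```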
Steps for FA using SPSS
1. Research purpose
- Which variables? How many? Sample size?
2. Is FA appropriate?
- KMO measure of sampling adequacy
Sampling adequacy predicts whether the data are likely to factor
well, based on correlations and partial correlations.
If KMO < 0.5, drop the variable with the lowest individual KMO statistic.
- Bartlett’s test of sphericity
H0: the variables are uncorrelated, i.e. the correlation matrix is
the identity matrix.
If we cannot reject H0, no correlations can be established.
You want a high Chi-square, because you want the correlation matrix
to differ from the identity matrix.
- NOTE: also check the communalities
Common rule: > .4 (at least 40% of a variable’s variance should be
captured by the factors we’ll work with)
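Both appropriateness checks can be computed by hand. A sketch of the standard formulas on hypothetical data: Bartlett's chi-square uses the determinant of the correlation matrix, and KMO compares squared correlations with squared partial correlations (obtained from the inverse correlation matrix).

```python
import numpy as np
from scipy.stats import chi2

def bartlett_sphericity(data: np.ndarray):
    """Bartlett's test: H0 = the correlation matrix is the identity matrix."""
    n, p = data.shape
    corr = np.corrcoef(data, rowvar=False)
    statistic = -(n - 1 - (2 * p + 5) / 6) * np.log(np.linalg.det(corr))
    df = p * (p - 1) / 2
    return statistic, chi2.sf(statistic, df)

def kmo(data: np.ndarray) -> float:
    """KMO: squared correlations relative to squared correlations plus
    squared partial correlations (off-diagonal elements only)."""
    corr = np.corrcoef(data, rowvar=False)
    inv = np.linalg.inv(corr)
    d = np.sqrt(np.outer(np.diag(inv), np.diag(inv)))
    partial = -inv / d                      # partial correlation matrix
    off = ~np.eye(corr.shape[0], dtype=bool)
    r2 = (corr[off] ** 2).sum()
    a2 = (partial[off] ** 2).sum()
    return r2 / (r2 + a2)

# Hypothetical correlated survey items: FA should be appropriate here.
rng = np.random.default_rng(3)
latent = rng.normal(size=(250, 1))
items = latent + 0.6 * rng.normal(size=(250, 5))
stat, p = bartlett_sphericity(items)
kmo_value = kmo(items)
print(p < 0.05, kmo_value > 0.5)
```

With five items sharing one strong common factor, H0 is rejected (high chi-square) and KMO lands well above 0.5.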
Communalities
The communalities measure the percent of variance in a given
variable explained by all the extracted factors
This is <1, since we have fewer factors than variables
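A sketch of how communalities follow from the loadings: under PCA, a loading is an eigenvector scaled by the square root of its eigenvalue, and a variable's communality is its sum of squared loadings on the retained factors (simulated single-factor data; the numbers are illustrative).

```python
import numpy as np

rng = np.random.default_rng(4)
latent = rng.normal(size=(300, 1))
data = latent + 0.7 * rng.normal(size=(300, 4))  # four items, one factor

corr = np.corrcoef(data, rowvar=False)
eigvals, eigvecs = np.linalg.eigh(corr)
order = np.argsort(eigvals)[::-1]                # sort descending
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

# PCA loadings: eigenvector scaled by sqrt(eigenvalue).
loadings = eigvecs * np.sqrt(eigvals)

# Communality of each variable = sum of squared loadings on retained factors.
k = 1  # retain one factor (the only eigenvalue > 1 here)
communalities = (loadings[:, :k] ** 2).sum(axis=1)
print(communalities.round(2))  # each < 1: part of the variance stays unique
```

Retaining all factors would reproduce each variable's full variance of 1, which is exactly why communalities are below 1 when fewer factors than variables are kept.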
3. Select a factor model to get the weights (Wij)
- See how the variables combine into factors
- Two options:
  - PCA (principal component analysis)
  - Common factor analysis (analyzes only common variance)
- Researchers don’t agree which is ‘best’, but PCA is most popular.
Use PCA for this course.
First: always use the default option for eigenvalues greater than 1;
afterwards you can tell SPSS the number of factors.
Eigenvalue = how worthwhile this component is
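The SPSS default (retain components with eigenvalue > 1) can be mimicked with scikit-learn's PCA on standardized data, since PCA on standardized variables is equivalent to extracting components from the correlation matrix. The loading pattern below is assumed for illustration:

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(5)
latent = rng.normal(size=(200, 2))
# Assumed pattern: items 1-3 load on dimension 1, items 4-6 on dimension 2.
loadings = np.array([[0.8, 0.8, 0.8, 0.0, 0.0, 0.0],
                     [0.0, 0.0, 0.0, 0.8, 0.8, 0.8]])
raw = latent @ loadings + 0.5 * rng.normal(size=(200, 6))

# Standardize so PCA works on the correlation structure.
z = (raw - raw.mean(axis=0)) / raw.std(axis=0, ddof=1)

pca = PCA().fit(z)
eigenvalues = pca.explained_variance_      # variance captured per component
n_retained = int(np.sum(eigenvalues > 1))  # SPSS default: eigenvalue > 1
print(eigenvalues.round(2), n_retained)
```

After inspecting this default solution you can refit with a fixed number of components, e.g. `PCA(n_components=2)`, just as you would afterwards tell SPSS the number of factors.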
4. Select best number of factors
- May not be clear up front
- Several criteria
-All eigenvalues added up = 9 in total (one per component/original variable).
-Component 1 accounts for 43% of the variance of all 9 components.
-Cut-off point: the eigenvalue should be larger than 1, because each
original variable already has a value of 1.
-Cumulative cut-off point: 60% of variance explained.
-Each retained factor should explain at least 5% of the variance.
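The criteria above can be combined in one sketch. The eigenvalues below are assumed for illustration; only "nine components" and "component 1 ≈ 43%" come from the lecture example.

```python
import numpy as np

# Assumed eigenvalues for nine items (they sum to 9; the first is 43% of 9).
eigenvalues = np.array([3.87, 1.75, 1.05, 0.60, 0.50,
                        0.45, 0.35, 0.25, 0.18])

pct_variance = eigenvalues / eigenvalues.sum() * 100
cumulative = np.cumsum(pct_variance)

# Kaiser criterion: keep components with eigenvalue > 1.
kaiser = int(np.sum(eigenvalues > 1))

# Cumulative criterion: enough components to explain at least 60%.
cum60 = int(np.searchsorted(cumulative, 60) + 1)
print(kaiser, cum60, cumulative.round(1))
```

Note that the criteria can disagree (here Kaiser keeps three components while 60% cumulative variance is reached with two), which is why the lecture lists several criteria rather than one.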