Technique I – Exploratory Factor Analysis
Clip 1 – Introduction
Purpose: Estimate a model which explains variance/covariance between a set of
observed variables (in a population) by a set of (fewer) unobserved factors &
weightings.
Observed variables: this means you have collected data on observed variables.
What you would like to do is understand the variance/covariance between
the set of observed variables, so how they relate.
You are interested in how the items are related to each other (how they
together measure a latent variable).
With that you want to understand whether unobserved factors or underlying dimensions
play a role in this dataset of observed variables.
Example:
They were interested in how you perceive the fairness of grading and how satisfied you as
students are. This was put into a dataset.
Important: we have rows: each respondent is a row.
In the columns: the observations on the different items.
Now you are interested in how these 6 items are related to each other.
Gra1, Gra2 and Gra3 together measure fair grading.
Sat1, Sat2 and Sat3 together measure satisfaction.
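As a minimal sketch of how such a dataset is organised (assuming Python with pandas; the scores below are made up for illustration), each respondent is one row and each of the six items is one column:

    import pandas as pd

    # Hypothetical responses: rows = respondents, columns = observed items
    data = pd.DataFrame({
        "Gra1": [5, 4, 6, 3, 5],
        "Gra2": [5, 5, 6, 2, 4],
        "Gra3": [4, 4, 7, 3, 5],
        "Sat1": [6, 3, 5, 2, 6],
        "Sat2": [6, 4, 5, 1, 6],
        "Sat3": [7, 3, 4, 2, 5],
    })
    print(data.shape)  # (number of respondents, number of items)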
What is Factor Analysis?
- Interdependence technique: you are really interested in how these different
items interrelate with each other (you are not predicting a dependent variable yet).
- Define structure among variables: you want to define structure among
these observed variables in the dataset and find out how they relate to each
other.
- Interrelationships among a large number of variables are examined to identify underlying
dimensions. These underlying dimensions are called the factors.
- You do this mainly for two purposes:
o 1. Data summarization (to understand the higher-order dimensions).
o 2. Data reduction (reduce the data to a smaller set of factors in order to use them in
other analyses).
Factors are the underlying dimensions of a larger number of variables.
Recap: Measurement model (lecture 1)
The underlying items X1, X2 and X3 are used to measure the latent construct.
Measurement error: the systematic biases that influence how we measure
these items.
With factor analysis you can assess this type of measurement.
Multi-item measurement – why do we do this type of measurement at all?
- Increased reliability and validity of measures
- Allows measurement assessment
o Measurement error
o Reliability
o Validity
- Two forms of measurement models:
o Formative (emerging – we have several items and together they form, or
emerge as, the construct).
o Reflective (latent – these are the typical ones we see in strategy and
marketing research: latent means there is a construct and the items
really reflect this construct).
Remember: in most of the research they use reflective measurement and for these
reflective measurements we use Factor Analysis.
And we want to assess reliability and validity.
Some examples to explain what this means:
- First dartboard: many black dots (data points in our sample): they are
reliable, but not valid – they are nicely together (which means reliable) but
not on target (which means not valid).
- Valid, not reliable: the black dots are more or less on target (valid), but they are
really spread out (not reliable).
- Neither valid nor reliable: widely spread and not on target.
You want to achieve both valid and reliable: points on target and clustering
together.
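As an added illustration of what assessing reliability can look like in practice, here is a minimal sketch of Cronbach's alpha (a standard reliability statistic for multi-item measures; not covered in this clip, and the scores and helper function below are hypothetical), assuming Python with numpy:

    import numpy as np

    def cronbach_alpha(items):
        # items: respondents x items matrix for one construct
        k = items.shape[1]
        item_variances = items.var(axis=0, ddof=1)
        total_variance = items.sum(axis=1).var(ddof=1)
        return k / (k - 1) * (1 - item_variances.sum() / total_variance)

    # Hypothetical scores on the three fair-grading items (Gra1-Gra3)
    gra = np.array([[5, 5, 4],
                    [4, 5, 4],
                    [6, 6, 7],
                    [3, 2, 3],
                    [5, 4, 5]])
    print(round(cronbach_alpha(gra), 2))  # closer to 1 = more reliable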
Reflective measurement models:
- Direction of causality is from the construct to the measure.
- The indicators are usually correlated, which means that the items correlate
with each other, and these correlations together are used in the factor analysis
to explain the dimensions.
- It takes measurement error into account at the item level
- Validity of items is usually tested with Factor analysis.
In formula form: X1 = λ1 · ξ + δ1, where
X1 = the observed variable (indicator),
λ1 (lambda 1) = the factor loading,
ξ (xi) = the latent construct,
δ1 = the measurement error.
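A small simulation sketch (an added illustration, assuming Python with numpy) of this reflective logic: the latent construct causes the items, so indicators generated this way end up correlated with each other:

    import numpy as np

    rng = np.random.default_rng(1)
    n = 500
    xi = rng.normal(size=n)          # latent construct (unobserved)
    loadings = [0.9, 0.8, 0.7]       # lambda_1, lambda_2, lambda_3
    items = np.column_stack([
        lam * xi + rng.normal(scale=0.5, size=n)  # X_i = lambda_i * xi + error
        for lam in loadings
    ])
    # The indicators correlate because they all reflect the same construct
    print(np.corrcoef(items, rowvar=False).round(2))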
Applications:
- Assess the validity of construct measurements: your thesis is definitely an
application field. If you use quantitative multi-item data, you definitely have to use factor
analysis.
- Market segmentation/product research/price management etc.
- Basically, it is anything where you would like to assess higher-order
dimensions.
Clip 2 – Conducting a Factor Analysis
All our methods follow a similar process – from problem formulation to model fit (so
not only for FA, but for all the methods).
Analysis Process
1. Problem formulation
2. Constructing Correlation Matrix of the data we have collected
3. Selecting the Extraction method (explained in detail in clip 3)
4. Determining the number of factors we have
5. Rotating the factors, and based on these rotations:
6. Interpreting the factors (you want to understand what the analysis actually tells you,
in terms of your problem)
7. Using Factors in other Analyses – we will see if we can use the factors in other
analyses. Can we summarize certain data into factors and use them in, for
example, a regression model?
8. Determining Model Fit
1) Problem formulation:
The objectives of factor analysis should be identified:
o Data summarization? Or
o Data reduction
Which variables to include? Criteria:
o Based on past research, theory, and the judgment of the researcher, we
are going to select particular variables.
o Measurement properties need to be ratio or interval. These are the metric
scales: important because FA is a metric technique and therefore
we need these ratio or interval measurement properties.
o Sample size (4-5 times the number of variables): if it is too low we do not have enough
power and the factor analysis will not work. How big does our sample
need to be to conduct a factor analysis?
Rule of thumb: we need 4-5 respondents per variable
for an adequate sample size (e.g. with 6 variables, roughly 24-30 respondents).
Conducting a factor analysis: it’s very important to distinguish between:
Exploratory Factor Analyses
Confirmatory Factor Analyses
These are the 2 major types.
Exploratory Factor Analyses:
Is about exploration of the data: finding an underlying structure. (We have
collected a lot of items, but we do not know a lot about the subject yet; that is
why we want to find the underlying structure, the higher-order dimensions.)
That means there is an assumption that superior factors cause the
correlations between variables, but we do not have insights yet into what these
superior factors could be.
Used to reveal interrelationships: we do not know the relationships yet, but
FA in an exploratory way will reveal these relationships.
Therefore, the main purpose of exploratory FA is the generation of hypotheses.
Confirmatory Factor Analysis:
We already have a priori ideas about the underlying factors, usually derived from
theory (which ones have interrelationships).
That also means the relationships between variables and factors are
assumed before conducting the FA. You have expectations about them.
Therefore, confirmatory FA is used for testing of hypotheses.
Example:
Imagine you want to conduct research among consumers and their perceptions of
toothpaste.
How would you do that?
1. Collect data. You have respondents and 6 variables.
a. These variables could be questions you ask your respondents.
You are interested in:
1. Seeing how they are related to each other
2. Seeing how they provide you with higher-order dimensions
Step 1: Conducting a correlation matrix
1. You construct the correlation matrix between these items, because FA is
an analytical process that is based on a matrix of correlations between the
variables.
See the slide for an example of what a correlation matrix could look like: the 6 variables on the
vertical axis and the 6 variables on the horizontal axis, so you can see what the
intercorrelation of each item with each other item is.
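A minimal sketch of this step (assuming Python with pandas; the toothpaste scores below are random placeholders):

    import numpy as np
    import pandas as pd

    # Placeholder data: 30 respondents x 6 toothpaste items (V1..V6)
    rng = np.random.default_rng(0)
    df = pd.DataFrame(rng.integers(1, 8, size=(30, 6)),
                      columns=[f"V{i}" for i in range(1, 7)])

    corr = df.corr()       # 6 x 6 matrix of inter-item correlations
    print(corr.round(2))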
Step 2: Test if we can use Factor Analysis
We also want to know whether this correlation matrix helps us in determining whether
we can use FA at all, because that is the basis. For that we have useful statistics:
1. Kaiser-Meyer-Olkin (KMO) measure of sampling adequacy. It tells you
whether your sample adequately represents the population.
a. Tests whether the sample represents the population = this is what you want = KMO > 0.5
2. Bartlett's test of sphericity: it tests the null hypothesis that the variables are
uncorrelated in the population. If the null hypothesis has to be accepted, it
means you have no correlations in the population, and that means you
would not be able to do a FA. You want to reject the null hypothesis here and
be sure that there are enough correlations in the population.
a. Tests that there is no correlation = this you do not want = reject H0 = significant
Example:
We see that the KMO is .660 and we also see that Bartlett's test is significant.
Rules of thumb:
KMO should be at least above .5 – the closer to 1, the better!
Bartlett's sig. level should be smaller than .05* = significant (only then is it
given that we have enough correlation in our sample to be able to conduct a
FA).
*.05 is the typical alpha value for testing
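A sketch of both checks, assuming the Python factor_analyzer package is available (its calculate_kmo and calculate_bartlett_sphericity helpers); the item data here is again a random placeholder:

    import numpy as np
    import pandas as pd
    from factor_analyzer.factor_analyzer import calculate_bartlett_sphericity, calculate_kmo

    # Placeholder item data (in practice: your real 6 items)
    rng = np.random.default_rng(0)
    df = pd.DataFrame(rng.normal(size=(100, 6)),
                      columns=[f"V{i}" for i in range(1, 7)])

    chi_square, p_value = calculate_bartlett_sphericity(df)  # want p < .05
    kmo_per_item, kmo_total = calculate_kmo(df)              # want KMO > .5
    print(round(kmo_total, 3), round(p_value, 3))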
Clip 3 – Selecting an Extraction Method
Is an important step within FA.
Usually we have two major types of extraction methods.
Distinguish between:
Principal Components Analysis
Common Factor Analysis
Principal Components Analysis:
Looks at the total variance in the data.
The diagonal of the correlation matrix consists of unities: in the correlation
matrix from the previous clip, we look at the diagonal elements and these are
considered to be unities (1).
The full variance is brought into the factor matrix.
The primary concern of principal components analysis is that we want a
minimum number of factors that accounts for maximum variance. So
principal components analysis always tries to maximize explained variance.
The factors are called principal components.
Mathematically, each variable is expressed as a linear combination of the
components.
The covariation among the variables is described in terms of a small number
of principal components.
So if the variables are standardized, the principal component model may be
represented as: Xi = Ai1C1 + Ai2C2 + ... + AikCk (see the slide for the meanings of the symbols).
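A sketch of principal components extraction, assuming scikit-learn (the item scores are placeholders; the variables are standardized first, as in the formula above):

    import numpy as np
    from sklearn.decomposition import PCA
    from sklearn.preprocessing import StandardScaler

    rng = np.random.default_rng(0)
    X = rng.normal(size=(100, 6))             # placeholder item scores
    Z = StandardScaler().fit_transform(X)     # standardize the variables

    pca = PCA().fit(Z)                        # extract all 6 components
    # Proportion of total variance each component explains (first is largest)
    print(pca.explained_variance_ratio_.round(2))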
Common Factor Analysis:
Factors are estimated based only on the common variance.
Communalities are inserted in the diagonal of the correlation matrix.
Primary concern: identify the underlying dimensions and their common
variance.
This extraction method is also known as principal axis factoring.
Mathematically, each variable is expressed as a linear combination of the
underlying factors.
The covariation among the variables is described in terms of a small number
of common factors plus a unique factor for each variable.
The model contains Vi and Ui (for standardized variables: Xi = Ai1F1 + Ai2F2 + ... + AimFm + ViUi):
o Vi = the standardized regression coefficient of variable i on the unique factor i
o Ui = the unique factor for variable i
Important difference:
The diagonal value of the correlation matrix can either be unity or the
communality. In terms of variance:
- Unity looks at the total variance (principal components analysis).
- With the communality, the variance is split into a common part and a unique
part, and only the common variance is used (common factor analysis).
Principal Component model
We see that all of the communalities have a value of 1 – initial and extraction both
one – which means the total variance has been considered for all 6 variables.
If we then look at the variance explained, we see that the first factor has a
pretty high value of explained variance, because remember: the principal
component model tries to maximize variance.
Look at the communalities to see how much of the total variance has been considered.
Look at the total variance explained to see how much variance is explained; the higher the
number, the more variance that particular factor explains.
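A sketch of reading these quantities off a principal components solution (same scikit-learn assumption): loadings are the components scaled by the square root of their eigenvalues, and a variable's communality is the sum of its squared loadings on the retained components:

    import numpy as np
    from sklearn.decomposition import PCA
    from sklearn.preprocessing import StandardScaler

    rng = np.random.default_rng(0)
    Z = StandardScaler().fit_transform(rng.normal(size=(100, 6)))

    pca = PCA(n_components=2).fit(Z)          # keep 2 components
    loadings = pca.components_.T * np.sqrt(pca.explained_variance_)  # 6 x 2
    communalities = (loadings ** 2).sum(axis=1)   # equals 1 only if all components are kept
    print(communalities.round(2))
    print(pca.explained_variance_ratio_.round(2))  # variance explained per component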
Common Factor model
o This looks a bit different. We see that the communalities initially do not have a value of 1,
and also after extraction they do not have a value of 1.
o Also in the variance explained we see that the first factor has a nicely
large proportion of explained variance, but it is not as high as in the
principal components analysis, even though we used the same data.
Extraction result: Factor matrix
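A sketch of a common factor extraction on the same kind of data, assuming the Python factor_analyzer package (principal axis factoring): the communalities now come out below 1, and the loadings attribute is the factor matrix referred to above:

    import numpy as np
    import pandas as pd
    from factor_analyzer import FactorAnalyzer

    rng = np.random.default_rng(0)
    df = pd.DataFrame(rng.normal(size=(100, 6)),
                      columns=[f"V{i}" for i in range(1, 7)])

    fa = FactorAnalyzer(n_factors=2, method="principal", rotation=None)
    fa.fit(df)
    print(fa.get_communalities().round(2))  # < 1: only common variance is used
    print(fa.loadings_.round(2))            # the (unrotated) factor matrix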