WORKBOOK
SUMMARY
PSYCHOMETRICS 2017-18
Renée Lipka, IBP 17-18
Topic 4: Principal component analysis (PCA)
1 the goal of PCA
− Main aim of PCA: data reduction (summarize total amount of information contained in
the data)
− E.g. getting sub-groups of questionnaire items that are relatively strongly correlated
with each other
> subgroups of observed variables share the same underlying dimension
> components (factors in FA) = hypothetical constructs
− Often used in scale construction because item subgroups can then be combined into
scales which can be used as variables in other techniques
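The idea of item subgroups can be illustrated with a small simulation. This is only a sketch on made-up data: the six items, the two underlying dimensions (dim_a, dim_b) and the noise level are all assumptions chosen for illustration, not values from the course.

```python
import numpy as np

rng = np.random.default_rng(42)
n = 1000  # hypothetical number of respondents

# Two independent underlying dimensions (made-up data for illustration).
dim_a = rng.normal(size=(n, 1))
dim_b = rng.normal(size=(n, 1))

# Items 1-3 reflect dimension A, items 4-6 reflect dimension B, plus noise.
items = np.hstack([
    dim_a + rng.normal(scale=0.7, size=(n, 3)),
    dim_b + rng.normal(scale=0.7, size=(n, 3)),
])

R = np.corrcoef(items, rowvar=False)  # 6 x 6 item correlation matrix

within_a = R[0, 1]  # correlation of two items from the same subgroup
between = R[0, 3]   # correlation of two items from different subgroups

# Combine each item subgroup into a scale score (simple sum per respondent).
scale_a = items[:, :3].sum(axis=1)
scale_b = items[:, 3:].sum(axis=1)
```

Items within a subgroup should correlate clearly with each other, while items from different subgroups should correlate close to zero. The summed scores scale_a and scale_b could then serve as variables in other techniques.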
2 what PCA does
− Used for data reduction in multivariate situations (where many variables are varying
simultaneously)
− Technique for converting a large number of observed variables (p) into a small
number of components (k)
− Aim is to find and name components that describe the structure of relationships as
accurately & concisely as possible
− Is a purely mathematical technique in which a set of correlated variables is represented
by a set of uncorrelated components
− Can be described algebraically and geometrically
> Geometric: two correlated variables (X1 & X2) can be represented by 2
uncorrelated components (F1 & F2); the axes can be rotated so that F1 & F2 form the axes
o Ellipse = positive correlation
o Longest axis (F1) = most important direction because the largest
variance (dispersion) between points lies along it
o Next longest axis (F2) = next most important direction that is
completely independent/uncorrelated with first because it is orthogonal
to it
o Data points = respondents, can be interpreted in terms of original
scores on correlated variables (X1 & X2) or as scores on uncorrelated
components (F1 & F2)
> Algebraic: by convention, observed variables are drawn as squares and
non-observed (component) variables as circles
o Components are supposed to represent the most important variance in the 5
observed variables
o components (F) are computed as linear combinations (weighted sums)
of the observed variables (X)
o a_{ij} = component loading, the weight that relates variable X_i to component
F_j (i always refers to the observed variable & j refers to the component
involved)
o Thus: F_1 = a_{11} X_1 + a_{21} X_2 + a_{31} X_3 + a_{41} X_4 + a_{51} X_5
Thus: F_2 = a_{12} X_1 + a_{22} X_2 + a_{32} X_3 + a_{42} X_4 + a_{52} X_5
o the weights for F1 are chosen so that F1 is maximally correlated with
all X-variables taken together (it captures as much of their variance as possible)
o the weights for F2 are then chosen so that F2 is also maximally correlated with
all X-variables, but completely uncorrelated (orthogonal)
with F1 -> the information summarized in F2 is thus completely new &
different from F1
o The components (F) can now themselves be treated as variables: every
person has a component score that is computed from their scores on the
observed variables
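The algebraic description above can be sketched in a few lines of NumPy. This is a minimal illustration on made-up data, not the procedure used in the course: the sample size, the data-generating model and all variable names are assumptions. It uses the eigenvectors of the correlation matrix as the weights; statistical packages often report loadings rescaled by the square root of the eigenvalue, so the numbers may differ from software output by that scaling.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data: 300 respondents, 5 observed variables X1..X5 that
# share one common source of variance plus noise (made up for illustration).
n, p = 300, 5
common = rng.normal(size=(n, 1))
X = common + rng.normal(scale=0.8, size=(n, p))

# Standardize so that the analysis works on the correlation matrix.
Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
R = np.corrcoef(Z, rowvar=False)

# Eigenvectors of R give the weights, eigenvalues the component variances.
eigvals, eigvecs = np.linalg.eigh(R)
order = np.argsort(eigvals)[::-1]       # sort from largest to smallest
eigvals, A = eigvals[order], eigvecs[:, order]

# A[i, j] is the weight of variable X(i+1) on component F(j+1), so each
# component score is a weighted sum of the (standardized) variables.
F = Z @ A
F1, F2 = F[:, 0], F[:, 1]

r_F1_F2 = np.corrcoef(F1, F2)[0, 1]     # should be (numerically) zero
var_F = F.var(axis=0, ddof=1)           # variance of each component
```

Every respondent gets one score per component (one row of F). F1 has the largest variance, F2 the next largest, and the two are uncorrelated.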