Notes

Lecture Notes Topic 1: PCA, CATPCA and CFA, AMDA Fall


Course
Applied Multivariate Data Analysis (6465G06FY)

University
Universiteit Leiden (UL)

Lecture notes of Topic 1: PCA, CATPCA and CFA. These notes combine the lecture notes of two study years, so they are very comprehensive!


Uploaded on 11 October 2022
Number of pages: 25
Written in: 2022/2023
Type: Notes
Professor(s): Ralph Rippe
Contains: Topic 1, lectures 1 to 3

statistiek
amda
statistics
pca
biplots
leiden university
leiden
rippe
psychology
research master
cfa
r
child and education studies

Programme
Research Master Psychology


Uploaded by: Charlottevtn


Lecture notes topic 1 AMDA fall 2022 – PCA, CATPCA and CFA

Lecture 1 – Principal Component Analysis
Several distinctions in statistics:
- Descriptive vs Testing
- Exploration vs Confirmation
- Dependence vs Interdependence

With PCA, we are in the exploratory, interdependence situation.

PCA is widely applied:
- If you have 4 single variables that each measure SES in some way (e.g. income and educational
level), can't you just use one (weighted) average? → But how would you weight them? Is educational
level more important than income, for example? And if so, how much more important? That
is something PCA can tell us. PCA gives you weights (c weights / component loadings) for the
'ingredients', i.e. the variables. Why is this useful? → Here we summarize 4 variables into one
component, and this component becomes one new variable, which you can in turn use in, for example,
a regression analysis. So you are combining PCA with regression here. This is
called dimension reduction: going from 4 dimensions to 1.
- How many subconcepts of intelligence can we distinguish? → Here the question runs the other way
around: among my 24 items, are there subdomains?
- What chemicals possess similar properties under heat / pressure / …?
- Quantify ethnic spread in (sub)populations
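To make the SES example concrete, here is a minimal sketch in Python, assuming hypothetical scores of 6 people on 4 SES indicators (the data and variable names are invented for illustration). It standardizes the variables, finds the first principal component of their correlation matrix by power iteration (one simple way to get the leading eigenvector; SPSS and R use their own routines), and uses the component weights to turn 4 variables into 1 SES score.

```python
import math

# Hypothetical scores of 6 people on 4 SES indicators (invented for illustration).
income = [20, 35, 50, 65, 80, 95]       # e.g. yearly income in thousands
education = [10, 12, 14, 16, 16, 18]    # e.g. years of education
prestige = [30, 40, 45, 60, 70, 75]     # e.g. occupational prestige score
neighbourhood = [3, 4, 4, 6, 7, 8]      # e.g. neighbourhood rating
variables = [income, education, prestige, neighbourhood]


def standardize(column):
    """Convert a list of numbers to z-scores (population standard deviation)."""
    n = len(column)
    mean = sum(column) / n
    sd = math.sqrt(sum((x - mean) ** 2 for x in column) / n)
    return [(x - mean) / sd for x in column]


def correlation_matrix(z_columns):
    """Pearson correlations between columns that are already z-scores."""
    n = len(z_columns[0])
    return [[sum(a * b for a, b in zip(zi, zj)) / n for zj in z_columns]
            for zi in z_columns]


def first_component(corr, iterations=500):
    """Leading eigenvector (unit length) and eigenvalue via power iteration."""
    p = len(corr)
    v = [1.0] * p
    for _ in range(iterations):
        w = [sum(corr[i][j] * v[j] for j in range(p)) for i in range(p)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    eigenvalue = sum(v[i] * corr[i][j] * v[j] for i in range(p) for j in range(p))
    return v, eigenvalue


z = [standardize(col) for col in variables]
weights, eigenvalue = first_component(correlation_matrix(z))
# Loadings = correlations between each variable and the component.
loadings = [w * math.sqrt(eigenvalue) for w in weights]
# Dimension reduction 4 -> 1: one weighted sum of z-scores per person.
ses_scores = [sum(w * zj[i] for w, zj in zip(weights, z)) for i in range(len(income))]

print("loadings:", [round(c, 2) for c in loadings])
print("SES scores:", [round(s, 2) for s in ses_scores])
```

The `ses_scores` list is the single new variable that could then be entered as one predictor in a regression. For a single component the `loadings` are proportional to the `weights`, so both show which ingredient counts most.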

PCA in chemistry → which chemicals behave in a similar way, for example?
The picture on the right is called a biplot. It is the output of a component
analysis. Which variables are correlated? Correlated variables lie close together
and can form one component. The biplot is a visual representation of the dimension
reduction → from the original variables to 2 dimensions.

PCA for ethnicity → based on genetics, can we find subgroups of individuals with
similar genetic make-up? You have 10,000 or more genetic variables. If you measure
them all, you need to summarize them: going down from a large number of genetic
variables to something that you can use in a regression model.

The picture on the right: people who are close to each other in the plot are more similar in genetic
make-up. You can see that people from the same area are also more closely related
genetically. So the components make geographic sense: people from the
north of the Netherlands are genetically different from people from the
south of the Netherlands. This is a very practical example, although computing
it takes a lot of time. But it shows what you can do with PCA.

A dataset example (1): Dimension reduction
- Suppose we have a large collection (say 100) of variables (dimensions)
- Of which several (say 25) measure the same concept
- Then working with all 100 is too much (why?), but how do you summarize them? Or:
- How do you reduce the dimensionality of the items? → here you are not determining the number
of components; you just want to summarize the variables in 1 score. PCA then tells you
how important the different variables are.
- This is exploration!

A dataset example (2): Scale construction
- You have an idea of how to design a questionnaire for some concept
- You design questions that address several subconcepts
- But are the supposed items actually adequate in their subdomains?
- Which items form sub-scales of your instrument,
- And how reliable are these sub-scales?

E.g. Intelligence:
- 1 concept? (General Intelligence)
- 2 concepts? (Verbal and Performance?)
- 3 concepts? (Verbal, Performance, Freedom from distractibility)
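One common way to approach the question "1, 2 or 3 concepts?" is to look at the eigenvalues of the correlation matrix. The sketch below is only an illustration under stated assumptions: it uses a hypothetical correlation matrix for 3 verbal and 3 performance items, computes its eigenvalues with the classical Jacobi rotation method, and counts the eigenvalues above 1. That cut-off (the Kaiser criterion) is one widely used rule of thumb, not something these notes prescribe.

```python
import math


def jacobi_eigenvalues(matrix, tol=1e-20, max_sweeps=100):
    """Eigenvalues of a symmetric matrix via cyclic Jacobi rotations, largest first."""
    a = [row[:] for row in matrix]
    n = len(a)
    for _ in range(max_sweeps):
        off = sum(a[i][j] ** 2 for i in range(n) for j in range(n) if i != j)
        if off < tol:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if abs(a[p][q]) < 1e-15:
                    continue
                # Rotation angle chosen so that a[p][q] becomes zero.
                theta = (a[q][q] - a[p][p]) / (2 * a[p][q])
                t = math.copysign(1.0, theta) / (abs(theta) + math.sqrt(theta * theta + 1))
                c = 1 / math.sqrt(t * t + 1)
                s = t * c
                for k in range(n):  # rotate columns p and q
                    akp, akq = a[k][p], a[k][q]
                    a[k][p], a[k][q] = c * akp - s * akq, s * akp + c * akq
                for k in range(n):  # rotate rows p and q
                    apk, aqk = a[p][k], a[q][k]
                    a[p][k], a[q][k] = c * apk - s * aqk, s * apk + c * aqk
    return sorted((a[i][i] for i in range(n)), reverse=True)


def two_block_correlation(k, r_within, r_between):
    """Hypothetical 2k x 2k correlation matrix with two blocks of k items."""
    n = 2 * k
    return [[1.0 if i == j else (r_within if i // k == j // k else r_between)
             for j in range(n)] for i in range(n)]


# e.g. 3 verbal items and 3 performance items (invented correlations)
R = two_block_correlation(3, r_within=0.6, r_between=0.2)
eigenvalues = jacobi_eigenvalues(R)
n_components = sum(1 for ev in eigenvalues if ev > 1)  # Kaiser rule of thumb
print([round(ev, 3) for ev in eigenvalues], n_components)
```

With these made-up correlations two eigenvalues exceed 1, which would point to 2 concepts (Verbal and Performance) rather than 1 or 3.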

Extra note!: It is principal component analysis, not principal components analysis! → Because there is
only 1 principal component (but there can be more components). We use PCA when we have no idea about
what is going on in our data: there is no significance testing (no p-values), no confidence intervals,
etc. We only explore, and use PCA for visualization. PCA can give us simple but informative plots of
the data!

Components versus Factors
Two main approaches:
Principal Component Analysis →
- We use this technique to get an idea about any underlying structure → no testing!
- No pre-assumed structure
- Exploratory method
- Visualization

Factor Analysis →
- To confirm or reject a suspected factor structure → testing!
- Structure derived from previous research or theory
- Confirmatory method → hypothesis testing

Similarities:
- Both methods deal with groups of variables.

Factor analysis: you assume knowledge of which items belong to which scales, based on previous research
or on theory. With factor analysis, you specify a precise model for the relationships between items and
scales. Based on fit: do results on new data match those from other work? Is the model true for your
(current) sample? This is not performed in SPSS, but specialized programs for Structural Equation
Modelling (SEM) exist (EQS, LISREL and others, such as lavaan in R). You go from theory/model to data
here. So factor analysis also deals with groups of variables, but in this case there is an initial idea
of what these groups of variables might be: a predefined structure. We will talk about this next week.
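To illustrate what "a precise model" and "fit" mean, the sketch below assumes a standardized one-factor model for 3 hypothetical items. Under that model the implied correlation between two items is the product of their loadings, so theory-based loadings predict the whole correlation matrix; the root mean square of the off-diagonal residuals then shows how closely a (hypothetical) observed matrix matches. This only conveys the idea: SEM software such as lavaan estimates the loadings from the data and reports formal fit statistics rather than this hand-rolled check.

```python
import math


def implied_correlations(loadings):
    """Model-implied correlations of a standardized one-factor model: r_ij = l_i * l_j."""
    p = len(loadings)
    return [[1.0 if i == j else loadings[i] * loadings[j] for j in range(p)]
            for i in range(p)]


def rmsr(observed, implied):
    """Root mean square of the off-diagonal residuals (observed - implied)."""
    p = len(observed)
    residuals = [observed[i][j] - implied[i][j]
                 for i in range(p) for j in range(i + 1, p)]
    return math.sqrt(sum(r * r for r in residuals) / len(residuals))


# Hypothesized (theory-based) loadings of 3 items on one factor.
theory_loadings = [0.8, 0.7, 0.6]
implied = implied_correlations(theory_loadings)

# Hypothetical observed correlation matrix from a new sample.
observed = [[1.00, 0.55, 0.50],
            [0.55, 1.00, 0.40],
            [0.50, 0.40, 1.00]]

print(round(rmsr(observed, implied), 3))
```

A small residual value means the new data roughly reproduce what the theory predicts; a large one would suggest the predefined structure does not hold in this sample.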

PCA is the other way around: it is theory generating, and you evaluate relationships in the data
(Pearson, 1901), but without an a priori structure! We explore the data for a structure of principal
components (PCA) → e.g. with 20 variables, we have a 20 x 20 correlation matrix. The technique
searches for a structure of components: it finds groups of variables that correlate highly with
each other and weakly with the other variables. External (theoretical) knowledge is used afterwards
for interpretation. Analyses are performed in SPSS (Factor) or R. So you go from data to model/theory!
For example: we have a set of variables, but this set has no distinction in terms of a predictor vs
outcome set. We are interested in how these variables inter-relate: correlations.
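The search for groups of highly correlated variables can be sketched as follows, assuming a hypothetical 4 x 4 correlation matrix (two verbal and two performance items; the numbers are invented). The first two components are extracted by power iteration, with deflation to remove component 1 before looking for component 2. This is a simple textbook approach, not the algorithm SPSS or R actually uses.

```python
import math


def power_iteration(matrix, start, iterations=1000):
    """Leading eigenvector (unit length) and eigenvalue of a symmetric matrix."""
    p = len(matrix)
    v = start[:]
    for _ in range(iterations):
        w = [sum(matrix[i][j] * v[j] for j in range(p)) for i in range(p)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    eigenvalue = sum(v[i] * matrix[i][j] * v[j] for i in range(p) for j in range(p))
    return v, eigenvalue


def pca_two_components(corr):
    """Loadings of each variable on the first two principal components."""
    p = len(corr)
    v1, lam1 = power_iteration(corr, [1.0] * p)
    # Deflation: subtract component 1 so the next power iteration finds component 2.
    deflated = [[corr[i][j] - lam1 * v1[i] * v1[j] for j in range(p)] for i in range(p)]
    # A non-constant start vector avoids starting orthogonal to component 2.
    v2, lam2 = power_iteration(deflated, [float(i + 1) for i in range(p)])
    loadings = [(v1[i] * math.sqrt(lam1), v2[i] * math.sqrt(lam2)) for i in range(p)]
    return loadings, (lam1, lam2)


names = ["verbal1", "verbal2", "perf1", "perf2"]
R = [[1.0, 0.7, 0.2, 0.2],
     [0.7, 1.0, 0.2, 0.2],
     [0.2, 0.2, 1.0, 0.7],
     [0.2, 0.2, 0.7, 1.0]]

loadings, eigenvalues = pca_two_components(R)
for name, (c1, c2) in zip(names, loadings):
    print(f"{name:8s} {c1:6.2f} {c2:6.2f}")
```

All four items load equally on component 1, while component 2 gives the two verbal items the opposite sign of the two performance items: that sign split is where the two groups show up (the overall sign of a component is arbitrary). In one common biplot scaling, each variable's arrow ends at its pair of loadings (c1, c2), so items from the same group point in the same direction.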

You could first do PCA and play around, and next, based on the outcomes of your PCA, do a factor
analysis. In this case you are not exploring but testing your first results! So you can use the
output of PCA as input for factor analysis.

The principle of principal components
Basic scale construction → You want to create a sum score. However, you cannot simply
add up all the scores; you need weights for them, because this set of 8 items may not
be one homogeneous set.

Scale construction using crisp weights, in an example with 8 items (4 items
called A, and 4 items called B):
$C_A = c_1A_1 + c_2A_2 + c_3A_3 + c_4A_4 + c_5B_1 + c_6B_2 + c_7B_3 + c_8B_4$

- Variable either in or out of a scale
- Weights c either 1 (in) or 0 (out)
- Variables determine scale interpretation
- Equal or no contribution to construct
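The crisp 0/1 weighting can be written as a tiny Python function. The item order and the respondent's answers below are hypothetical; the weights follow the scale-A example, with 1 for items in the scale and 0 for items out.

```python
# Item order used here: A1..A4, then B1..B4.
ITEMS = ["A1", "A2", "A3", "A4", "B1", "B2", "B3", "B4"]

# Crisp weights for scale A: 1 = item in the scale, 0 = item out.
CRISP_WEIGHTS_A = [1, 1, 1, 1, 0, 0, 0, 0]


def scale_score(responses, weights):
    """Weighted sum c1*x1 + ... + c8*x8 for one respondent."""
    if len(responses) != len(weights):
        raise ValueError("need exactly one weight per item")
    return sum(c * x for c, x in zip(weights, responses))


respondent = [4, 5, 3, 4, 2, 1, 2, 3]  # hypothetical answers on a 1-5 scale
print(scale_score(respondent, CRISP_WEIGHTS_A))  # 4 + 5 + 3 + 4 = 16
```

Every A item counts equally and every B item counts for nothing, which is exactly the "in or out" limitation listed above.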

What PCA does: it looks at the data structure, decomposes it, and suggests values for the c's
(component loadings), depending on how much each item contributes. PCA finds out how many
subscales there might be, and which items are best included in each subscale. So you can add up
all the items to form one subscale, but you have to take into account how much each item
contributes to the total.

In this example, the 8 items (A1 to B4) are already split into two groups: scale A and scale B. You could
do it as follows: $C_A = 1A_1 + 1A_2 + 1A_3 + 1A_4 + 0B_1 + 0B_2 + 0B_3 + 0B_4$. This would be the 'easy,
straightforward' version. But we want to move away from 'in' or 'out' (0 and 1); we want to go here:

Advanced scale construction → Scale construction using actual weights:
$C_A = c_1A_1 + c_2A_2 + c_3A_3 + c_4A_4 + c_5B_1 + c_6B_2 + c_7B_3 + c_8B_4$
- More subtle inclusion (or exclusion)
- Weighted contribution to the component
- c can be anything between 0 and 1 (in absolute value)
- Some variables are more important, others are less important

Values c < 0.30: exclusion
Values c > 0.30: inclusion?
Values c > 0.50: inclusion (if it is the item's highest loading)
Values c > 0.80: for clinical instruments
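The rules of thumb above can be expressed as a small classification function. This is a sketch under assumptions the notes leave open: it looks at the absolute size of c, treats exactly 0.30 as the inclusion side, and reads "if highest" as "this is the item's largest loading across components" (the hypothetical `is_highest` flag).

```python
def classify_loading(c, is_highest=True):
    """Apply the lecture's rules of thumb to one component loading c.

    Assumptions (not fixed by the notes): only the size |c| matters,
    exactly 0.30 counts as the inclusion side, and is_highest means
    this is the item's largest loading across all components.
    """
    size = abs(c)
    if size < 0.30:
        return "exclude"
    if size > 0.80 and is_highest:
        return "include (clinical standard)"
    if size > 0.50 and is_highest:
        return "include"
    return "possible inclusion"


# Loadings taken from the C_A example in these notes, for items A1..A4, B1..B4.
items = ["A1", "A2", "A3", "A4", "B1", "B2", "B3", "B4"]
loadings = [0.8, 0.7, 0.9, 0.9, 0.0, 0.2, 0.3, 0.1]
for item, c in zip(items, loadings):
    print(item, c, classify_loading(c))
```

Note that the cut-offs are strict inequalities as written in the notes, so a loading of exactly 0.80 falls in the "include" band, not the clinical one.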

$C_A = 0.8A_1 + 0.7A_2 + 0.9A_3 + 0.9A_4 + 0.0B_1 + 0.2B_2 + 0.3B_3 + 0.1B_4$ → what this means is that items
B1 to B4 belong a little bit to the first group, but not much. So they are probably only weakly correlated
with the A items. If the weights of B1 to B4 were all exactly 0, this would mean that they are completely
uncorrelated with component A (the B items would then have nothing to do with the A items). But this is
hardly ever true! Again: the weights are suggested by PCA and are mathematically based. In this example
you can see that items A3 and A4 (c = 0.9) are more important to the subscale than item A2 (c = 0.7).
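Using the example weights above, the weighted scale score for one hypothetical respondent can be computed as below. For simplicity the weights are applied to raw item answers, as in the formula in the notes; in practice PCA is usually run on standardized items, so treat this as an illustration of the weighting idea only.

```python
ITEMS = ["A1", "A2", "A3", "A4", "B1", "B2", "B3", "B4"]
# Example (non-crisp) weights for component A, taken from the notes.
WEIGHTS_A = [0.8, 0.7, 0.9, 0.9, 0.0, 0.2, 0.3, 0.1]


def weighted_score(responses, weights):
    """C_A = c1*A1 + ... + c8*B4 for one respondent."""
    if len(responses) != len(weights):
        raise ValueError("need exactly one weight per item")
    return sum(c * x for c, x in zip(weights, responses))


respondent = [4, 5, 3, 4, 2, 1, 2, 3]  # hypothetical answers on a 1-5 scale
score = weighted_score(respondent, WEIGHTS_A)
# Contribution of each item to this respondent's score.
contributions = {item: c * x for item, c, x in zip(ITEMS, WEIGHTS_A, respondent)}
print(round(score, 2))  # 14.1
print(contributions)
```

Compared with crisp 0/1 weights, the items with the largest c now count most, and B2 to B4 still contribute a little instead of nothing.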

So again: PCA tells you how many subgroups there might be, which items fit best into which group,
and what contribution coefficient each item has. This is all exploration.

Some details on PCA and components:
- A component is a weighted linear combination of items: $A = c_1A_1 + c_2A_2 + c_3A_3 + c_4A_4 + \dots$
