Summary lectures Applied Multivariate Data Analysis (Fall) (6462PS009Y)

Summary of all the lectures regarding the four topics for the AMDA fall exam.

Exam AMDA

Topic 1 - PCA & CFA

Learning Goals

1) Understand how PCA and CFA are (un)related in terms of both solution structure and
model specification.

PCA is widely applied —> working with many variables that measure the same concept.
- You have an idea of how to design a questionnaire for some “concept” —> you design questions that address several subconcepts —> are the items actually adequate measures of their subdomain?

—> important for dimensionality reduction and scale construction; for example, when measuring intelligence: is it 1 concept, 2 concepts, or 3 concepts?

Two main approaches
1) Principal component analysis (the components are ordered, the first “principal” component explaining the most variance).
- Use this technique to get an idea of any underlying structure —> a set of variables on which you can compute a correlation matrix. This set has no distinction in terms of predictors and outcomes; we are interested in how these variables interrelate/correlate.
- Exploring (not testing!) (no significance test)
- No pre-assumed structure; all 20 variables could be either predictors or outcomes.

2) Factor analysis
- Also deals with groups of variables, but here you already have an initial idea of what these groups of variables might be.
- Predefined structures, which are tested/evaluated in the factor analysis. The predefinition might come from previous research or from your own theory.
- Confirmatory method.
- Specifies a precise model for the relationship between items and scales —> is this model true for your current sample?
So, in short: both methods deal with groups of variables. PCA tries to identify potential groups of variables based on the correlation structure. In CFA, we start from an idea of the variable grouping and test whether this grouping structure works for our data.

Confirmatory Factor Analysis
- Tests a specific factor structure —> a predefined structure grouping the items/variables. No particular predictor or outcome in this analysis.
- We compute fit measures that tell us how well our predefined structure works for the data that we have.

- How do we test this?

In linear regression, fit is checked via the explained variance and the residuals between predicted and observed scores. The closer the predicted scores are to our outcomes, the better the model works (significant regression coefficients).

In CFA we are in a matrix situation: from a set of variables we construct a covariance matrix (a correlation matrix is the standardized version). This is the observed matrix, because we can compute the covariances between the things that we have observed. We then define a factor model and use it to predict a covariance matrix.
So we compare the observed covariance matrix with the predicted covariance matrix —> residuals in a matrix shape.
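A minimal numpy sketch of this comparison, with made-up loadings and simulated data (all numbers are illustrative, not from the course):

import numpy as np

# Hypothetical two-factor model for six standardized items:
# loadings (Lambda), factor correlations (Phi), unique variances (Theta).
Lambda = np.array([[0.8, 0.0],
                   [0.7, 0.0],
                   [0.6, 0.0],
                   [0.0, 0.8],
                   [0.0, 0.7],
                   [0.0, 0.6]])
Phi = np.array([[1.0, 0.3],
                [0.3, 1.0]])
Theta = np.diag([0.36, 0.51, 0.64, 0.36, 0.51, 0.64])

# Predicted (model-implied) covariance matrix, standard CFA algebra:
Sigma_model = Lambda @ Phi @ Lambda.T + Theta

# "Observed" covariance matrix from simulated data (n = 200 respondents)
rng = np.random.default_rng(0)
X = rng.multivariate_normal(np.zeros(6), Sigma_model, size=200)
S_observed = np.cov(X, rowvar=False)

# Residuals in a matrix shape: observed minus predicted
print(np.round(S_observed - Sigma_model, 2))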

> Interdependence technique —> no predictor vs. outcome distinction (CFA), although internally there is some form of predictor structure going on (factors predicting items).
> Dependence technique —> a set of predictors, assuming that the score on y depends on the predictor variables (linear regression).

Conceptual aim: confirm our theoretical construct division.

Technical aim:
—> reproduce the correlations / predict the covariance matrix.
—> Error —> misfit between the observed and predicted matrix. Errors are assumed to be uncorrelated.
—> Correlations based on the observed numbers should be explained by common factors (sounds like regression).
—> Regression equations with manifest response variables and two latent predictors. Assume that there is some underlying but not directly observable process going on (F1 and F2) —> but it does lead to differences in the scores of the variables that we do observe (variables X1-X6).
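In standard CFA notation (the lecture describes this verbally; the symbols below are not from the slides), each observed item is a regression on the latent factors, and the model implies a covariance matrix:

X_i = \lambda_{i1} F_1 + \lambda_{i2} F_2 + \varepsilon_i
\Sigma = \Lambda \Phi \Lambda^{\top} + \Theta

where \Lambda collects the loadings, \Phi the factor (co)variances, and \Theta the (diagonal) unique variances.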

So each item is a simple linear regression: F1 —> X1, F1 —> X2, etc.
We can predict scores for the items and from those predict a covariance matrix. We assume that something is going on that we cannot see, and that this something leads to the scores that we observe in the variables themselves.
Factors can be correlated or not. Items can also be explained by more than one factor (cross-loadings) —> if there are many of these, it could be that you are ignoring the correlations between the factors.

Compared to component analysis: for each component, there is an arrow to all of the items. Some items will have loadings close to zero and some will be very high.

CFA is stricter than PCA, because items not assigned to a factor must have a loading of exactly zero.

Factors: theoretical constructs that we examine —> our CFA may even be derived from an earlier PCA.
Components: empirically suggested combinations of variables —> may or may not have meaning. In CFA you already assume that the structure has meaning.

PCA —> you have a set of variables and no idea what will happen in terms of structure.
EFA —> for instruments that have never been tested before. There are some ideas of how items may be correlated, so there is a theory, but you are not testing this theory.
CFA —> one single, strong conceptual idea of the factor structure.

Example:

4 correlated factors and 15 items. Assume that each item corresponds to one factor only. —> Number of (co)variances: 0.5 x number of items x (number of items + 1) = 0.5 x 15 x 16 = 120 covariances in this case. —> These are the units of observation for the fit evaluation.




What elements are estimated in the model?
15 unique variances.
4 factor variances.
Correlations between the factors —> 6 covariances between the factors; formula: 0.5 x number of factors x (number of factors - 1) = 0.5 x 4 x 3.
11 free factor loadings —> the difference between the 15 items and the 4 factors. For each factor, one of the arrows needs a fixed factor loading, and all other loadings of that factor are estimated relative to that number. Thus 4 loadings are fixed and 11 values remain to be estimated.

Counting everything together —> 11 + 15 + 4 + 6 = 36 model parts that are going to be estimated (the number of parameters estimated in the model). We have 120 covariances, so 120 - 36 = 84 degrees of freedom.
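A small Python sketch of this bookkeeping (a hypothetical helper, just to make the arithmetic explicit):

def cfa_degrees_of_freedom(n_items: int, n_factors: int) -> int:
    # Observed pieces of information: item variances + covariances
    n_moments = n_items * (n_items + 1) // 2               # 15 items -> 120
    # Estimated parameters, assuming simple structure and one fixed
    # (marker) loading per factor:
    unique_variances = n_items                             # 15
    factor_variances = n_factors                           # 4
    factor_covariances = n_factors * (n_factors - 1) // 2  # 6
    free_loadings = n_items - n_factors                    # 11
    n_params = (unique_variances + factor_variances
                + factor_covariances + free_loadings)      # 36
    return n_moments - n_params

print(cfa_degrees_of_freedom(15, 4))  # 84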

Check in the output:
- Warnings and errors
- Standardized residuals
- Residual distribution
- Model fit statistics
- Estimated parameters
- Suggestions for improvement

Assumptions
- CFA is performed on metric/numerical variables (scale/interval variables).
- The sample needs to have more observations than variables.
- For stable covariances we need 100 observations —> but CFA wants more: at least 200.
- Minimum 5, but preferably 10, observations per variable.
- A strong conceptual idea of what you are going to test —> a hypothesized model!

Rules of thumb

Look at the X2 statistics.
CFI —> comparative fit index > .95 (how well does your model fit?).
sRMR —> < .08
RMSEA —> < .06 (with a 90% confidence interval)
—> Also apply equivalence tests.
No need for rotation: we don't need to identify the best possible view of subgroups, because there is one specific grouping defined by ourselves.
You can have variables that have a high coefficient on one factor only. Not per se a problem, as long as you can defend it.
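These cutoffs can be collected in a tiny helper (a hypothetical convenience function, not part of any package):

def fit_rules_of_thumb(cfi: float, srmr: float, rmsea: float) -> dict:
    # Returns True for each rule of thumb that the model satisfies.
    return {
        "CFI > .95": cfi > 0.95,
        "sRMR < .08": srmr < 0.08,
        "RMSEA < .06": rmsea < 0.06,
    }

print(fit_rules_of_thumb(cfi=0.97, srmr=0.05, rmsea=0.04))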

Model specification:
- 4 factors = 4 latent variables
—> with 13 variables —> 0.5 x 13 x 14 = 91 covariances, so 91 numbers in this dataset.
Residual distribution
—> should be symmetric, on average 0, with equal amounts of over- and underestimation.

Interpreting model fit statistics.

Chisq + df + p-value = fit statistics.
Baseline chisq —> the difference between the model that you have currently estimated and a model without any factor structure. The larger the X2, the larger the difference —> the difference between the covariance matrix based on your model and one based on a model without any factor structure. You want your specified model to be better than no model specification —> this test should be significant!

The other chi-square —> the difference between the observed and predicted covariances, using your data and your model —> this one you do not want to be significant! They should be alike. A large difference means that your prediction does not resemble the observed matrix —> it deviates significantly from our observations.

- CFI should be > .95

We want small standardized root mean square residuals —> sRMR < .08.
RMSEA —> < .06; if 0 —> a perfect match between prediction and observation.

You can have a very reasonable model that is nevertheless not fitting strongly yet, based on the fit statistics and X2.

Suggestions for improvement

Maybe we are being too strict by not allowing some of the factors to predict two of these items; for example, should factor 2 also be allowed to load on items from factor 1? —> this may add explained variance, leading to a sufficient fit.

Request modification indices, which suggest where you might want to add things to your model to improve its fit. For example: factor 3 should also load on vocabulary, leading to a 2.494 decrease in misfit.

2) Understand how PCA goes from data to component structure.

PCA relies on the interrelationships between variables. The technique searches for a structure of components by finding groups of variables that show high correlations within the group, but lower correlations between the groups.

As an exploratory technique, interpretation comes afterward. It is possible that the structures do not make sense —> probably due to weak correlations.

So —> going from data (a correlation matrix) to potentially suggested models that support a theory.