Summary Applied Multivariate Data Analysis EUR LECTURES + LITERATURE: all I studied to get 8.3/10 on exam

Institution
Erasmus Universiteit Rotterdam (EUR)

These are all of my notes from the book and the lectures for the 4.4C exam. I got an 8.3 without taking in any other information. There is some repetition in this summary, because a lot of the stuff that's in the lectures is of course the same as what's in the book. If we covered a topic twice, the...

Whole book summarized? No
Which parts of the book are summarized? Relevant chapters for course
Uploaded on 21 February 2021
Number of pages 41
Written in 2020/2021
Type Summary

Seller: caitirotterdam (member for 4 years, 101 documents sold)

Applied Multivariate Data Analysis Lectures and Readings
Week One

Lecture One
Statistical Models
We fit models to our data, and use models to represent what happens in the real world.
Models have parameters, which are estimated from the data and represent relations
between the variables in the model, and variables, which are measured constructs - e.g.
anxiety - and vary across individuals in the sample. We compute model parameters in the
sample to estimate the true value in the population.
The normal distribution has 2 parameters: the mean and the sd.
A line also has 2 parameters: slope and intercept.
The mean in our sample is a "model" for the true effect of, for example, CBT (variable 1) on
anxiety (variable 2). We could write this as:

$\text{Anxiety improvement}_i = b + \text{error}_i$

b would be:

$b = \bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$

If we put this into a formula/model, we would just say the effect of the therapy is the mean
plus an error term. We assume that the manipulation has an equal effect across people, plus some
random effect that causes differences between them, e.g. distraction, differences in history etc.
Model Fit

So the mean is the model of the typical score in the real world. To assess how well the mean
represents the data, we test the model "fit". A perfect fit would be every individual scoring exactly the
mean of the group. Scores scattered randomly around the mean mean a non-perfect fit. How do we
quantify the degree of fit? We calculate the average error. We can use the squared errors, the
mean squared error, or the sd to do this.

$\bar{x}$ = sample mean = the value from which the (squared) scores deviate least (least error)
SS = sum of squared errors: if we sum all the squared deviations from the mean we get
$SS = \sum_{i=1}^{n} (x_i - \bar{x})^2$. This depends on n. The more observations you have, the higher the SS will
be, so to make it fair we use the Mean Squared Error (MSE, the average squared error). The
larger this value is, the less accurately the model represents your data.
We get the MSE by dividing SS by the df: $MSE = SS / (n - 1)$. We divide by n - 1 rather than n because we want
to estimate the fit in the population, not just the sample, and we lose 1 df because we're using the
sample mean to estimate the population mean.

The larger the MSE, the worse the fit. When the model we are looking at is the mean, the MSE is
called the variance. If we take the square root of the mean squared error we get the sd, which tells us
on the same scale as the mean what the average error is.
Variance = $s^2$, sd = $s$
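
To make the chain from errors to SS to variance to sd concrete, here is a minimal sketch. The five scores are made-up illustrative data, not taken from the lectures or the book.

```python
import math

# Hypothetical anxiety-improvement scores (illustrative data only)
scores = [4, 6, 7, 8, 10]

n = len(scores)
mean = sum(scores) / n                  # the "model": one value for everyone
errors = [x - mean for x in scores]     # deviation of each score from the model
ss = sum(e ** 2 for e in errors)        # SS: sum of squared errors
variance = ss / (n - 1)                 # MSE with df = n - 1 (the variance)
sd = math.sqrt(variance)                # back on the same scale as the mean

print(mean, ss, variance, round(sd, 3))  # 7.0 20.0 5.0 2.236
```

Reading the output: the mean of 7 is off by about 2.2 points on average, expressed on the original scale of the scores.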
Central Limit Theorem
Depending on n and variability, the estimate of the population that the sample provides will
be more or less precise. If we took many samples from a population, the statistic we compute in each
would form a sampling distribution, which we use to make inferences about the population.
For the mean, this is the distribution of values we'd get if we repeated our sampling
randomly and recorded all the means: a distribution of sample means. This way, we can see if our
mean/sample was typical or atypical. These samples must all have the same n.
So we can compute the variability of sample means. The sd of this distribution is the Standard
Error (se) of the mean. The smaller the sample size and the larger the variability of the trait
in the population, the larger the standard error.

The se = the sd / the square root of n: $se = s / \sqrt{n}$
We use the se for many statistical tests. We use it, for example, to create a confidence interval.
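
The claim that the sd of the sample means equals sd / sqrt(n) can be checked with a small simulation. This is only a sketch: the population mean, population sd, sample size and number of samples are all assumed values picked for illustration.

```python
import random
import statistics

random.seed(1)  # fixed seed so the run is repeatable

POP_MEAN, POP_SD = 50, 10   # hypothetical population values
N = 25                      # every sample has the same n
N_SAMPLES = 5000            # number of repeated samples

sample_means = []
for _ in range(N_SAMPLES):
    sample = [random.gauss(POP_MEAN, POP_SD) for _ in range(N)]
    sample_means.append(statistics.mean(sample))

# sd of the simulated sampling distribution vs. the formula sd / sqrt(n)
empirical_se = statistics.stdev(sample_means)
theoretical_se = POP_SD / N ** 0.5   # 10 / 5 = 2.0

print(round(empirical_se, 2), theoretical_se)
```

The two numbers should come out close to each other, and raising N shrinks both.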
Confidence Intervals
A confidence interval gives boundaries within which we believe the population mean will lie.
95% CI: for 95% of all possible samples, the interval computed from the sample will contain the population mean.
We calculate a 95% CI by assuming that the t distribution is representative of the sampling
distribution. This looks like a normal distribution, but with fatter tails; how fat depends on the
degrees of freedom (fewer df, fatter tails).

A 95% CI corresponds to an alpha of .05. This is the most commonly used one in psychology,
but it can also be, for example, 90% with an alpha of .1 or 99% with an alpha of .01. So basically, if we
want a 95% chance of our interval "catching" the population mean, its limits have to lie about
2 standard errors (se) on either side of the sample mean.
So we need a critical value for above (upper limit) and below (lower limit) the mean. To
get this critical value, we use a table of the t distribution to find the right t value for our df,
and multiply it by the se. We use a t distribution because we estimate the population sd from a sample.
We need 2.5% in each of the tails.
So, if we computed a 95% CI in 100 different samples, about 95 of them would catch the actual population mean. A 99%
CI would have "wider arms", so we'd be more sure that the CI catches the actual population
mean.
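
As a rough sketch of the steps above (se, critical t, lower and upper limit), the example below builds a 95% CI by hand. The ten scores are made up, and the critical value 2.262 is the two-tailed t for alpha = .05 with df = 9 as found in a standard t table; it is hard-coded here as an assumption rather than computed.

```python
import math
import statistics

# Hypothetical sample of 10 scores (illustrative only)
sample = [6, 9, 7, 8, 10, 5, 9, 8, 7, 11]

# Two-tailed critical t for alpha = .05 and df = n - 1 = 9, from a t table
T_CRIT_DF9 = 2.262

n = len(sample)
mean = statistics.mean(sample)
se = statistics.stdev(sample) / math.sqrt(n)   # standard error of the mean
margin = T_CRIT_DF9 * se                       # margin of error
lower, upper = mean - margin, mean + margin

print(f"mean = {mean}, 95% CI [{lower:.2f}, {upper:.2f}]")
```

For a 99% CI the only change would be a larger critical value from the table, which widens the interval.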

We can interpret a CI by saying, "our confidence interval is a range of plausible values for the
population mean, values outside of it are relatively implausible." Or, if our mean is 8 with a
lower limit of 6 and an upper limit of 10, "the margin of error is 2: we can be 95% confident
that our point estimate is no more than 2 points from the true population mean." The
smaller the margin of error, the more precise our estimate is.
We transform our sample result into a test statistic: a statistic for which the frequency
of particular values is known (e.g. t, F, chi squared).
If our sample outcome is very unlikely given H0, so p < 0.05, we reject H0. This low p value
means our test statistic is "significant": a value this extreme would be unlikely if H0 were true. Different hypotheses use different test
statistics, e.g. when we compare one or two means we use a t test, and for multiple means an F test.
Even though in this case we would reject H0, because it's quite unlikely we would find our result if it
were true, a significant effect does not equal an important effect.
If we're using a CI to test H0 (which is probably that the mean = 0), we use a one-sample t-test
when looking at one mean and an independent-samples t-test when looking at the
difference between two means.
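
A minimal sketch of a one-sample t-test against H0: mean = 0, computed by hand. The improvement scores are hypothetical, `one_sample_t` is a helper name invented for this example, and the critical value 2.262 (two-tailed, alpha = .05, df = 9) is again taken from a standard t table.

```python
import math
import statistics

def one_sample_t(sample, mu0=0.0):
    """Return (t, df) for testing H0: population mean == mu0."""
    n = len(sample)
    se = statistics.stdev(sample) / math.sqrt(n)
    return (statistics.mean(sample) - mu0) / se, n - 1

# Hypothetical improvement scores for 10 participants (illustrative only)
improvement = [2, -1, 3, 1, 0, 4, 2, 1, 3, 0]

T_CRIT_DF9 = 2.262                 # from a t table: alpha = .05, df = 9
t, df = one_sample_t(improvement)
significant = abs(t) > T_CRIT_DF9  # reject H0 if t is beyond the critical value

print(round(t, 2), df, significant)
```

Here the sample mean is 1.5 and the se is 0.5, so t is about 3 and H0 is rejected; equivalently, the 95% CI around 1.5 does not contain 0.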
Effect Size
This is a standardized measure of the size of an effect. It helps us judge how important an effect is. A
standardized effect size is comparable across studies. It is not very reliant on sample size and
allows objective evaluation of the size of the observed effect.
There are many kinds: for example, we use Cohen's d when we look at differences between or
within groups, Pearson's r when we look at continuous variables/correlations, and partial eta
squared when we have multiple variables; it is interpreted much like r squared.
So: r is for correlations, d is for groups, eta is for multiple variables
We label effect sizes as small, medium, or large, and what counts as each depends on the context of
the field.
r of .1 or d of .2 = small effect, accounts for 1% of variance
r of .3 or d of .5 = medium effect, accounts for 9% of variance
r of .5 or d of .8 = large effect, accounts for 25% of variance
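
The benchmarks above can be tied to numbers with a short sketch. The two groups are made-up data, `cohens_d` is a helper written for this example using the pooled sd (one common way to compute d, assumed here), and the variance explained by a correlation is simply r squared.

```python
import math
import statistics

def cohens_d(group_a, group_b):
    """Cohen's d for two independent groups, using the pooled sd."""
    n_a, n_b = len(group_a), len(group_b)
    pooled_var = ((n_a - 1) * statistics.variance(group_a)
                  + (n_b - 1) * statistics.variance(group_b)) / (n_a + n_b - 2)
    return (statistics.mean(group_a) - statistics.mean(group_b)) / math.sqrt(pooled_var)

# Hypothetical scores (illustrative only)
treatment = [8, 9, 7, 10, 6]
control = [6, 7, 5, 8, 4]

d = cohens_d(treatment, control)
print(round(d, 2))  # well above .8, so a large effect by the benchmarks

# Variance explained for the benchmark correlations: r squared
for r in (0.1, 0.3, 0.5):
    print(r, round(r ** 2, 2))  # .01, .09, .25
```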
Whilst effect sizes tell us about the size of our result, p-values don't tell us that much about
our result. They just tell us the probability of observing an effect at least this large if H0 were true. So we have to report more
than just a p value! We usually report the mean, the CI, the test statistic for the null hypothesis (e.g. the t value), and the effect
size, as well as the p value. This gives us the necessary info to interpret the effect.
One more thing we look at when testing hypotheses is power. Power is the ability of a test to
find an effect that is there, so the probability that the test will find an existing effect. It is the
complement of beta, the Type II error rate, which is the probability that an existing effect in the
population will be missed. So, power is 1 - beta. Generally, power of .8 is considered good, so an 80%
chance of detection.
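
To illustrate power = 1 - beta, the sketch below approximates the power of a two-tailed one-sample test. It uses the normal (z) distribution instead of the t distribution, which is a simplifying assumption, and `approx_power`, the effect size d = 0.5 and n = 32 are all chosen just for this example.

```python
from statistics import NormalDist

def approx_power(d, n, alpha=0.05):
    """Approximate power of a two-tailed one-sample z test.

    d is the standardized effect size, n the sample size.
    """
    z = NormalDist()
    z_crit = z.inv_cdf(1 - alpha / 2)   # about 1.96 for alpha = .05
    shift = d * n ** 0.5                # expected value of the test statistic
    # Probability of landing beyond either critical value when the effect exists
    return (1 - z.cdf(z_crit - shift)) + z.cdf(-z_crit - shift)

power = approx_power(d=0.5, n=32)
beta = 1 - power                        # Type II error rate

print(round(power, 2), round(beta, 2))
```

With these assumed values the power comes out at roughly .8, and it rises if either the effect size or the sample size increases.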

Lecture One Q&A

Covariate = independent variable in the model, parameter = what you estimate
If a confidence interval for an effect contains 0, the effect is not significant at the corresponding alpha.

Lecture One Readings
Simmons, Nelson & Simonsohn (2011): False-Positive Psychology
This article says that despite the nominal endorsement of a .05 alpha to keep the Type I
error rate low, flexibility in data collection, analysis and reporting inflates the Type I error rate. In many cases researchers
are more likely to falsely find evidence that an effect exists than to correctly find evidence that it does not.
False positives are harmful because they are so persistent: we can't just "disprove" them by
replicating a study and finding no effect. They also waste resources as follow-ups are
conducted to expand on discoveries that are actually false.
This study says it is too easy to find a statistically significant effect. This is due to researcher
degrees of freedom: researchers have to make lots of decisions, like how much data to
collect, what should be excluded or compared, what controls should be used and so on.
Often researchers make these rules as they go along and prefer decisions that lead to
significant findings, whether these are true or false. This is due to the ambiguity of making
these decisions and the desire to discover something. It has been shown that there is great
inconsistency, for example, in how researchers treat outliers.
4 common researcher degrees of freedom: choosing among dependent variables, choosing
sample size, using covariates, reporting subsets of experimental conditions. When all 4 were combined in simulations, a
61% false positive rate was found.
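
The inflation described above can be illustrated with a simplified simulation of just one researcher degree of freedom: measuring two dependent variables and reporting an effect if either is significant. This is not the article's own simulation. It assumes two independent DVs, a z test with known sd = 1 in place of a t test, and 20 observations per cell; all names and settings are chosen for illustration.

```python
import random
from statistics import NormalDist, mean

random.seed(2011)  # fixed seed so the run is repeatable

Z_CRIT = NormalDist().inv_cdf(0.975)   # two-tailed critical z for alpha = .05
N_PER_CELL = 20
N_STUDIES = 4000

def null_group():
    """One group of scores when there is no true effect."""
    return [random.gauss(0, 1) for _ in range(N_PER_CELL)]

def significant(group_a, group_b):
    """Two-sample z test with known sd = 1 (a simplification of a t test)."""
    se = (1 / len(group_a) + 1 / len(group_b)) ** 0.5
    return abs(mean(group_a) - mean(group_b)) / se > Z_CRIT

one_dv_hits = 0
either_dv_hits = 0
for _ in range(N_STUDIES):
    dv1 = significant(null_group(), null_group())   # planned DV
    dv2 = significant(null_group(), null_group())   # extra DV, also no true effect
    one_dv_hits += dv1
    either_dv_hits += dv1 or dv2                    # report whichever "worked"

rate_one = one_dv_hits / N_STUDIES
rate_either = either_dv_hits / N_STUDIES
print(rate_one, rate_either)
```

With a single pre-chosen DV the false positive rate stays near the nominal 5%; picking the better of two DVs pushes it towards 10% in this setup, and stacking further flexible choices raises it more.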
To solve this, researchers should:
1. decide the rule for terminating data collection before beginning data collection and
report it in their study
2. collect at least 20 observations per cell or justify it if not
3. list all collected variables in study
4. report all, including failed, experimental conditions
5. if observations are eliminated, report what the statistical results would be if they were
included
6. if the analysis includes a covariate, report the statistical results of the analysis
without it

Field Chapter 2
We can't just add up the deviations from the mean to look at the model fit, because all the negative
and positive ones would add up to 0. That's why we use squared errors. But the sum of squared errors also isn't
super handy because it increases with n, so we average it by dividing by n-1 to get the MSE. When
looking at means, MSE = variance.
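
Both points can be checked on a tiny made-up data set: the raw deviations cancel out, and SS grows when the same scores are simply observed twice while the MSE stays on a similar scale. The scores and the helper name `sum_of_squares` are assumptions for illustration.

```python
def sum_of_squares(data):
    """Sum of squared deviations from the mean of data."""
    m = sum(data) / len(data)
    return sum((x - m) ** 2 for x in data)

scores = [4, 6, 7, 8, 10]          # hypothetical scores
mean = sum(scores) / len(scores)

# Raw deviations cancel out, so their sum says nothing about fit
total_deviation = sum(x - mean for x in scores)
print(total_deviation)             # 0.0

# SS doubles when every score is observed twice...
doubled = scores * 2
ss_small, ss_large = sum_of_squares(scores), sum_of_squares(doubled)
print(ss_small, ss_large)          # 20.0 40.0

# ...but dividing by n - 1 keeps the MSE on a comparable scale
mse_small = ss_small / (len(scores) - 1)
mse_large = ss_large / (len(doubled) - 1)
print(mse_small, round(mse_large, 2))
```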
