Applied Multivariate Data Analysis Lectures and Readings
Week One
Lecture One
Statistical Models
We fit models to our data and use them to represent what happens in the real world.
Models have variables, which are measured constructs - e.g. anxiety - that vary across
individuals in the sample, and parameters, which are estimated from the data and represent
relations between the variables in the model. We compute model parameters in the sample
to estimate their true values in the population.
The normal distribution has 2 parameters: the mean and the sd.
A line also has 2 parameters: slope and intercept.
The mean in our sample is a ''model'' for the true effect of, for example, CBT (variable 1) on
anxiety (variable 2). We could write this as:
anxiety improvement_i = b + error_i
where b would be: the mean improvement in the sample (our estimate of the effect of CBT).
If we put this into a formula/model, we would just say the effect of the therapy is the mean
plus an error term. We assume that the manipulation has an equal effect across people, plus
some random effect that causes differences, e.g. distraction, differences in history, etc.
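As a minimal sketch of this idea (the numbers here are invented for illustration), we can simulate scores as one shared parameter, the mean improvement, plus an individual error term:

import numpy as np

rng = np.random.default_rng(42)
b = 5.0                                 # hypothetical model parameter: mean anxiety improvement after CBT
errors = rng.normal(0, 2, size=30)      # random individual differences (distraction, history, ...)
improvement = b + errors                # each person's score = mean + error
print(improvement.mean())               # the sample mean is our estimate of b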
Model Fit
So the mean is the model of the real-world typical score. To assess how well the mean
represents the observed scores, we test the model ''fit''. A perfect fit would be every
individual showing the mean of the group; random spread around the mean is a non-perfect
fit. How do we quantify the degree of fit? We calculate the average error, using squared
errors, the sd, or the mean squared error.
x̄ = sample mean = the value from which the (squared) scores deviate least (least error)
SS = sum of squared errors. If we sum all the squared deviations we have the SS. This
depends on n: the more observations you have, the higher the SS will be, so we need the
Mean Squared Error (MSE, the average squared error) to make comparisons fair. The larger
this value, the less accurately the model represents your data.
We get the MSE by dividing the SS by the degrees of freedom (df = n - 1). We lose one df
because we want the model to fit the population, not just the sample, and we are using the
sample mean to estimate the population mean.
The larger the MSE, the worse the fit. When the model we are looking at is the mean, the
MSE is called the variance. If we take the square root of the mean squared error we get the
sd, which tells us, on the same scale as the mean, what the average error is.
Variance = s², sd = s
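A small sketch of these computations in Python (the scores are made up; numpy is assumed to be available):

import numpy as np

scores = np.array([4, 6, 7, 9, 9])           # hypothetical anxiety scores
mean = scores.mean()                         # the model: one value for everyone
deviations = scores - mean                   # the model's errors
ss = np.sum(deviations ** 2)                 # SS: sum of squared errors
mse = ss / (len(scores) - 1)                 # MSE = SS / df; here this is the variance
sd = np.sqrt(mse)                            # sd: average error on the original scale
print(ss, mse, sd)                           # matches np.var(scores, ddof=1) and np.std(scores, ddof=1)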
Central Limit Theorem
Depending on n and variability, the estimate of the population that the sample provides will
be more or less precise. When we take many samples from a population, the means of those
samples form the sampling distribution, which we use to make inferences about the
population. This is the distribution of values we'd get if we repeated our sampling over and
over and recorded all the means; its mean is the mean of means. This way, we can see
whether our sample mean was typical or atypical. These samples must all have the same n.
So we can compute variability of sample means. The sd of this distribution is the Standard
Error (se) of the mean. The smaller the sample size and the larger the variability of the trait
in the population, the larger the standard error.
se = sd / √n
We use the se for many statistical tests; we use it, for example, to create a confidence interval.
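A rough simulation of this idea (population values invented): draw many samples of the same n, and the sd of their means comes out close to sd / √n.

import numpy as np

rng = np.random.default_rng(1)
population = rng.normal(loc=50, scale=10, size=100_000)   # hypothetical trait in the population
n = 25
sample_means = [rng.choice(population, size=n).mean() for _ in range(5_000)]
print(np.std(sample_means, ddof=1))          # empirical sd of the sampling distribution
print(population.std() / np.sqrt(n))         # se = sd / sqrt(n), about 2 here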
Confidence Intervals
Confidence intervals are boundaries within which we believe the population mean will lie.
95% CI: for 95% of all possible samples, the population mean will be within its limits.
We calculate a 95% CI by assuming that the t distribution is representative of the sampling
distribution. This looks like a normal distribution, but with fatter tails depending on the
degrees of freedom.
A 95% CI corresponds to an alpha of .05. This is the most commonly used one in psychology,
but it can also be, for example, 90% with an alpha of .1 or 99% with an alpha of .01. So
basically, if we want a 95% chance of our interval ''catching'' the population mean, it has to
extend roughly 2 standard errors on either side of the sample mean.
Then we need a critical value for above (upper limit) and below (lower limit) the mean. To
get this critical value, we look up the right t value in a t-distribution table and multiply it by
the se. We use a t distribution because we are working with a sample; t is for samples. For a
95% CI, we need 2.5% in each of the tails.
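A minimal sketch of these steps (the sample is invented; scipy is assumed to be available):

import numpy as np
from scipy import stats

scores = np.array([6, 7, 8, 8, 9, 10, 11, 12])       # hypothetical sample
mean = scores.mean()
se = scores.std(ddof=1) / np.sqrt(len(scores))
t_crit = stats.t.ppf(0.975, df=len(scores) - 1)      # critical t value, 2.5% in each tail
lower, upper = mean - t_crit * se, mean + t_crit * se
print(mean, lower, upper)                            # point estimate and 95% CI limits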
So, using a 95% CI 100 times, 95 of them will catch the actual population mean. Thus, a 99%
CI would have ''wider arms'' and we'd be more sure that the CI catches the actual population
mean.
We can interpret a CI by saying, "our confidence interval is a range of plausible values for the
population mean, values outside of it are relatively implausible." Or, if our mean is 8 with a
lower limit of 6 and an upper limit of 10, "the margin of error is 2: we can be 95% confident
that our point estimate is no more than 2 points from the true population mean." The
smaller the margin of error, the more precise our estimate is.
We transform our sample result into a test statistic. This is a statistic for which the frequency
of particular values is known (e.g. t, F, chi squared).
If our sample outcome is very unlikely given H0, so p < .05, we reject H0. This low p value
means our test statistic is ''significant'': it would be unlikely if H0 were true. Different
hypotheses use different test statistics - e.g., with one or two means we use a t test, and with
multiple means an F test. Even though in this case we would reject H0 because it's quite
unlikely we would find our result if H0 were true, a significant effect does not equal an
important effect.
When testing H0 (which is usually that the mean, or the difference between means, is 0), we
use a one-sample t-test when looking at one mean and an independent-samples t-test when
looking at the difference between two means; a CI that does not contain 0 leads to the same
conclusion.
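A sketch of both tests with scipy (the groups and values are invented for illustration):

import numpy as np
from scipy import stats

group_a = np.array([5, 7, 8, 9, 6, 7])
group_b = np.array([3, 4, 5, 4, 6, 3])

t1, p1 = stats.ttest_1samp(group_a, popmean=0)       # one mean against H0: mean = 0
t2, p2 = stats.ttest_ind(group_a, group_b)           # difference between two means
print(p1, p2)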
Effect Size
This is a standardized measure of the size of an effect. It quantifies importance. A
standardized effect size is comparable across studies. It is not very reliant on sample size and
allows objective evaluation of the size of the observed effect.
There are many kinds: for example, we use Cohen's d when we look at differences between or
within groups, Pearson's r when we look at continuous variables/correlations, and partial eta
squared when we have multiple variables (it is interpreted much like r).
So: r is for correlations, d is for groups, eta squared is for multiple variables.
We describe effect sizes as small, medium, or large, and what counts as which depends on
the context of the field.
r of .1 or d of .2 = small effect, accounts for 1% of variance
r of .3 or d of .5 = medium effect, accounts for 9% of variance
r of .5 or d of .8 = large effect, accounts for 25% of variance
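A sketch of computing Cohen's d (simple pooled-sd version, assuming equal group sizes) and Pearson's r by hand; the data are made up:

import numpy as np
from scipy import stats

group_a = np.array([5, 7, 8, 9, 6, 7])
group_b = np.array([3, 4, 5, 4, 6, 3])
pooled_sd = np.sqrt((group_a.var(ddof=1) + group_b.var(ddof=1)) / 2)
d = (group_a.mean() - group_b.mean()) / pooled_sd    # Cohen's d for a group difference

x = np.array([1, 2, 3, 4, 5, 6])
y = np.array([2, 1, 4, 3, 6, 5])
r, _ = stats.pearsonr(x, y)                          # Pearson's r for two continuous variables
print(d, r, r ** 2)                                  # r squared = proportion of variance accounted for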
Whilst effect sizes tell us about the size of our result, p-values don't tell us that much about
it. They just tell us the chance of observing an effect at least this large if H0 were true. So we
have to report more than just a p value! We usually report the mean, the CI, the test statistic
for the null-hypothesis test (e.g. the t value), the effect size, and the p value. This gives us the
necessary info to interpret the effect.
One more thing we look at when testing hypotheses is power. Power is the ability of a test to
find an effect that is there, i.e. the probability the test will detect an existing effect. It is the
complement of beta, the Type II error rate, which is the probability that an existing effect in
the population will be missed. So, power = 1 - beta. Generally, power of .8 is considered
good: an 80% chance of detection.
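A quick power-analysis sketch using statsmodels (assuming it is installed; the medium effect size here is just an example):

from statsmodels.stats.power import TTestIndPower

# how many participants per group for 80% power to detect d = .5 at alpha = .05?
n_per_group = TTestIndPower().solve_power(effect_size=0.5, alpha=0.05, power=0.8)
print(n_per_group)                                   # roughly 64 per group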
Lecture One Q&A
Covariate = independent variable in the model; parameter = what you estimate.
If a confidence interval contains 0, the effect is not significant.
Lecture One Readings
Simmons, Nelson & Simonsohn (2011): False-Positive Psychology
This article argues that despite the nominal endorsement of a .05 alpha to keep the Type 1
error rate low, flexibility in data collection, analysis, and reporting inflates the actual Type 1
error rate. Researchers are often more likely to falsely find evidence that an effect exists (a
false positive) than to correctly find evidence that it does not.
False positives are harmful because they are so persistent - we can't just ''disprove'' them by
replicating a study and finding no effect. They also waste resources as follow-ups are
conducted to expand on discoveries that are actually false.
This study says it is too easy to find a statistically significant effect. This is due to researcher
degrees of freedom: researchers have to make lots of decisions, like how much data to
collect, what should be excluded or compared, what controls should be used and so on.
Often researchers make these rules as they go along and prefer decisions that lead to
significant findings, whether these are true or false. This is due to the ambiguity involved in
making these decisions and the desire to discover something. It has been shown, for
example, that there is great inconsistency in how researchers treat outliers.
4 common researcher degrees of freedom: choosing among dependent variables, choosing
sample size, using covariates, reporting subsets of experimental conditions. In simulations
using all 4, a false-positive rate of 61% was found.
To solve this, researchers should:
1. decide the rule for terminating data collection before beginning data collection and
report it in their study
2. collect at least 20 observations per cell or justify it if not
3. list all collected variables in study
4. report all, including failed, experimental conditions
5. if observations are eliminated, report what the statistical results would be if they were
included
6. if an analysis includes a covariate, report the statistical results of the analysis without it
Field Chapter 2
We can't just add up the deviations from the mean to look at the model fit, because the
negative and positive ones would cancel out to 0. That's why we use squared errors. But the
sum of squared errors also isn't super handy because it increases with n, so we average it by
dividing by n - 1 to get the MSE. When looking at means, MSE = variance.
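A tiny numeric check of this point (values invented): the raw deviations cancel out, the squared ones do not.

import numpy as np

scores = np.array([2, 4, 9])
deviations = scores - scores.mean()
print(deviations.sum())                              # 0 (up to floating-point error)
print((deviations ** 2).sum())                       # SS: squared errors don't cancel
print((deviations ** 2).sum() / (len(scores) - 1))   # MSE = variance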