Applied Multivariate Data Analysis –
Spring
0. Planning for reproducible research
● The aim as researchers:
● In recent years, a lot of research has turned out not to be reproducible = the reproducibility
crisis
○ This is not specific to psychology; it also affects other sciences.
● It seems that researchers are exploiting grey zones. Even unintentional uses of those grey
zones can significantly change the outcome of a study.
○ The usual threshold for calling a result significant is p < .05. This means that when
there is no true effect, we will still wrongly find a significant result in 5% of cases.
Using the grey zones (such as having more dependent variables, adding observations,
dropping conditions, and controlling for variables) can substantially increase this
false-positive rate (up to 60% instead of 5%)
○ Research has found that these practices are used quite frequently
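To see why one of these grey zones matters, here is a minimal simulation sketch (not part of the lecture material). It assumes two groups with no true difference, several independent dependent variables with a known SD of 1, and a simple z-test; the function names and the settings (20 participants per group, 2000 simulated studies) are illustrative choices. A "study" counts as a false positive if any of its dependent variables reaches p < .05.

```python
import random
from statistics import NormalDist, mean

ALPHA = 0.05

def p_value_two_groups(a, b):
    """Two-sided z-test for a difference in means, assuming known SD = 1."""
    se = (1 / len(a) + 1 / len(b)) ** 0.5
    z = (mean(a) - mean(b)) / se
    return 2 * (1 - NormalDist().cdf(abs(z)))

def false_positive_rate(n_dvs, n_per_group=20, n_studies=2000, seed=1):
    """Share of null studies in which at least one DV is 'significant'."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_studies):
        significant = False
        for _ in range(n_dvs):
            # Both groups come from the same distribution: no true effect.
            a = [rng.gauss(0, 1) for _ in range(n_per_group)]
            b = [rng.gauss(0, 1) for _ in range(n_per_group)]
            if p_value_two_groups(a, b) < ALPHA:
                significant = True
        hits += significant
    return hits / n_studies

for k in (1, 3, 5):
    expected = 1 - (1 - ALPHA) ** k  # analytic rate for k independent tests
    print(f"{k} DV(s): simulated {false_positive_rate(k):.3f}, expected {expected:.3f}")
```

With one dependent variable the rate stays near 5%; with five independent ones it is expected to be about 1 - 0.95^5 ≈ 23%. Combining several grey zones pushes it higher still.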
But what can we do to solve this problem?
- The most agreed upon techniques are pre-registration of study plans, reproducibility
practices, and standardizing definitions and analysis.
- Others are sharing data, better statistical methods, improved study design standards, peer
reviewing, etc.
Robustness checks
1. Checking the written report (e.g. are numbers correct?)
2. Reproducibility (does re-analysis of the same data give the same outcome?)
3. Bias present? (e.g. sensitivity to researcher decisions: multiverse analysis, p-curve; was
bias constrained by preregistration?)
4. Replication (independent lab repeats methods)
→ together make science self-correcting
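Check 1 (are the reported numbers correct?) can be partly automated. The sketch below is only an illustration: it assumes the report gives a z statistic and a two-sided p-value rounded to two decimals, and the helper names and the two example entries are made up. It ignores that the reported z was itself rounded.

```python
from statistics import NormalDist

def two_sided_p(z):
    """Two-sided p-value for a standard normal test statistic."""
    return 2 * (1 - NormalDist().cdf(abs(z)))

def is_consistent(z, reported_p, decimals=2):
    """True if the reported p matches the recomputed p up to rounding."""
    tolerance = 0.5 * 10 ** (-decimals)
    return abs(two_sided_p(z) - reported_p) <= tolerance + 1e-12

# Made-up (label, z, reported p) entries, as they might appear in a report.
reported = [
    ("Study A", 1.96, 0.05),
    ("Study B", 1.50, 0.04),
]

for label, z, p in reported:
    status = "ok" if is_consistent(z, p) else "CHECK"
    print(f"{label}: z = {z:.2f}, reported p = {p:.2f}, "
          f"recomputed p = {two_sided_p(z):.3f} -> {status}")
```

Study B is flagged because z = 1.50 corresponds to p ≈ .13, not .04.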
Making your work more robust:
● Planning the analysis
● Reproducible practices
Planning the analysis
● You need to think through your thesis and make a plan that specifies how you will perform
the analysis. This prevents you from making decisions later on that can (even
unconsciously) be biased and skew your results towards what you want them to be.
○ So create a data analysis plan, even before you collect data, including:
■ The research question;
■ The hypotheses (null and alternative);
■ The population;
■ The sampling plan
■ The sample size (e.g. power analysis; stopping rules);
■ The statistical analysis;
■ The assumptions of the statistical analysis;
■ What to do if the assumptions are not met
■ How to detect and handle outliers / influential cases / missing data
■ The possible outcomes of the analysis
■ What to do for each of the possible outcomes
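As an illustration of the sample size item, here is a rough power analysis sketch. It assumes a two-group comparison of means, a two-sided test, and the normal approximation n = 2 * ((z_(1-alpha/2) + z_power) / d)^2 per group, where d is the expected standardised effect size. The function name and the effect sizes tried are illustrative assumptions. An exact t-based calculation gives slightly larger numbers.

```python
import math
from statistics import NormalDist

def n_per_group(effect_size, alpha=0.05, power=0.80):
    """Approximate participants needed per group (normal approximation)."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # critical value, two-sided
    z_power = NormalDist().inv_cdf(power)          # quantile for desired power
    n = 2 * ((z_alpha + z_power) / effect_size) ** 2
    return math.ceil(n)

for d in (0.2, 0.5, 0.8):
    print(f"d = {d}: about {n_per_group(d)} participants per group")
```

Writing this number (and the assumed effect size behind it) into the plan before data collection removes the option of "adding observations" until the result is significant.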
● Benefits of planning:
○ Everyone on the team is on the same page before the start. Even when people are not
working at the same time or on the same task, everyone still knows what to do.
○ Design/analysis is well thought through and biased decisions can be prevented
○ Protect yourself against biases (e.g. confirmation bias)
○ Clear divide between confirmatory and exploratory
■ You can still run analyses that you did not plan from the beginning. You
will just have to report them as exploratory analyses. This creates a clear
distinction between what is actually confirmatory and what is exploratory.
○ ‘Relief’ if you run the planned analysis knowing you don’t have to tweak it anymore
○ The list goes on (especially when registering at a journal as a Registered Report, ‘RR’)
■ For example, journals are more likely to also publish non-significant findings.
Because you registered the study beforehand, they have already engaged with
you and your work and can “approve” it in advance, which commits them more
strongly to publishing the final results
Generally: the more specific the plan, the better the plan, because analytic flexibility is constrained
Less ambiguity means that others will interpret your plan the same way
The how:
○ Make a detailed plan, when possible pre-register it
○ Have others check you by seeing if they can interpret your plan differently
○ Try out your planned analysis on fake/simulated data
○ Write out the results section without the numbers to see what you will be able to
conclude/interpret
○ Make transparent in your writing which parts stick to the plan and which parts deviate
■ The planned part will be more reliable; the unplanned part is prone to chance/bias
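Trying out the planned analysis on simulated data could look like the sketch below. Everything in it is an assumption for illustration: the planted effect of 0.8, the group size of 200, the large-sample z approximation standing in for "the planned analysis", and the function names. The point is only that the full pipeline runs and recovers a known effect before any real data exist.

```python
import random
from statistics import NormalDist, mean, stdev

def simulate_groups(effect, n, seed):
    """Fake data: control ~ N(0, 1), treatment ~ N(effect, 1)."""
    rng = random.Random(seed)
    control = [rng.gauss(0.0, 1.0) for _ in range(n)]
    treatment = [rng.gauss(effect, 1.0) for _ in range(n)]
    return control, treatment

def planned_analysis(control, treatment, alpha=0.05):
    """Difference in means with a large-sample z approximation."""
    diff = mean(treatment) - mean(control)
    se = (stdev(control) ** 2 / len(control)
          + stdev(treatment) ** 2 / len(treatment)) ** 0.5
    p = 2 * (1 - NormalDist().cdf(abs(diff / se)))
    return {"difference": diff, "p": p, "significant": p < alpha}

control, treatment = simulate_groups(effect=0.8, n=200, seed=7)
result = planned_analysis(control, treatment)

# Results sentence written in advance; only the numbers get filled in.
print(f"The groups differed by {result['difference']:.2f} SD "
      f"(p = {result['p']:.3f}, significant: {result['significant']}).")
```

Running the same script with effect=0.0 shows what the results section would look like for a null outcome, which helps with deciding beforehand what to do for each possible outcome.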
Reproducible practices
● Reproducible and transparent
○ Reproducible research refers to the idea that the ultimate product of academic
research is the (transparent!) paper along with the full computational environment
used to produce the results in the paper such as code, data, etc. that can be used to
reproduce the results
● Reproducible research practices
○ Improve the quality
○ Speed up scientific discovery
○ Foster greater exchange of ideas among scientists
○ Reduce research waste
● Benefit of open code and data
○ Planning to share data encourages better documentation and thus facilitates
subsequent analyses (ever tried to understand your own work from a while ago?)
■ Writing a detailed syntax can significantly help you in writing, checking and
redoing your analysis. Keeping all code, with comments about what you are
doing, in one document lets others easily check what you did and lets you
rerun your analysis. If you notice a mistake early in the syntax, you do not
need to redo the whole analysis by hand: you can just fix the mistake and
rerun the rest of the syntax, aka the analysis
○ Allows robustness checks (e.g., coding mistakes)
○ Allows independent scientists to repeat the analyses reported by the authors
○ Raw data (individual participant data) can greatly improve meta-analysis
○ Makes misconduct harder to hide (failing to share can conceal it)
○ Prevents the permanent loss of information
○ Sharing data encourages more research (e.g., secondary data analysis)
● Research papers with shared data have fewer errors, those with shared data or shared
code are cited more, and shared data are used in more publications.
● In general, the better documented, and the more standardised, the more reproducible
The How
● Read up on good coding practices
● Be able to apply them in different contexts (e.g. different colleagues you will work
with, different software, etc.)
● Think about data management (plans) and FAIR data (Findable, Accessible, Interoperable,
Reusable)
● Build in checks by others (for reproducibility, ambiguities, mistakes, etc.)
● Be organised.
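A few of these habits can be shown in one small script. This is a sketch under assumptions, not a prescribed workflow: the data values, the seed, the bootstrap analysis and the function names are all invented for illustration. It fixes the random seed, records a fingerprint of the data plus the software version, and checks that rerunning the analysis gives the same outcome.

```python
import hashlib
import json
import platform
import random

def run_analysis(data, seed):
    """Placeholder analysis: mean with a bootstrap 95% interval."""
    rng = random.Random(seed)  # fixed seed makes the resampling repeatable
    boot_means = []
    for _ in range(1000):
        sample = [rng.choice(data) for _ in data]
        boot_means.append(sum(sample) / len(sample))
    boot_means.sort()
    return {
        "mean": sum(data) / len(data),
        "ci_low": boot_means[24],    # roughly the 2.5th percentile
        "ci_high": boot_means[974],  # roughly the 97.5th percentile
    }

def provenance(data, seed):
    """What someone else needs to know to rerun the same analysis."""
    payload = json.dumps(data).encode("utf-8")
    return {
        "data_sha256": hashlib.sha256(payload).hexdigest(),
        "seed": seed,
        "python": platform.python_version(),
    }

data = [4.1, 5.3, 3.8, 6.0, 5.5, 4.7, 5.1, 4.9]  # made-up scores
first = run_analysis(data, seed=2024)
second = run_analysis(data, seed=2024)

print("Same result on rerun:", first == second)
print(json.dumps(provenance(data, seed=2024), indent=2))
```

Storing the provenance record next to the results makes it possible to tell later whether a different outcome comes from changed data, a changed seed, or a changed software version.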
Different links for more resources on performing responsible and reproducible research:
● https://cos.io/rr/
● https://cos.io/prereg/
● https://www.sciencedirect.com/science/article/abs/pii/S1364661319301846
● https://osf.io/zab38/wiki
● https://www.sciencedirect.com/science/article/pii/S0022103116301925?via%3Dihub