Critical thinking about psychological research
If we are genuinely uncertain about psychological processes and have to test whether
psychology/the world works the way we expect it to → we would not expect our predictions
to be confirmed 96% of the time → if they were, there would be no need for empirical evidence… silly stuff
Fraud in Psychology
How do we explain scientists' consistently accurate predictions, or implausible published results?
- Fraud in psychology→ making up the research results (reporting what you consider to
be the truth rather than what the data show)
The scientist was not motivated by getting to the truth but by producing publications
➢ Failed replications (replication studies that fail to reproduce the original
outcome of the study… we are uncertain about how robust the studied phenomenon
is)
■ Replication studies might point to different results and conclusions
than the original study
Issues:
With Replicability
Different data set, different conclusions
★ Direct replication→ collect new data, study is the same, sample is different
★ Conceptual replication→ concepts are the same, study is different, sample is
different, different data, different method, different analysis
- Sometimes original studies have significant results but replication studies do
not.
- The estimated effect size in replications is on average a lot lower than what the
original study reported (see the sketch after this list).
- Original study vs. replication studies → results often do not match
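A minimal sketch of why this happens (my own illustration, not from the lecture): if only significant originals get "published", the published effect sizes are inflated relative to the true effect, while unselected direct replications land near the truth. The true effect (Cohen's d = 0.2), sample size, and all other numbers are assumptions.

```python
# Illustrative simulation: publication selection inflates original effect sizes,
# so direct replications look "smaller" even though nothing went wrong.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
true_d, n, sims = 0.2, 30, 5000   # assumed small true effect, n per group

def one_study():
    """Run one two-group study; return (p-value, observed Cohen's d)."""
    control = rng.normal(0.0, 1.0, n)
    treatment = rng.normal(true_d, 1.0, n)
    p = stats.ttest_ind(treatment, control).pvalue
    pooled_sd = np.sqrt((control.var(ddof=1) + treatment.var(ddof=1)) / 2)
    return p, (treatment.mean() - control.mean()) / pooled_sd

originals = [one_study() for _ in range(sims)]
published_d = [d for p, d in originals if p < 0.05]    # only significant originals "get published"
replication_d = [one_study()[1] for _ in range(sims)]  # direct replications: no selection

print(f"true effect:                     d = {true_d}")
print(f"mean d in significant originals: {np.mean(published_d):.2f}")   # inflated
print(f"mean d in direct replications:   {np.mean(replication_d):.2f}") # close to the true d
```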
With Robustness (lack of it)
Same data set, different analysis, different results
● Giving the same data set to different researchers to see if they come to the same
conclusions… no new sample… different analyses used
○ The finding (effect, no effect) depends on the choices scientists make with the data
(different considerations, assumptions) → a lot of variability (shows that science is
not a mechanical act); a rough simulation of this follows below.
○ Less robustness, as there is a whole range of different outcomes.
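A rough sketch of the "many analysts, one data set" idea (my own toy example with simulated data, not the actual crowdsourced study): the same data set analysed under different defensible choices can return noticeably different p-values.

```python
# Illustrative: one simulated data set, several defensible analysis pipelines,
# a range of p-values -> lack of robustness.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
group = np.repeat([0, 1], 50)
y = rng.normal(0.15 * group, 1.0)           # weak, noisy group difference
y[rng.choice(100, 3, replace=False)] += 4   # a few extreme values

def p_ttest(g, x):                 # choice 1: keep everything, plain t-test
    return stats.ttest_ind(x[g == 1], x[g == 0]).pvalue

def p_trimmed(g, x):               # choice 2: drop |z| > 2.5 "outliers" first
    keep = np.abs(stats.zscore(x)) < 2.5
    return stats.ttest_ind(x[keep & (g == 1)], x[keep & (g == 0)]).pvalue

def p_ranks(g, x):                 # choice 3: rank-based Mann-Whitney test
    return stats.mannwhitneyu(x[g == 1], x[g == 0]).pvalue

for name, analysis in [("keep outliers, t-test", p_ttest),
                       ("trim outliers, t-test", p_trimmed),
                       ("rank-based test", p_ranks)]:
    print(f"{name:22s} p = {analysis(group, y):.3f}")
```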
With Reproducibility
Same data set, same data analysis, different results
Even if you give people the same analysis protocol (they get the data and the code),
they can still come to different conclusions
P-values→ errors in how they are reported
- Reporting errors are more likely when the results are significant… nonsignificant
results are reported more accurately (a recomputation check is sketched below)
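A small sketch of how such reporting errors can be caught (in the spirit of tools like statcheck; the numbers below are made up): recompute the p-value from the reported test statistic and degrees of freedom, and flag mismatches.

```python
# Illustrative consistency check: does the reported p match the reported t and df?
from scipy import stats

# (reported t, df, reported p) as they might appear in a results section (made up)
reported = [
    (2.10, 28, 0.045),   # consistent: recomputed p is about 0.045
    (1.70, 28, 0.040),   # inconsistent: recomputed p is about 0.100
]
for t_stat, df, p_reported in reported:
    p_recomputed = 2 * stats.t.sf(abs(t_stat), df)   # two-sided p for a t statistic
    flag = "OK" if abs(p_recomputed - p_reported) < 0.01 else "MISMATCH"
    print(f"t({df}) = {t_stat:.2f}: reported p = {p_reported}, "
          f"recomputed p = {p_recomputed:.3f} -> {flag}")
```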
Researcher degrees of freedom (ways in which researchers may vary in their approach to
making research decisions)
1. Research area, theory, hypotheses
2. Dependent variables, conditions
3. Measurement procedure
4. Participants
5. Analysis and outliers
6. Effect and relevant effects
7. Conclusion from analyses
8. Conclusion from investigations
Best approaches for research as a scientist
Don't fool yourself→ be honest
Bend over backwards to show how you might be wrong→ be aware that your
reasoning may be faulty
Make mistakes for all to see in the hope of corrections → hope that others will help with the
corrections
*If you leave out enough, everything can become a story→ publication bias
Ideally→ the whole process hinges on the quality of the research and the quality of
what you have written
In reality→ quality is not all that matters… the results play a big role in the process
of publication… overemphasis on getting significant results
Ideal vs. real
→ Two identical studies, everything is the same except for the results:
- significant vs. non-significant
- the publisher was more critical (negative feedback) of non-significant results
(the publisher did not even finish reading the discussion section)
- It is more likely to get positive feedback and get published if positive
results are shown
→ Non-significant results are seen as if they will not contribute to the field → mindset
of significant results being the only ones that contribute
→P-hacking: manipulating the data or analysis to find significant results (an
illustration of the consequences follows this list)
- Rounding p-values
- Changing the way you do the analyses
- Manipulating the trend (positive, negative)
- Failing to report conditions
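A minimal sketch of why "changing the way you do the analyses" is a problem (my own illustration; all numbers are assumptions): with no true effect at all, trying several analyses on the same data and keeping only the smallest p-value pushes the false-positive rate above the nominal 5%.

```python
# Illustrative: selectively reporting the "best" analysis inflates the Type I error rate.
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
sims, n = 2000, 30

def hacked_p():
    """Null data (no true effect); try three analyses, keep the smallest p."""
    a, b = rng.normal(0, 1, n), rng.normal(0, 1, n)
    keep = np.abs(stats.zscore(np.concatenate([a, b]))) < 2.5   # "drop outliers"
    candidates = [
        stats.ttest_ind(a, b).pvalue,                      # plain t-test
        stats.mannwhitneyu(a, b).pvalue,                   # rank-based test
        stats.ttest_ind(a[keep[:n]], b[keep[n:]]).pvalue,  # outlier-trimmed t-test
    ]
    return min(candidates)

false_pos = np.mean([hacked_p() < 0.05 for _ in range(sims)])
print(f"nominal alpha: 0.05, actual false-positive rate: {false_pos:.2f}")  # > 0.05
```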
Publication & reporting bias → overemphasis on positive (significant) results; failed trials
are not reported… conclusions about a phenomenon might not be accurate.
Example: the effect of antidepressants on people:
● Pre-registered trials show roughly 50% positive and 50% negative outcomes→ study
publication bias filters out negative trials, so more positive outcomes appear in the
literature (failures of antidepressants drop from 50% to 25%)→ outcome reporting bias
means that not everything that was done gets reported→ spin adds an overemphasis on
positive results→ citation bias means mainly the significant results get cited→ the
final impression is that the antidepressants are very likely to work
Why use statistical tests?
➢ Inferential statistics (infer about population from data)
➢ What led to the results… the underlying process (random variation, systematic variation,
or both?)
Statistical reasoning
❖ How likely is the observed test statistic (or a more extreme one) if there is no difference in the population → p-value (written out below)
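Written out (assuming a two-sided test with test statistic T and observed value t_obs), the p-value the notes describe is:

```latex
p = P\left(\,|T| \geq |t_{\mathrm{obs}}| \;\middle|\; H_0:\ \text{no difference in the population}\right)
```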
Practical reasoning
Decisions are made using the p-value→ p-values are compared with the alpha level
- Cap the Type I error rate at 5% (alpha) and the Type II error rate at 20% (beta)
- You are not reasoning about what is true; you are acting in a specific,
justified way
● We see the Type I error as more of an issue (alpha 0.05)→ it is considered more costly to
reject the null when the null is true (which is why the 5% is set lower than the 20% allowed
for Type II errors)... Researchers are more concerned with minimizing the Type I error
- Because of the costly outcome → e.g. claiming a pill saves someone's life
when it does not, or funding a procedure and spending money on something
that is not effective (a power/sample-size example follows below)
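A small worked example of how the conventional alpha = .05 and beta = .20 (power = .80) play out in practice (my own example; the assumed effect size, Cohen's d = 0.5, is an assumption): together with an effect size, they fix the required sample size.

```python
# Illustrative power analysis: alpha = .05, power = 1 - beta = .80, medium effect.
from statsmodels.stats.power import TTestIndPower

n_per_group = TTestIndPower().solve_power(effect_size=0.5, alpha=0.05,
                                          power=0.80, alternative='two-sided')
print(f"participants needed per group: {n_per_group:.0f}")   # roughly 64
```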
Why are these research practices questionable? It has to do with the error rate and the alpha level
P1: If I throw a 20-sided die, then the probability that I throw a 20 is 5%
P2: I throw a 20-sided die
----------------------------------------------------------------------------------
C: The probability of me throwing a 20 is 5% → valid
P1: If I throw a 20-sided die, then the probability that I throw a 20 is 5%
P2: I throw 100 20-sided dice
----------------------------------------------------------------------------------
C: The probability of me throwing a 20 is 5% → invalid! With 100 dice, the probability of
throwing at least one 20 is much higher (worked out below)
*The same thing happens with questionable research practices
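The arithmetic behind the die analogy (my own worked calculation): with one die the 5% conclusion holds, but with 100 dice the chance of at least one 20 is close to certainty, and the same holds for running 100 independent tests at alpha = .05.

```latex
P(\text{at least one 20 in 100 throws}) \;=\; 1 - \left(\tfrac{19}{20}\right)^{100} \;\approx\; 0.994
```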
- Random selection and assignment give you a subset of the population… things
could show up in the sample that do not exist in the population
- If you add more:
- As the number of comparisons between groups and the number of
variables increases, the risk of finding a statistical
fluke/random difference increases accordingly (simulated in the sketch below)
- Questionable: comparisons between groups are sometimes
dropped from the report because the differences between them are
non-significant→ you should not! Every comparison is an opportunity for
randomness to show itself
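A minimal simulation of this point (my own illustration; sample sizes and number of comparisons are assumptions): with no true differences anywhere, the probability of at least one "significant" comparison grows quickly with the number of comparisons, just like throwing more and more 20-sided dice.

```python
# Illustrative: the family-wise false-positive rate grows with the number of
# null comparisons, approaching the theoretical 1 - 0.95**k.
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
n, sims = 30, 2000

def p_any_fluke(k):
    """Probability of at least one p < .05 among k comparisons with no true effect."""
    hits = 0
    for _ in range(sims):
        pvals = [stats.ttest_ind(rng.normal(0, 1, n), rng.normal(0, 1, n)).pvalue
                 for _ in range(k)]
        hits += min(pvals) < 0.05
    return hits / sims

for k in (1, 5, 10, 20):
    print(f"{k:2d} comparisons -> P(at least one fluke) ≈ {p_any_fluke(k):.2f}")
# roughly 0.05, 0.23, 0.40, 0.64
```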