Experimental Research: Summary (2021)
Module 1 - Introduction
Experimental research is important because it is the only way to obtain causal knowledge.
Compared to econometric modelling, experimental research yields more accurate effect-size
estimates, which lead to better decision making. Furthermore, experimental research not
only identifies important variables but also reveals ways to improve them through the experiment itself.
More and more companies seem to realize this, which is why they have started to adopt
experimental research to help them learn, grow and prevent mistakes. Experimental
research can be used for business decisions small and large, but most business progress
happens by a collection of small improvements.
The goal of behavioral research is to describe, explain and predict behavior using
descriptive, correlational and (quasi-)experimental research types. Descriptive research
describes behavioral findings and correlational research investigates the relationships
among variables using correlations. However, correlation does not imply causation: causality
also requires directionality (knowing which variable influences the other) and the elimination of
extraneous variables. Using regression alone, we cannot obtain causal knowledge, since we
may not take important confounds into account and we often make other assumptions, such
as a linear relationship. Discovering and understanding relevant confounds is not essential
for making a good prediction, but it helps to explain why an effect occurs (causality), which in
turn helps to make better predictions. Correlations can therefore be seen as descriptive data.
Experimental research aims to obtain causal knowledge by finding changes in behavior as a
result of manipulating variables. Randomization of subjects to treatments, combined with
large enough sample sizes in each condition, ensures that the groups are on average the same,
which means a difference between groups must be caused by the treatment. Randomization is
better than propensity score matching, since it accounts for all possible confounds instead of
the ones identified during the matching process. If it is not possible to manipulate variables,
we use a quasi-experiment.
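As an illustration (a Python sketch with invented ages; the numbers and variable names are not from the course), randomly assigning a large pool of subjects to two conditions balances a potential confound such as age on average:

```python
import random
import statistics

# Hypothetical subject pool: ages as a potential confound (values invented).
random.seed(42)
ages = [random.randint(18, 65) for _ in range(1000)]

# Randomly assign each subject to treatment or control.
random.shuffle(ages)
treatment, control = ages[:500], ages[500:]

# With large groups, the confound is balanced on average across conditions.
print(statistics.mean(treatment), statistics.mean(control))
```

With 500 subjects per condition the two group means land close together; with small groups the balance is much rougher, which is why sample size matters alongside randomization.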
We do statistical testing in order to assess whether a difference we found (between
means, for example) is likely real (significant) or just a coincidence
(insignificant). We often draw conclusions based on the p-value, which denotes the
probability of observing our data (or a more extreme form of it) given that there is actually
nothing going on (i.e., H0 is true). A marginally significant p-value does not mean there is a
small effect; it means that your data are relatively likely even when there is in fact nothing going on.
Extra participants will not necessarily increase significance: it may just as well decrease it.
Instead of reporting ‘p < .05’, report it as it is (for example ‘p = .012’) except when p < .001. If
you report a lot of p-values, you can use stars instead. The standard deviation denotes the
spread within a group (its square is the variance), given by the following formula:

s = sqrt( sum of (x_i - x̄)² / (n - 1) )

Outliers will contribute more to the standard deviation than cases close to the mean. In a
normal distribution, 68% of the population lies within one standard deviation, 95% lies within
two standard deviations and 99.7% lies within three standard deviations around the mean.
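The 68/95/99.7 rule can be checked numerically (a Python sketch; it assumes scipy is available):

```python
from scipy.stats import norm

# Fraction of a normal distribution within k standard deviations of the mean.
for k in (1, 2, 3):
    coverage = norm.cdf(k) - norm.cdf(-k)
    print(f"within {k} SD: {coverage:.1%}")  # 68.3%, 95.4%, 99.7%
```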
The independent samples t-test compares the effect of two conditions on a dependent
variable, to test whether there is a significant difference between the two conditions. Several
things contribute to a significant difference between conditions, such as a larger difference
between groups, less variation in each group and a larger sample size. This can be deduced
from the formula for the t statistic:

t = (x̄₁ - x̄₂) / sqrt( s₁²/n₁ + s₂²/n₂ )

The latter two contribute to more precise estimates.
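To make this concrete, here is a Python sketch (invented data; scipy assumed available) that computes the Welch t statistic by hand and checks it against scipy:

```python
import math
from scipy import stats

# Two hypothetical groups (values invented for illustration).
a = [5.1, 4.8, 6.0, 5.5, 4.9, 5.7]
b = [4.2, 4.5, 3.9, 4.8, 4.1, 4.4]

# Welch t statistic: mean difference divided by the standard error,
# where the SE shrinks with less within-group variance and larger n.
ma, mb = sum(a) / len(a), sum(b) / len(b)
va = sum((x - ma) ** 2 for x in a) / (len(a) - 1)
vb = sum((x - mb) ** 2 for x in b) / (len(b) - 1)
t_manual = (ma - mb) / math.sqrt(va / len(a) + vb / len(b))

t_scipy, p = stats.ttest_ind(a, b, equal_var=False)
print(t_manual, t_scipy, p)
```

A larger mean difference, smaller within-group variances or larger ns all push t further from zero, in line with the three contributors listed above.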
In statistical programs, we can obtain a p-value once we have a t-value from the t-test. The
effect size denotes how much of the variation we can explain with our treatment. For a t-test,
we use Cohen’s d to obtain the effect size, where the pooled standard deviation is the
weighted average of the standard deviations of both groups. Cohen’s d denotes the variation
between conditions in terms of standard deviations.
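A Python sketch of this calculation (the group means, SDs and ns are invented for illustration):

```python
import math

# Hypothetical group summaries (means, SDs, ns are invented).
m1, s1, n1 = 5.3, 1.1, 40
m2, s2, n2 = 4.6, 1.3, 45

# Pooled SD: weighted average of the two group variances.
sp = math.sqrt(((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2))

# Cohen's d: difference between conditions in pooled-SD units.
d = (m1 - m2) / sp
print(round(d, 2))  # → 0.58
```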
We report a t-test as follows: t(degrees of freedom: n - 2) = t-value, p = p-value, d = Cohen’s
d. Try not to interpret Cohen's d yourself, but let the reader determine its importance: even small
effects can be very influential. Cohen's d is independent of sample size, since it denotes the
effect size. It is better to report Welch’s t-test over Student’s t-test, since the former does not
assume equal variances among groups. A regression can also be used, but many people
find the t-test more convenient since it provides the mean and standard deviation as well.
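The equivalence with regression can be seen by regressing the outcome on a 0/1 treatment dummy: the slope then equals the difference in group means. A Python sketch (simulated data; numpy assumed available):

```python
import numpy as np

# Hypothetical data: outcome by condition (0 = control, 1 = treatment).
rng = np.random.default_rng(0)
control = rng.normal(10.0, 2.0, 50)
treatment = rng.normal(11.0, 2.0, 50)

y = np.concatenate([control, treatment])
x = np.concatenate([np.zeros(50), np.ones(50)])

# Regressing the outcome on a treatment dummy: the slope equals the
# difference in group means, so both approaches estimate the same effect.
slope, intercept = np.polyfit(x, y, 1)
print(slope, treatment.mean() - control.mean())
```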
The assumptions for a t-test and ANOVA are that (i) the sample is a random subset of the
population (between and within groups), (ii) the dependent variable is at least interval
(ordinal is typically also fine), (iii) data is normally distributed and (iv) variance in the
conditions is roughly equal. The first assumption is the most important; the others matter less as
long as you have a large enough sample size. Make sure to always inspect your data and
use common sense.
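Inspecting the data can be supported with formal checks; a Python sketch (invented samples, scipy assumed) using the Shapiro-Wilk test for normality and Levene's test for equal variances:

```python
from scipy import stats

# Hypothetical samples from two conditions.
a = [5.1, 4.8, 6.0, 5.5, 4.9, 5.7, 5.2, 5.8]
b = [4.2, 4.5, 3.9, 4.8, 4.1, 4.4, 4.6, 4.0]

# Shapiro-Wilk tests normality; Levene tests equality of variances.
# Non-significant p-values give no evidence against the assumptions.
print(stats.shapiro(a).pvalue, stats.levene(a, b).pvalue)
```

With small samples these tests have little power, so visual inspection and common sense remain important.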
Module 2 - ANOVA
Where a t-test can only handle two conditions, an ANOVA (Analysis of Variance) can handle
more. ANOVAs can also handle more independent variables and more complex designs
(moderation). If we decide to run separate t-tests instead, we will increase our false positive
rate: in the case of three conditions, it will be 1 - (1 - 0.05)^3 ≈ 14.3%. The ANOVA controls for
this increased false positive rate and also uses all of the available information, which leads to
greater precision. The ANOVA calculates the F-value, which equals the variance caused by the
independent variable divided by the error variance caused by other factors. Together,
these two components make up the total variance in the model. The F-value tells us whether a
significant difference exists in the model, but it does not say which conditions differ.
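A Python sketch tying this together (invented data, scipy assumed): a one-way ANOVA on three conditions, plus the inflated false-positive rate from running three separate tests:

```python
from scipy import stats

# Three hypothetical conditions (values invented).
g1 = [4.1, 4.5, 3.9, 4.8, 4.2]
g2 = [5.0, 5.4, 4.9, 5.6, 5.1]
g3 = [4.4, 4.7, 4.3, 4.9, 4.6]

# One-way ANOVA: F = variance explained by condition / error variance.
f, p = stats.f_oneway(g1, g2, g3)
print(f, p)

# Inflated false-positive rate when running three separate tests at alpha = .05:
print(1 - (1 - 0.05) ** 3)  # ≈ 0.1426
```

A significant F here says only that some difference exists among the three conditions; follow-up comparisons are needed to locate it.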