Summary Grasple lessons ARMS
Grasple week 1 Bayes and MLR
The Bayesian framework is based on the posterior distribution of one or more parameters. The posterior is a combination of the prior and the likelihood.
The information in the data set tells us which values for μ (the mean) are plausible: the likelihood function.
We also have knowledge and beliefs about μ from before examining the data: the prior distribution.
(Figure: examples of prior distributions, each giving the probability of certain values of μ; e.g. a flat prior under which all values are equally likely.)
(The posterior is thus a compromise: its μ lies between the μ of the likelihood and the μ of the prior.)
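A minimal sketch in Python of how a prior and a likelihood combine into a posterior for μ, assuming a normal prior, normally distributed data, and a known data SD (all numbers are made up):

import numpy as np

# Hypothetical data and prior (data SD assumed known for simplicity)
data = np.array([5.1, 4.8, 5.6, 5.0, 5.3])   # observed scores
sigma = 1.0                                   # assumed known SD of the data
prior_mean, prior_sd = 4.0, 2.0               # prior beliefs about mu

n = len(data)
prior_prec = 1 / prior_sd**2                  # precision = 1 / variance
data_prec = n / sigma**2                      # precision of the likelihood for mu

# The posterior mean is a precision-weighted average of the prior mean and the
# sample mean, so it lies between the mu of the prior and the mu of the likelihood.
post_prec = prior_prec + data_prec
post_mean = (prior_prec * prior_mean + data_prec * data.mean()) / post_prec
post_sd = (1 / post_prec) ** 0.5

print(f"posterior for mu: mean = {post_mean:.2f}, sd = {post_sd:.2f}")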
By using the prior, you don’t start your research from scratch and so science can accumulate.
(The prior can also be seen as a drawback, because it can affect the results.)
You as a researcher decide what kind of previous knowledge you want to add
Definition of probability:
In classical/frequentist statistics: the long-run frequency with which an event occurs.
In Bayesian statistics: it is based on Bayes' theorem, where conditional probabilities are central. These probabilities are of the form P(A given B): 'what is the probability of A given that B has happened or is true'. If A stands for a hypothesis of interest and B for the data we collected, then P(A given B) represents the probability of our hypothesis given the data we observed in our study.
(To get P(A|B), we need P(A): the prior probability of the hypothesis.)
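A small worked example of Bayes' theorem in Python, with made-up numbers, where A is a hypothesis and B the observed data:

# Bayes' theorem: P(A|B) = P(B|A) * P(A) / P(B)   (all numbers are hypothetical)
p_A = 0.30                      # prior probability of hypothesis A
p_B_given_A = 0.80              # probability of the data if A is true
p_B_given_notA = 0.20           # probability of the data if A is false

# Total probability of the data B (law of total probability)
p_B = p_B_given_A * p_A + p_B_given_notA * (1 - p_A)

# Posterior probability of A given the data B
p_A_given_B = p_B_given_A * p_A / p_B
print(round(p_A_given_B, 3))    # 0.632: the data made hypothesis A more plausible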
A frequentist interval is called a confidence interval; a Bayesian interval is called a credible interval.
Confidence interval: "If we were to repeat this experiment many times and calculate an interval each time, 95% of the intervals will include the true parameter value (and 5% will not)."
Credible interval: "There is a 95% probability that the true value is in the interval."
P-value (frequentist): the probability of observing the same or more extreme data given that the null
hypothesis is true.
A Bayesian analysis also gives information about how likely a hypothesis is, given the observed data. Hypotheses are compared in terms of relative support, using the Bayes factor (BF).
(e.g. a BF12 of 10 means that the support for H1 is 10 times stronger than the support for H2)
A BF is not a probability, but BFs can be transformed into (relative) probabilities.
First we have to define prior model probabilities, i.e. how likely each hypothesis is before seeing the data. You could consider all hypotheses equally likely, so H1 = H2 = 0.5 or H1 = H2 = H3 = 0.33 (the probabilities add up to 1, also when the chances are not equal). Adding up to 1 applies both to prior model probabilities and to posterior model probabilities (PMP).
(When BF12 = 3 and the prior model probabilities are equal, the PMP of H1 = 0.75 and of H2 = 0.25.)
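A sketch of how Bayes factors can be converted into posterior model probabilities, assuming equal prior model probabilities (reproducing the BF12 = 3 example above); the function name is made up:

def pmp_from_bfs(bfs, priors=None):
    """Posterior model probabilities from Bayes factors against one common
    reference hypothesis; priors default to equal prior model probabilities."""
    if priors is None:
        priors = [1 / len(bfs)] * len(bfs)
    weighted = [bf * p for bf, p in zip(bfs, priors)]
    total = sum(weighted)
    return [w / total for w in weighted]

# BF12 = 3: support for H1 is 3 times stronger than for H2.
# Against H2 as the reference: BF = 3 for H1 and BF = 1 for H2.
print(pmp_from_bfs([3, 1]))   # [0.75, 0.25]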
Assumptions about MLR (multiple linear regression):
About the measurement level of variables in MLR:
The dependent variable is a continuous measure (Interval or Ratio)
The independent variables are continuous or dichotomous (nominal with two categories)
About linearity of relations:
There are linear relationships between the dependent variable and each of the continuous independent variables. (This can be checked using scatterplots, with the (continuous) predictor on the x-axis and the outcome on the y-axis. The cloud of points has to be roughly oval (not S-shaped or curved) to meet this assumption; then the relationship can best be described with a straight line.)
Other:
No outliers (a case that deviates strongly from the other cases in the data set, which can be on one variable or on multiple variables).
When you do not have a linear relationship, you can add terms to the regression model to accommodate the non-linearity. If the shape is, for example, a curve, a quadratic relation may describe the data better than the linear one, so you add a new variable (the squared version of the original X) and run the regression with both variables (X and X²). You get two parameters (B1 and B2) out of this, where:
- B1: informs you about the steepness of the overall slope (the linear trend in the curved relation). The p-value when testing B1 tells you whether the linear trend is zero (horizontal) or not (when p < .05).
- B2: informs you about how curved the relation is, it measures the change in slope with increasing X.
The p-value when testing B2 informs you whether the change in slope is significantly non-zero. It
basically tells you if the quadratic relation is a better model for your data than the linear relation.
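A minimal sketch in Python (simulated data, statsmodels) of adding the squared predictor and inspecting B1 and B2:

import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(1)
x = rng.uniform(0, 10, 200)
y = 2 + 0.5 * x - 0.08 * x**2 + rng.normal(0, 1, 200)   # simulated curved relation

# Regression with both X and X squared as predictors
X = sm.add_constant(np.column_stack([x, x**2]))
fit = sm.OLS(y, X).fit()

print(fit.params)    # intercept, B1 (overall linear trend), B2 (change in slope)
print(fit.pvalues)   # the p-value of B2 indicates whether the quadratic term improves on the linear model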
If you have an outlier, sometimes you can change it (e.g. when you made a typo), delete it, or make it less extreme (e.g. by recoding it to mean + 2*SD). But all in all, transparency is important.
It is important to visualize your data, because the influence of a violated model assumption on the
results can be severe.
Anscombe's Quartet: four data sets that have (nearly) identical summary statistics, yet look completely different when plotted.
We want to check various assumptions:
Absence of outliers (multivariate, so for combinations of all variables in the model)
For a single variable, outliers can be detected with a histogram or boxplot; for two variables, with a scatterplot.
Multivariate outliers (involving all variables in the model) can be assessed while performing the analysis.
On the basis of the casewise diagnostics, standardized residuals and Cook's distance, it is possible to assess whether there are outliers in the Y-space and the XY-space, respectively.
With standardized residuals we check whether there are outliers in the Y-space. As a rule of thumb, the values must be between -3.3 and +3.3; values smaller than -3.3 or greater than +3.3 indicate potential outliers.
With Cook's distance it is possible to check whether there are outliers within the XY-space. An outlier in the XY-space is an extreme combination of X (all X-variables) and Y scores. Cook's distance indicates the overall influence of a respondent on the model. As a rule of thumb, values for Cook's distance must be lower than 1; values higher than 1 indicate influential respondents (influential cases).
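A sketch in Python (simulated data, statsmodels) of how these casewise diagnostics can be obtained and flagged using the rules of thumb above:

import numpy as np
import statsmodels.api as sm

# Simulated data; any fitted OLS model works the same way
rng = np.random.default_rng(2)
X = rng.normal(size=(100, 2))
y = 1 + X @ np.array([0.5, -0.3]) + rng.normal(0, 1, 100)

fit = sm.OLS(y, sm.add_constant(X)).fit()
influence = fit.get_influence()

std_resid = influence.resid_studentized_internal   # standardized residuals (Y-space)
cooks_d = influence.cooks_distance[0]              # Cook's distance (XY-space)

print(np.where(np.abs(std_resid) > 3.3)[0])        # potential outliers in the Y-space
print(np.where(cooks_d > 1)[0])                    # influential cases in the XY-space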
When you have to make a choice about whether or not to remove an outlier, a number of things can
be helpful:
Does this participant belong to the group about which you want to make inferences?
o If not, do not include the participant in the analysis.
Is the extreme value of the participant theoretically possible?
o If not, do not include the participant in the analysis.
o If so, you could run the analysis with and without the participant and compare
results.
When changing the data, you have to do it for good reasons (not p-hacking) and be transparent about it. Transparency is also achieved by preregistration.
Absence of multicollinearity
Multicollinearity indicates whether the relationship between two or more independent
variables is too strong.
Association between predictors is not a problem for MLR, but very large association (r
above .8 /.9) is.
If you include overly related variables in your model, this has three consequences:
o The regression coefficients (B) are unreliable,
o It limits the magnitude of R (the correlation between Y and Ŷ),
o The importance of individual independent variables can hardly be determined, if at
all
Determining whether multicollinearity is an issue can be done on the basis of the statistics
Tolerance or VIF (Variance Inflation Factor). You can use the following rule of thumb:
o Values for the Tolerance smaller than .2 indicate a potential problem.
o Values for the Tolerance smaller than .1 indicate a problem.
o The variance inflation factor (VIF) is equal to 1/Tolerance. So for the VIF, values
greater than 10 indicate a problem.
When you run into multicollinearity, you have to remove the variables that cause the problem (those that are highly correlated) or combine variables into a scale (e.g. using factor analysis).
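A sketch of computing the VIF and Tolerance per predictor with statsmodels (made-up predictors, where x3 is deliberately almost a copy of x1):

import numpy as np
import statsmodels.api as sm
from statsmodels.stats.outliers_influence import variance_inflation_factor

rng = np.random.default_rng(3)
x1 = rng.normal(size=200)
x2 = rng.normal(size=200)
x3 = x1 + rng.normal(0, 0.1, 200)        # nearly redundant with x1

exog = sm.add_constant(np.column_stack([x1, x2, x3]))
for i, name in enumerate(["x1", "x2", "x3"], start=1):   # column 0 is the constant
    vif = variance_inflation_factor(exog, i)
    print(name, "VIF =", round(vif, 2), "Tolerance =", round(1 / vif, 3))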
Homoscedasticity
Homoscedasticity means that the spread of the residuals must be approximately the same across all values of the predicted Y. We check this by plotting the (standardized) residuals against the (standardized) predicted values. If the spread is roughly equal for every predicted value on the x-axis (approximately the same amount of spread around zero on the y-axis), you have homoscedasticity; otherwise heteroscedasticity.
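A sketch of the residuals-versus-predicted plot used for this check, with simulated data (statsmodels for the model, matplotlib for the plot):

import numpy as np
import statsmodels.api as sm
import matplotlib.pyplot as plt

rng = np.random.default_rng(4)
X = rng.normal(size=(150, 2))
y = 1 + X @ np.array([0.8, 0.4]) + rng.normal(0, 1, 150)

fit = sm.OLS(y, sm.add_constant(X)).fit()
z_pred = (fit.fittedvalues - fit.fittedvalues.mean()) / fit.fittedvalues.std()
z_resid = fit.get_influence().resid_studentized_internal

# For homoscedasticity, the vertical spread should look roughly equal across the x-axis
plt.scatter(z_pred, z_resid)
plt.axhline(0)
plt.xlabel("standardized predicted values")
plt.ylabel("standardized residuals")
plt.show()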