Week 1:
JASP Assumptions in MLR:
For this lesson, we use the LifeSat1.sav data file, which can be found on
Blackboard. The data in this file are used to address the research question:
Which variables predict how satisfied young people are with their lives?
For this research, data was collected from 98 randomly selected young people
through questionnaires.
The data file contains the following variables:
Satisfaction: measured with the Life Satisfaction Scale (1-100)
Age: measured in years
Gender: (0 = male, 1 = female)
Sports: sport participation measured in number of hours per week
Parents: support from parents (scale of 1-10)
Teachers: support from teachers (scale of 1-10)
SES: socio-economic status (1 = low, 2 = medium, 3 = high)
Later in this lesson, we will perform a multiple regression in which we predict
satisfaction from age, gender and sports participation.
First, we look at how to produce the JASP output needed to check the assumptions
before performing the regression analysis. We will look at this output and interpret
it in a later lesson.
We will first examine two assumptions about the relation between each predictor (x)
and the outcome (y):
Assumption: there are linear relationships between the dependent variable and all
continuous independent variables.
A linear relation means that the scatterplot of scores (a plot with a predictor on the
x-axis and the outcome on the y-axis) has an oval shape that can be described
reasonably well by a straight line (i.e., the relation is not curved or S-shaped).
Assumption: there are no outliers in the relation between x and y.
An outlier is a case that deviates strongly from the other cases in the data set. This can
be on one variable (e.g., everybody in the data has values between 20 and 25 on this
variable, but one person scored 35), on two variables (e.g., one dot in the scatterplot
lies far outside the oval cloud that contains the other dots), or on a combination of
even more variables (in which case numerical inspection is easier than visual
inspection). Since we are now focusing on the x-y relation (for each x separately), we
will start by producing scatterplots in JASP, each with a different (continuous)
independent variable on the x-axis and the dependent variable on the y-axis.
Not sure how to create scatterplots in JASP? Follow these steps!
With the data open, click on Descriptives in the top tab.
Add the variables you want to look at into the Variables box.
Scroll down to Plots, and tick Correlation plots in Basic plots.
That's it!
Which variables did you include in the descriptives?
Satisfaction, age, gender, sports:
The plots show an outlier (see the plot of age against satisfaction: one respondent is
much younger than all the others).
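Besides looking at the plots, an outlier on a single variable such as age can also be spotted numerically, for example with z-scores: each score minus the mean, divided by the standard deviation. The Python sketch below (standard library only) uses made-up ages, not the values in LifeSat1.sav, and a cut-off of |z| > 3, which is our assumed rule of thumb rather than a JASP setting.

```python
from statistics import mean, stdev

# Made-up ages (NOT the LifeSat1.sav values): most between 17 and 22, one much lower
ages = [17, 18, 19, 20, 21, 22, 17, 18, 19, 20, 21, 22, 19, 20, 7]
CUTOFF = 3  # assumed rule of thumb for |z|; not a JASP setting


def z_scores(values):
    """Standardize each value: (value - mean) / sample standard deviation."""
    m, s = mean(values), stdev(values)
    return [(v - m) / s for v in values]


# Case numbers start at 1, like the row numbers in JASP's data view
flagged = [(case, age) for case, (age, z) in enumerate(zip(ages, z_scores(ages)), start=1)
           if abs(z) > CUTOFF]
print(flagged)
```

With this toy data only case 15 (age 7) is flagged; all other ages lie within about one standard deviation of the mean.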
The next exercise is to remove the outlier from the data set.
Then we will produce the scatterplot of age and satisfaction again and compare the
old scatterplot with the new one (without the outlier).
Don't know how to remove an outlier? Follow these steps!
Go back to the data by clicking the arrow along the left-hand side of the screen.
Then, you can double-click the data to edit it. However, for the next steps it
matters which file extension the data in JASP has. For .csv files, Excel will open and
you can continue with the instructions below.
For .sav files (as we have here), SPSS will open when you double-click the data. This is
fine: you can remove the outlier in SPSS and save the data again. BUT: perhaps
you do not have SPSS (which is perfectly okay), in which case this does not work. Then
you first need to create a CSV file, which can be done in JASP using Export data in
the main menu (the three horizontal blue lines on the left): choose a location and
save the file as LifeSat1.csv. Now you can open this file in Excel.
The following message may pop up:
Click Generate Data File and choose an appropriate folder to save it in. Once you've
saved it, it should open automatically! If not, just go to the folder you saved it in
and open it manually.
Don't worry if your data looks like this:
That is because of how the file is saved as a CSV by default; depending on your
machine, Excel might read it differently. You can just go ahead and delete the row
that contains the outlier. If you have trouble identifying which one it is, look at the
row number in JASP (here, it is row 1) and delete the corresponding row in Excel,
which is the JASP row number + 1 (so here, 1 + 1 = 2), because the first row in Excel
holds the variable names.
Save the file. JASP will update its data automatically; you don't need to do
anything else!
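If you prefer not to use Excel, the same row deletion can be scripted. The Python sketch below (standard library `csv` module) is a hypothetical helper, not part of JASP: it assumes the first line of the CSV holds the variable names, which is why the case JASP shows as row k sits on line k + 1 of the file. The three-case sample is made up.

```python
import csv
import io


def drop_jasp_row(csv_text, jasp_row):
    """Remove the case that JASP shows as row `jasp_row` (1-based) from CSV text.

    In the file (and in Excel) that case sits on line jasp_row + 1,
    because the first line holds the variable names.
    """
    rows = list(csv.reader(io.StringIO(csv_text)))
    header, data = rows[0], rows[1:]
    if not 1 <= jasp_row <= len(data):
        raise ValueError(f"there is no JASP row {jasp_row}")
    del data[jasp_row - 1]  # data[0] is JASP row 1
    out = io.StringIO()
    csv.writer(out, lineterminator="\n").writerows([header] + data)
    return out.getvalue()


# Made-up three-case extract; the real file has more variables and cases
sample = "Satisfaction,Age,Gender\n70,7,1\n65,17,0\n80,18,1\n"
print(drop_jasp_row(sample, 1))
```

To use it on the exported file, read the CSV as text (with `newline=""`), pass it together with the JASP row number, and write the result back. The lesson only describes JASP picking up changes saved from Excel, so check the data view in JASP afterwards.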
Now that you've deleted the outlier, look at the scatterplots again.
Do your plots related to age look like this?
Great! We have checked some assumptions before fitting the model. Let us now fit
the model and check the rest of the assumptions!
We are going to perform a multiple regression on this data set, in which we predict
satisfaction based on age, gender and sports participation.
We will first do so using the frequentist framework.
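Written as an equation, the model has the form below. The notation is ours: b_0 to b_3 are the regression coefficients that JASP estimates, and e_i is the residual of person i, i.e. the difference between observed and predicted satisfaction.

```latex
\text{Satisfaction}_i = b_0 + b_1\,\text{Age}_i + b_2\,\text{Gender}_i + b_3\,\text{Sports}_i + e_i
```

Because Gender is coded 0 = male and 1 = female, b_2 is the predicted difference in satisfaction between females and males of the same age and sports participation. The remaining assumptions (outliers, homoscedasticity, normality) concern the residuals e_i.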
Conduct this analysis in JASP:
Regression > Classical > Linear regression
Decide which variable is the dependent variable and which are the independent
variables, and put them in the right boxes. This alone would be enough to produce
regression output; however, further assumptions must be checked! We need to
change some of the settings so that we can check them.
Tip: you can check the analysis info by clicking on the blue "i" in the top right corner!
This will show you information about the kind of model this is and the different
settings you can use!
Let's now look at which settings still need to be changed.
We wish to check the following assumptions:
Absence of multicollinearity: scroll down to Statistics and tick Collinearity
diagnostics
Absence of outliers: scroll down to Statistics and tick Casewise diagnostics, then
select either Standard residual (the default cut-off of 3 is fine, or you can change it to
3.3, which is also often used) or Cook's distance (the default value of 1 can stay as it
is). This will produce a table containing only the observations that exceed those values.
Note: you can only look at standard residuals or Cook's distances one at a time.
Make sure you look at both, one after the other!
Homoscedasticity: scroll down to Plots and tick Residuals vs. predicted
Normally distributed residuals: scroll down to Plots and tick Residuals
histogram and Q-Q plot standardized residuals.
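JASP computes all of these diagnostics for you. To show what the numbers behind the ticked options mean, here is a minimal standard-library Python sketch on made-up data. The helper names (`solve`, `ols`, `vif`, `casewise`, `qq_points`) are our own and not part of JASP. It computes the variance inflation factor VIF_j = 1 / (1 - R_j^2), where R_j^2 comes from regressing predictor j on the other predictors; leverages; standardized residuals; and Cook's distances. Assumptions: the model is fitted by solving the normal equations directly (fine for a few predictors, not necessarily how JASP computes it); the standardized residual is taken to be the internally studentized residual e_i / (s * sqrt(1 - h_ii)), which may differ from JASP's exact definition; and the Q-Q points use the plotting positions (i - 0.5)/n.

```python
from statistics import NormalDist, fmean


def solve(a, b):
    """Solve the linear system a x = b (Gauss-Jordan elimination, partial pivoting)."""
    n = len(a)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [v - factor * p for v, p in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def ols(predictors, y):
    """Least-squares fit of y = b0 + b1*x1 + ... + bk*xk via the normal equations.

    predictors is a list of columns (one list of numbers per predictor).
    Returns the coefficients (intercept first), the design rows and X'X.
    """
    rows = [[1.0] + [col[i] for col in predictors] for i in range(len(y))]
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    return solve(xtx, xty), rows, xtx


def predict(coefs, rows):
    """Predicted values b0 + b1*x1 + ... for each design row."""
    return [sum(c * v for c, v in zip(coefs, r)) for r in rows]


def vif(predictors):
    """VIF_j = 1 / (1 - R_j^2), with R_j^2 from regressing x_j on the other predictors."""
    result = []
    for j, xj in enumerate(predictors):
        others = predictors[:j] + predictors[j + 1:]
        coefs, rows, _ = ols(others, xj)
        mean_j = fmean(xj)
        ss_res = sum((a - f) ** 2 for a, f in zip(xj, predict(coefs, rows)))
        ss_tot = sum((a - mean_j) ** 2 for a in xj)
        result.append(ss_tot / ss_res)  # ss_res / ss_tot equals 1 - R_j^2
    return result


def casewise(predictors, y):
    """Fitted values, residuals, standardized residuals and Cook's distances per case."""
    coefs, rows, xtx = ols(predictors, y)
    n, p = len(y), len(coefs)  # p counts the intercept too
    fit = predict(coefs, rows)
    resid = [yi - fi for yi, fi in zip(y, fit)]
    mse = sum(e * e for e in resid) / (n - p)
    # leverage h_ii = x_i' (X'X)^-1 x_i, obtained by solving (X'X) z = x_i
    lev = [sum(a * b for a, b in zip(r, solve(xtx, r))) for r in rows]
    # standardized (internally studentized) residual: e_i / (s * sqrt(1 - h_ii))
    std_res = [e / (mse * (1 - h)) ** 0.5 for e, h in zip(resid, lev)]
    # Cook's distance: D_i = std_res_i^2 / p * h_ii / (1 - h_ii)
    cooks = [s * s / p * h / (1 - h) for s, h in zip(std_res, lev)]
    return fit, resid, std_res, cooks


def qq_points(values):
    """(normal quantile, observed value) pairs for a normal Q-Q plot."""
    n = len(values)
    return [(NormalDist().inv_cdf((i + 0.5) / n), v) for i, v in enumerate(sorted(values))]


if __name__ == "__main__":
    # Made-up data with one deviating last case; NOT LifeSat1.sav
    x = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
    y = [2.1, 3.9, 6.2, 7.8, 10.1, 12.0, 13.9, 16.1, 18.0, 40.0]
    fit, resid, std_res, cooks = casewise([x], y)
    for i, (s, d) in enumerate(zip(std_res, cooks), start=1):
        flag = "  <- exceeds a cut-off" if abs(s) > 3 or d > 1 else ""
        print(f"case {i:2d}: std. residual {s:6.2f}, Cook's distance {d:5.2f}{flag}")
```

`fit` and `resid` are the two coordinates of the Residuals vs. predicted plot, and `qq_points(std_res)` gives the points of a Q-Q plot of the standardized residuals. In this made-up example the last case has a Cook's distance of about 2.1 (above 1) but a standardized residual of only about 2.8 (below 3): a case can be flagged by one criterion and not by the other.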