Advanced Research Methods and Statistics for Psychology 2023
Grasple (excluding JASP)
Refreshing Statistics
Simple linear regression
What do we mean by the word "simple" in simple linear regression?
→ The word "simple" refers to the number of predictors: there is only one (independent variable = predictor).
→ The dependent variable = outcome variable.
Pearson’s r (correlation coefficient) = strength of the linear relation between two variables
Because the correlation is a standardized measure, the strengths of different relationships can be
compared with each other.
However, a low correlation or a correlation of 0 does not mean that there is no relation between the two
variables. The relationship can also be non-linear.
So, keep in mind that Pearson's r measures the strength of the linear relation, regardless of whether the real
relation is actually linear.
Example: The plot on the right shows a strong relation between X and Y, but it is a quadratic relation. The
best-fitting straight line will be horizontal, and thus tell us that there is no linear (positive or negative) relation.
Lesson: do not rely on numbers alone (e.g. r = 0), but make plots and inspect your data.
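The quadratic example can be checked numerically. The sketch below is a minimal illustration that assumes made-up data (x from -3 to 3, y = x²) and a small helper function `pearson_r` written for this example; neither comes from the course material.

```python
from math import sqrt
from statistics import mean


def pearson_r(xs, ys):
    """Pearson's r: co-variation of x and y divided by their separate variation."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    ssx = sum((x - mx) ** 2 for x in xs)
    ssy = sum((y - my) ** 2 for y in ys)
    return cov / sqrt(ssx * ssy)


# Invented data: y depends perfectly on x, but the relation is quadratic.
xs = [-3, -2, -1, 0, 1, 2, 3]
ys = [x ** 2 for x in xs]

r = pearson_r(xs, ys)
print("Pearson's r:", r)
```

Here r comes out as 0 even though y is completely determined by x, which is why a plot is needed next to the number.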
A correlation does not mean that a change in one variable causes the other variable to change as well.
Summary
• A correlation is a standardized measure of the strength of the linear relationship between two
variables.
• A correlation is scaled to always be between -1 and 1.
• A high positive correlation means that when one variable increases, the other one also increases.
• A high negative correlation means that when one variable increases, the other one decreases.
• A correlation of 0 means that when one variable increases, the other variable shows no linear tendency
to increase or decrease.
• A correlation of 0 does not mean that there is no relationship between the two variables; it could
be a non-linear relationship.
• A correlation does not say anything about the causal effects of the variables.
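The point that a correlation is standardized can be illustrated with a short sketch. The data (height and weight of five people) and the helper `pearson_r` are assumptions made for this example only. Changing the units of measurement leaves r unchanged, and reversing the direction of one variable only flips its sign.

```python
from math import sqrt
from statistics import mean


def pearson_r(xs, ys):
    """Pearson's r computed from deviations around the means."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    ssx = sum((x - mx) ** 2 for x in xs)
    ssy = sum((y - my) ** 2 for y in ys)
    return cov / sqrt(ssx * ssy)


# Hypothetical data, not taken from the course.
height_cm = [160, 165, 170, 180, 190]
weight_kg = [55, 62, 66, 80, 85]

# The same measurements in other units: metres and grams.
height_m = [h / 100 for h in height_cm]
weight_g = [w * 1000 for w in weight_kg]

r_original = pearson_r(height_cm, weight_kg)
r_rescaled = pearson_r(height_m, weight_g)

# Reversing one variable turns a positive relation into a negative one.
r_reversed = pearson_r(height_cm, [-w for w in weight_kg])

print(round(r_original, 3), round(r_rescaled, 3), round(r_reversed, 3))
```

Because r does not depend on the units, a correlation from one pair of variables can be compared with a correlation from a completely different pair.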
Correlation and causality
Categorical variables such as yes/no or blue/yellow/red are NOT suitable for measuring relationships with the Pearson
correlation and/or linear regression
(only numeric variables at interval or ratio level are).
If you quickly want to know whether there is a relationship between two variables
→ make a scatter plot (this provides valuable information about the strength and direction of the
relationship).
Want to compare correlations?
→ Use Pearson's r (it is always between -1 and 1).
(However, Pearson's r is not useful when the relation is non-linear (see below).)
It is an error to take correlation to mean causation. Just because two variables are correlated, this does
not mean that one causes the other.
Whether one variable is the cause of the change in the other variable cannot be concluded based on a
correlation. To check this, you would need to set up an experiment.
You need an experiment so OTHER explanations can be RULED OUT
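One such "other explanation" is a third variable that drives both measured variables. The simulation below is a sketch with invented data, not an example from the course: a hypothetical variable z influences both x and y, while x and y have no effect on each other. The names, sample size and noise levels are all assumptions.

```python
import random
from math import sqrt
from statistics import mean


def pearson_r(xs, ys):
    """Pearson's r computed from deviations around the means."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    ssx = sum((x - mx) ** 2 for x in xs)
    ssy = sum((y - my) ** 2 for y in ys)
    return cov / sqrt(ssx * ssy)


random.seed(1)  # fixed seed so the run is reproducible
n = 5000

# z is the hypothetical third variable.
z = [random.gauss(0, 1) for _ in range(n)]

# x and y each depend on z plus their own random noise, never on each other.
x = [zi + random.gauss(0, 0.5) for zi in z]
y = [zi + random.gauss(0, 0.5) for zi in z]

r_xy = pearson_r(x, y)
print("correlation between x and y:", round(r_xy, 2))
```

The correlation between x and y is clearly positive, yet changing x would do nothing to y. In an experiment the researcher sets x, so a variable like z can no longer explain the relation.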
What is true:
If there is a correlation, one variable may influence/affect the other variable, but the correlation alone cannot show this.
Summary
• It is a common mistake to interpret a correlation between two variables as one variable causing a
change in the other.
• Be precise when reporting your conclusions based on a correlation. Otherwise people might
misquote your findings later on.
Linear regression
As you know, relations can be divided into two categories: linear relations and non-linear relations.
In essence, linear regression boils down to summarizing a bunch of data by drawing a straight line through
them.
The minimal measurement level required for a linear regression is interval (you need interval or ratio variables).
You only perform a linear regression if the relation is linear.
The definition of the slope: how much the predicted Y changes if X increases by 1.
The intercept is the point where the regression line crosses the y-axis.
Note: the intercept is also often referred to as the "constant" or b0.
Now that we know the line's two essential components, we can use these to make predictions:
Y-value = intercept + slope × X-value
Mathematically we tend to use the following symbols to write this formula:
y^ = b0 + b1x
Note: The hat on y is used to denote that this is not the observed y-score but the predicted y-score.
There are occasions where the intercept of a regression equation is actually meaningful and informative.
Take the relation between hours studied and exam grade. The intercept is the predicted grade for someone
who studied 0 hours.
However, on many other occasions the intercept by itself can be fairly meaningless and only serves
(mathematically) to support a correct prediction.
In the relation between height and weight, the intercept tells you the predicted weight of someone who is 0
cm tall. As you can see, the interpretation of this intercept is nonsensical.
The intercept means: the predicted value of Y when X = 0.
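The prediction formula and the meaning of intercept and slope can be sketched in a few lines. The coefficients below (b0 = 3.0, b1 = 0.5) are hypothetical values for the hours-studied example; they are not estimates from real data, and the function name `predict` is chosen for this illustration.

```python
def predict(x, b0, b1):
    """Predicted y-score (y-hat) for a given x: intercept + slope * x."""
    return b0 + b1 * x


# Hypothetical coefficients for: exam grade predicted from hours studied.
b0 = 3.0  # intercept: predicted grade for someone who studied 0 hours
b1 = 0.5  # slope: predicted change in grade per extra hour studied

print(predict(0, b0, b1))  # 3.0 -> at X = 0 the prediction equals the intercept
print(predict(8, b0, b1))  # 7.0 -> 3.0 + 0.5 * 8

# One extra hour of study changes the prediction by exactly the slope.
print(predict(5, b0, b1) - predict(4, b0, b1))  # 0.5
```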
Summary
In this lesson you learned:
• Linear regression is an analysis in which you attempt to summarise a bunch of data points by
drawing a straight line through them
• Linear regression requires variables at interval/ratio level
• Linear regression should only be performed on linear relations
• The regression equation can be written as: y^ = b0 + b1x
• b0 refers to the intercept, the point where the line crosses the y-axis, and is interpreted as: if X is
0, y^ is b0.
• b1 refers to the slope of the line and is interpreted as: if X increases by 1 unit, y^ increases by b1
units (if b1 is negative, y^ decreases).
Estimating the regression line
The distance between the observed value y and the predicted value y^ is called the error or residual.
Simply adding up the errors does not work as a measure of fit: positive and negative errors cancel each
other out, so the sum can be zero even when the line fits poorly.
When we square the errors, they can no longer be negative and cannot cancel each other out. This way we
can look for the line that will result in the smallest possible sum of squared errors.
This method is called the least squares method → Σ(y − y^)²
This is used to estimate the parameters of the linear regression model
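The reasoning behind least squares can be tried out on a small data set. The sketch below uses five invented data points and compares three candidate lines; the function names `errors` and `sse` are chosen for this illustration. The line with b0 = 2.2 and b1 = 0.6 is the least squares line for these particular points.

```python
from statistics import mean

# Invented data points, not taken from the course.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]


def errors(b0, b1):
    """Residuals y - y-hat for the line y-hat = b0 + b1 * x."""
    return [y - (b0 + b1 * x) for x, y in zip(xs, ys)]


def sse(b0, b1):
    """Sum of squared errors for the line y-hat = b0 + b1 * x."""
    return sum(e ** 2 for e in errors(b0, b1))


# A flat line at the mean of y ignores x completely,
# yet its positive and negative errors add up to zero.
flat_errors = errors(mean(ys), 0)
print("sum of errors, flat line:", sum(flat_errors))

# The sum of squared errors does distinguish between the lines.
print("SSE flat line:          ", sse(mean(ys), 0))
print("SSE other line:         ", round(sse(2.0, 0.7), 2))
print("SSE least squares line: ", round(sse(2.2, 0.6), 2))
```

The flat line has a sum of errors of zero but a sum of squared errors of 6, against about 2.55 and about 2.4 for the other two lines, so only the squared version shows which line fits best.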
Minimizing the sum of squared errors leads to the following formula, which determines the slope of the line with the smallest sum of
squared errors:
b1 = r × (sy / sx)
So the slope equals the correlation coefficient (Pearson's r) times the standard deviation of y divided by
the standard deviation of x.
You do not need to know this formula but it is good to know that these 3 ingredients play a role.
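To see the three ingredients at work, the sketch below computes the slope from Pearson's r and the two standard deviations for a small invented data set. The intercept formula used here, b0 = mean of y minus b1 times mean of x, is the standard least squares result; it is not stated in these notes. The helper `pearson_r` is written for this example.

```python
from math import sqrt
from statistics import mean, stdev


def pearson_r(xs, ys):
    """Pearson's r computed from deviations around the means."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    ssx = sum((x - mx) ** 2 for x in xs)
    ssy = sum((y - my) ** 2 for y in ys)
    return cov / sqrt(ssx * ssy)


# Invented data points, not taken from the course.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

r = pearson_r(xs, ys)

# Slope: correlation times (standard deviation of y / standard deviation of x).
b1 = r * stdev(ys) / stdev(xs)

# Intercept (assumption: standard result, not given in the notes).
b0 = mean(ys) - b1 * mean(xs)

print("r  =", round(r, 3))
print("b1 =", round(b1, 3))
print("b0 =", round(b0, 3))
```

For these points the slope is about 0.6 and the intercept about 2.2. Statistical software such as JASP reports the same kind of estimates without any manual calculation.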
Summary
• A regression line almost never fits all the data points perfectly. There will be residual error.
• This residual error is the difference between the observed score y and the predicted score y^:
y − y^
• The estimated regression model is based on minimizing the sum of the squared errors, Σ(y − y^)².
• This so-called least squares principle provides formulas for computing the slope and
intercept of the best-fitting linear regression line. JASP (or other statistical software) provides
these estimates for you.