Advanced Research Methods and Statistics for Psychology 2023
Seller: Studeerzee
Grasple (excluding JASP)
Refreshing Statistics
Simple linear regression
What do we mean by the word "simple" in simple linear regression?
→ The word "simple" refers to the number of predictors: there is only one (independent variable = predictor).
→ The dependent variable is the outcome variable.
Pearson’s r (correlation coefficient) = the strength of the linear relationship between two variables.
Because the correlation is a standardized measure, the strengths of different relationships can be compared.
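A minimal Python sketch of what "standardized" buys you, assuming made-up height/weight data (not from the course): expressing height in metres instead of centimetres changes every number, but not Pearson's r. The helper `pearson_r` is a hypothetical name, written out by hand from the usual definition.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson's r: sum of cross-products of deviations divided by the product of the spreads."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


# Made-up illustrative data: height and weight of five people
height_cm = [160, 165, 170, 175, 180]
weight_kg = [55, 61, 63, 70, 74]

# Same heights, different unit of measurement
height_m = [h / 100 for h in height_cm]

r_cm = pearson_r(height_cm, weight_kg)
r_m = pearson_r(height_m, weight_kg)

# Both print the same value: r does not depend on the unit
print(round(r_cm, 3), round(r_m, 3))
```

Because r is unit-free, it can also be compared across pairs of variables measured on completely different scales.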
However, a low correlation or a correlation of 0 does not mean that there is no relation between the two
variables. The relationship can also be non-linear.
So, keep in mind that Pearson's r measures the strength of the linear relation, irrespective of whether the real
relation is actually linear.
Example: The plot on the right shows a strong relation between X and Y, but it is a quadratic relation. The
best-fitting straight line will be horizontal, and thus tells us that there is no linear (positive or negative) relation.
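A small sketch of the quadratic example, assuming hypothetical data where Y is exactly X squared (a perfect, but U-shaped, relation). `pearson_r` and `ls_slope` are hypothetical helper names; `ls_slope` uses the standard least squares slope (sum of cross-products of deviations divided by the sum of squared X deviations).

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson's r: sum of cross-products of deviations divided by the product of the spreads."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def ls_slope(x, y):
    """Slope of the best-fitting straight line (least squares)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


# Hypothetical data: y is completely determined by x, but the relation is quadratic
x = [-3, -2, -1, 0, 1, 2, 3]
y = [v ** 2 for v in x]  # [9, 4, 1, 0, 1, 4, 9]

print(pearson_r(x, y))  # 0.0 -> no *linear* relation at all
print(ls_slope(x, y))   # 0.0 -> the best-fitting straight line is horizontal
```

Only a plot of these points would reveal the perfect U-shaped relation that r = 0 hides.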
Lesson: do not rely on numbers alone (e.g. r = 0); make plots and inspect your data.
A correlation does not mean that movement in one variable causes the other variable to move as well.
Summary
• A correlation is a standardized measure of the strength of the linear relationship between two
variables.
• A correlation is scaled to always be between -1 and 1.
• A high positive correlation means that when one variable increases, the other one also increases.
• A high negative correlation means that when one variable increases, the other one decreases.
• A correlation of 0 means that when one variable increases, the other variable does not tend to
increase or decrease linearly.
• A correlation of 0 does not mean that there is no relationship between the two variables; it could
be a non-linear relationship.
• A correlation does not say anything about the causal effects of the variables.
Correlation and causality
Categorical variables such as yes/no or blue/yellow/red are NOT suitable for measuring relationships with Pearson
correlation and/or linear regression
(these require numeric variables at interval or ratio level).
If you quickly want to know whether there is a relationship between variables
→ make a scatter plot (this provides valuable information about the strength and direction of the
relationship).
Want to compare correlations?
→ Use Pearson's r (it is always between -1 and 1).
(However, Pearson's r is not useful when the relation is non-linear.)
A common error is to take correlation to mean causation. Just because two variables are correlated, this does
not mean that one causes the other.
Whether one variable is the cause of the change in the other variable cannot be concluded based on a
correlation. To check this, you would need to set up an experiment.
You need an experiment so that OTHER explanations can be RULED OUT.
What is true:
If there is a correlation, one variable does not necessarily influence (affect) the other variable.
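Why an experiment helps can be sketched with a small simulation. This is a hedged illustration with hypothetical variables and a fixed random seed: a third variable z drives both x and y, so x and y correlate strongly even though neither causes the other. When x is instead assigned at random (as in an experiment), z can no longer produce the correlation, and it should largely disappear.

```python
import random
from math import sqrt


def pearson_r(x, y):
    """Pearson's r: sum of cross-products of deviations divided by the product of the spreads."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


random.seed(1)  # fixed seed so every run gives the same numbers
n = 1000

# Hypothetical third variable z that drives both x and y
z = [random.gauss(0, 1) for _ in range(n)]

# Observational data: x and y each depend only on z plus noise; x does NOT cause y
x_obs = [zi + random.gauss(0, 0.5) for zi in z]
y = [zi + random.gauss(0, 0.5) for zi in z]
r_obs = pearson_r(x_obs, y)

# "Experiment": x is assigned at random, so z no longer influences it
x_exp = [random.gauss(0, 1) for _ in range(n)]
r_exp = pearson_r(x_exp, y)

print(f"observational r = {r_obs:.2f}")  # strong, created entirely by z
print(f"experimental r  = {r_exp:.2f}")  # close to 0
```

The strong observational correlation is real, but concluding "x causes y" from it would be wrong; random assignment is what rules out z as the explanation.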
Summary
• It is a common mistake to interpret a correlation between two variables as one variable causing a
change in the other.
• Be precise when reporting your conclusions based on a correlation. Otherwise people might
misquote your findings later on.
Linear regression
As you know, relations can be divided into two categories: linear relations and non-linear relations.
In essence, linear regression boils down to summarizing a bunch of data by drawing a straight line through
them.
The minimum measurement level required for linear regression is interval (you need interval or ratio variables).
Only perform a linear regression if the relation is linear.
The definition of the slope: how much the predicted Y increases if X increases by 1 unit.
The intercept is the point where the regression line crosses the y-axis.
Note: the intercept is also often referred to as the "constant" or b0.
Now that we know the line's two essential components, we can use these to make predictions:
predicted Y-value = intercept + slope × X-value
Mathematically we tend to use the following symbols to write this formula:
$\hat{y} = b_0 + b_1 x$
Note: The hat on y is used to denote that this is not the observed y-score but the predicted y-score.
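A minimal sketch of using the regression equation to make predictions. The coefficients are invented for an hours-studied → exam-grade relation (not estimated from real data), and `predict` is a hypothetical helper name.

```python
def predict(x, b0, b1):
    """Predicted score y_hat = b0 + b1 * x (a prediction, not an observed score)."""
    return b0 + b1 * x


# Invented coefficients for hours studied -> exam grade (illustration only)
b0 = 4.0   # intercept: predicted grade for someone who studied 0 hours
b1 = 0.25  # slope: predicted grade goes up 0.25 for every extra hour studied

print(predict(0, b0, b1))                         # 4.0  -> at X = 0 the prediction is the intercept
print(predict(10, b0, b1))                        # 6.5
print(predict(11, b0, b1) - predict(10, b0, b1))  # 0.25 -> one extra unit of X adds b1
```

The last line is the slope interpretation in action: increasing X by 1 unit changes the prediction by exactly $b_1$.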
There are occasions where the intercept of a regression equation is actually meaningful and informative.
Take the relation between hours studied and exam grade. The intercept is the predicted grade for someone
who studied 0 hours.
However, in many other occasions the intercept by itself can be fairly meaningless and only serves
(mathematically) to support a correct prediction.
In the relation between height and weight, the intercept tells you the predicted weight of someone who is 0
cm tall. As you can see, the interpretation of this intercept is nonsensical.
Intercept = the predicted value of Y when X = 0.
Summary
In this lesson you learned:
• Linear regression is an analysis in which you attempt to summarise a bunch of data points by
drawing a straight line through them
• Linear regression requires variables at interval/ratio level
• Linear regression should only be performed on linear relations
• The regression equation can be written as: $\hat{y} = b_0 + b_1 x$
• $b_0$ refers to the intercept, the point where the line crosses the y-axis, and is interpreted as: if X is
0, $\hat{y}$ is $b_0$.
• $b_1$ refers to the slope of the line and is interpreted as: if X increases by 1 unit, $\hat{y}$ increases by $b_1$
units (and if $b_1$ is negative, read decreases).
Estimating the regression line
The distance between the observed value y and the predicted value $\hat{y}$ is called the error or residual.
Simply adding up the errors does not work: positive errors (points above the line) and negative errors
(points below the line) cancel each other out, so the sum can be zero even for a poorly fitting line.
So, when we square the errors, they are always positive and do not cancel each other out. This way we
can look for the line that results in the smallest possible sum of squared errors.
This method is called the least squares method → $\sum (y - \hat{y})^2$
It is used to estimate the parameters of the linear regression model.
This resulted in the following formula, which determines the slope of the line with the smallest sum of
squared errors:
$b_1 = r \cdot \frac{s_y}{s_x}$
So the slope equals the correlation coefficient (Pearson's r) times the standard deviation of y divided by
the standard deviation of x.
You do not need to know this formula but it is good to know that these 3 ingredients play a role.
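As a check on the slope formula described above, here is a hedged Python sketch on a tiny made-up data set. It computes the slope as r × s_y / s_x and compares it with the direct least squares slope (sum of cross-products of deviations divided by the sum of squared X deviations). The intercept line uses $b_0 = \bar{y} - b_1 \bar{x}$, a standard least squares result that these notes do not cover; treat it as an addition. `pearson_r` is a hypothetical helper name.

```python
from math import sqrt
from statistics import mean, stdev


def pearson_r(x, y):
    """Pearson's r: sum of cross-products of deviations divided by the product of the spreads."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


# Made-up data for illustration only
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

# The 3 ingredients: r, the SD of y and the SD of x.
# (Sample or population SDs give the same ratio, because the n - 1 or n cancels.)
r = pearson_r(x, y)
b1 = r * stdev(y) / stdev(x)

# Direct least squares slope, for comparison
mx, my = mean(x), mean(y)
b1_direct = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sum((a - mx) ** 2 for a in x)

# Standard least squares intercept (not in the notes): the line passes through (mean x, mean y)
b0 = my - b1 * mx

print(round(b1, 3), round(b1_direct, 3), round(b0, 3))  # 0.6 0.6 2.2
```

Both routes give the same slope, which is what JASP (or other software) would report for these data.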
Summary
• A regression line never fits all the data points perfectly. There will be residual error.
• This residual error is the difference between the observed score y and the predicted score $\hat{y}$:
$y - \hat{y}$.
• The estimated regression model is based on minimizing the sum of the squared errors, $\sum (y - \hat{y})^2$.
• This so-called least squares principle provides formulas for computing the slope and
intercept of the best-fitting linear regression line. JASP (or other statistical software) provides
these estimates for you.
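To make the least squares principle concrete, here is a hedged Python sketch on made-up data. It compares three candidate lines: a flat line at the mean of y, the least squares line for these data ($b_0 = 2.2$, $b_1 = 0.6$, worked out by hand for this example), and a slightly steeper line. `residuals` and `sse` are hypothetical helper names.

```python
def residuals(x, y, b0, b1):
    """Errors y - y_hat for the line y_hat = b0 + b1 * x."""
    return [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]


def sse(x, y, b0, b1):
    """Sum of squared errors: the quantity the least squares method minimizes."""
    return sum(e ** 2 for e in residuals(x, y, b0, b1))


# Made-up data for illustration only
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

# (intercept, slope) of each candidate line
candidate_lines = {
    "flat line at mean of y": (4.0, 0.0),
    "least squares line": (2.2, 0.6),
    "steeper line": (1.9, 0.7),
}

for name, (b0, b1) in candidate_lines.items():
    raw_sum = sum(residuals(x, y, b0, b1))
    print(f"{name}: sum of errors = {raw_sum:.2f}, SSE = {sse(x, y, b0, b1):.2f}")
```

The plain sum of errors is (about) zero for all three lines, so it cannot tell them apart. The SSE can: 6.00 for the flat line, 2.40 for the least squares line and 2.50 for the steeper line.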