Correlation & Regression
Definitions
➔ correlation and regression are statistical techniques used to examine relationships between
variables
◆ correlation: determines the strength of an association between two quantitative
variables
◆ simple regression: predicts one quantitative dependent variable from an independent
variable
● dependent variable = y or the criterion
● independent variable = x or the predictor
◆ multiple regression: predicts one criterion from multiple predictors
Pearson’s Product-Moment Correlation Coefficient (r)
➔ r is the coefficient that represents the strength and the direction of the linear relationship
between two variables
◆ correlation of a sample = “Pearson’s r” or just “r”
● lowercase “r” because uppercase R is used for multiple regression
◆ correlation of a population = ρ (rho)
◆ the absolute value of r determines the strength of the relationship between x and y
● the magnitude of the value (regardless of positive or negative)
● r = 0; no correlation
● r = -1 or 1; perfect correlation
○ very rare/highly unlikely
○ means x can perfectly predict y
◆ the sign (positive or negative) determines the direction of the relationship
● - value = negative relationship
● + value = positive relationship
➔ the values for r or rho are always between -1 and 1 (inclusive)
➔ think of r as a way of looking at how closely the data cluster around the regression line
◆ scatterplots are useful for this (review of scatterplots in the slides from lecture)
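The notes do not spell out the formula for r, so the following is only a minimal sketch using the standard deviation-score form (sum of cross-products divided by the square root of the product of the sums of squares). The function name `pearson_r` and all of the data are hypothetical, made up for illustration.

```python
import math

def pearson_r(x, y):
    """Sample Pearson correlation for paired lists x and y (illustrative helper)."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must be paired and contain at least 2 points")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of cross-products and sums of squared deviations from the means
    sp_xy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    ss_x = sum((xi - mean_x) ** 2 for xi in x)
    ss_y = sum((yi - mean_y) ** 2 for yi in y)
    if ss_x == 0 or ss_y == 0:
        raise ValueError("r is undefined when a variable has no variance")
    return sp_xy / math.sqrt(ss_x * ss_y)

# Hypothetical data, invented for this sketch
x = [1, 2, 3, 4, 5]
perfect_pos = [2, 4, 6, 8, 10]   # y rises in lockstep with x
perfect_neg = [10, 8, 6, 4, 2]   # y falls in lockstep with x
noisy = [2, 1, 4, 3, 5]          # positive trend with some scatter

r_pos = pearson_r(x, perfect_pos)
r_neg = pearson_r(x, perfect_neg)
r_noisy = pearson_r(x, noisy)
print(r_pos, r_neg, r_noisy)
```

The sign of each result gives the direction and the absolute value gives the strength: the two lockstep data sets sit at the ends of the -1 to 1 range, while the noisy set lands in between.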
Testing A Correlation
➔ to determine whether a sample correlation reflects sampling error or an actual relationship,
you would have to run a t-test
➔ first step = state the hypotheses and the degrees of freedom
◆ for two-tailed test
● H0 = no correlation between the 2 variables (ρ = 0)
● HA = there is a correlation between the 2 variables (ρ ≠ 0)
◆ for one-tailed test
● state the directionality of the correlation
◆ DF = n - 2
● n = the number of paired (x, y) observations
➔ second step = state the assumptions
◆ the DV and the IV are both normally distributed
◆ there are no outliers in either the DV or the IV; no bivariate outliers
● bivariate outliers = outliers when considering both the variables together
● correlations are not resistant to outliers, especially when n is small
◆ the DV and IV are linearly related
● correlations only capture linearity
● no way to actually test this, would just have to look at a scatterplot
○ hence for this class, just state that it’s assumed
◆ the correlation between the two variables must be significant to run the simple
regression
● regression equation is only calculated if the correlation is significant, in other
words if the null is rejected
➔ third step = find Pearson’s r
➔ fourth step = test the correlation by comparing the obtained t with t-critical
➔ fifth step = if the null is rejected (i.e. significance is found), calculate the regression equation
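The five steps above can be sketched in code. The notes do not give the formulas, so this sketch assumes the usual test statistic t = r·sqrt(n - 2) / sqrt(1 - r²) and the usual least-squares slope and intercept; the class may present them in a different form. The data and the function names are hypothetical, and the critical value 2.776 is taken from a standard t table (two-tailed, α = .05, df = 4) rather than computed.

```python
import math

def deviations(x, y):
    """Return mean_x, mean_y, SP_xy, SS_x, SS_y for paired lists (illustrative helper)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sp_xy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    ss_x = sum((xi - mean_x) ** 2 for xi in x)
    ss_y = sum((yi - mean_y) ** 2 for yi in y)
    return mean_x, mean_y, sp_xy, ss_x, ss_y

def correlation_t_test(r, n, t_critical):
    """Two-tailed test of H0: rho = 0. t_critical is looked up in a t table."""
    if n < 3:
        raise ValueError("need at least 3 pairs so that df = n - 2 is positive")
    if abs(r) >= 1:
        raise ValueError("t is undefined for a perfect correlation")
    df = n - 2
    t_obs = r * math.sqrt(df) / math.sqrt(1 - r ** 2)
    return df, t_obs, abs(t_obs) > t_critical

# Hypothetical data, invented for this sketch
x = [1, 2, 3, 4, 5, 6]
y = [2, 1, 4, 3, 6, 5]

# Step 3: find Pearson's r
mean_x, mean_y, sp_xy, ss_x, ss_y = deviations(x, y)
r = sp_xy / math.sqrt(ss_x * ss_y)

# Steps 1 and 4: df = n - 2, then compare the obtained t with t-critical
T_CRITICAL = 2.776  # assumed table value: two-tailed, alpha = .05, df = 4
df, t_obs, reject_null = correlation_t_test(r, len(x), T_CRITICAL)

# Step 5: only calculate the regression equation if the null is rejected
if reject_null:
    slope = sp_xy / ss_x                   # least-squares slope
    intercept = mean_y - slope * mean_x    # line passes through (mean_x, mean_y)
    print(f"r = {r:.3f}, t({df}) = {t_obs:.3f}, y-hat = {intercept:.3f} + {slope:.3f}x")
else:
    slope = intercept = None
    print(f"r = {r:.3f}, t({df}) = {t_obs:.3f}, fail to reject H0")
```

Passing `t_critical` in as an argument mirrors the table lookup done by hand; a statistics library could supply the critical value instead, but that is outside what the notes cover.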
Effect Size for r
➔ effect size for r = r²
➔ similar to Cohen’s d
◆ d measures the magnitude of the difference between two group means
◆ r² measures the magnitude of the relationship between the two variables
● the proportion of variance explained
➔ r² expresses how much of the variance in one of the variables (y) can be explained by its
relationship with the other variable (x), the rest being attributed to error
◆ e.g. r² = 0.9025; hence 90.25% of the variance in weight can be explained by the
relationship between age and weight, the rest being attributed to error
◆ basically looking at the overlap between variable x and variable y
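As a quick check on the arithmetic in the age and weight example, here is a minimal sketch. It assumes r = 0.95, which is one value consistent with r² = 0.9025 (r = -0.95 would give the same effect size); the variable names are made up for illustration.

```python
# Assumed sample correlation between age and weight (hypothetical value)
r = 0.95

r_squared = r ** 2                                  # effect size for r
explained_pct = round(r_squared * 100, 2)           # variance in y explained by x
unexplained_pct = round((1 - r_squared) * 100, 2)   # variance attributed to error

print(f"r^2 = {r_squared:.4f}")
print(f"explained: {explained_pct}%, attributed to error: {unexplained_pct}%")
```

Because r is squared, the sign drops out: effect size describes how much variance is shared, not the direction of the relationship.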