Applied Multivariate Research
6A. Bivariate Correlation and Simple Linear Regression
6A.1 The Concept of Relationship
- Perhaps the most basic question that research is designed to answer is
whether two variables are related to each other
- As a general rule, a correlation coefficient is an index of the degree to
which two or more variables are associated with or related to each other,
and a squared correlation coefficient is an index of the strength of that
relationship. The procedure used to determine whether two variables are
related falls into the domain of bivariate correlation
o It is bivariate because we are addressing the relationship between
two (“bi”) variables.
- Probably the most widely used bivariate correlation statistic is the
Pearson product-moment correlation coefficient, usually referred to simply
as the Pearson r
o It indexes the extent to which a linear relationship exists between
two quantitatively measured variables. The data take the form shown
in the table below:

      Person   Variable X        Variable Y
      A        A's score on X    A's score on Y
      B        B's score on X    B's score on Y
      C        C's score on X    C's score on Y

o With the data in this form, the amount of covariation that exists
between the two variables summarizes how the differences in one
variable correspond with the differences in the other (a minimal
computational sketch follows below)
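To make the computation concrete, here is a minimal sketch (in Python with numpy, using invented scores purely for illustration) of how paired X and Y scores such as those in the table are turned into a Pearson r:

```python
import numpy as np

# Hypothetical paired scores for persons A, B, C, ... (invented for illustration)
x = np.array([12.0, 15.0, 9.0, 20.0, 17.0])   # each person's score on Variable X
y = np.array([14.0, 18.0, 10.0, 25.0, 19.0])  # the same person's score on Variable Y

# Pearson r from its definition: covariance divided by the product of the SDs
r_manual = np.cov(x, y, ddof=1)[0, 1] / (np.std(x, ddof=1) * np.std(y, ddof=1))

# The same value from numpy's built-in correlation matrix
r_builtin = np.corrcoef(x, y)[0, 1]

print(round(r_manual, 3), round(r_builtin, 3))  # the two values agree
```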
6A.2 Different Types of Relationships
6A.2.1 Perfect Positive Relationships
- Participants can be depicted by data points. Such a plot is known as a
scatterplot.
- In this example, every Y score is 10 points greater than its corresponding
X score. When we calculate the Pearson r for this set of data points, we
would find it to be +1.00. The positive sign simply means that higher
values on X are associated with higher values on Y. A positive correlation
thus indicates what is known as a direct relationship
- Prediction is founded on correlation; statistically, it is handled through a
procedure called regression.
6A.2.2 Perfect Negative Relationship
- In this example, every gain of 2 points in X is associated with a decrement
of 1 point in Y. In the figure, all data points fall on a straight line. Thus, we
have a perfect relationship. If we calculated the Pearson r, it would turn
out to be -1.00. With this correlation, we could again perfectly predict the
value of Y from a knowledge of how the person scored on X
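Both perfect cases described above can be checked numerically; the following sketch (assuming numpy) builds Y = X + 10 for the direct relationship and drops Y by 1 point for every 2-point gain in X for the inverse one:

```python
import numpy as np

x = np.array([2.0, 4.0, 6.0, 8.0, 10.0])

# Perfect positive relationship: every Y score is 10 points above its X score
y_pos = x + 10
print(np.corrcoef(x, y_pos)[0, 1])   # +1.0

# Perfect negative relationship: every 2-point gain in X costs 1 point on Y
y_neg = 20 - 0.5 * x
print(np.corrcoef(x, y_neg)[0, 1])   # -1.0
```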
6A.2.3 Nonperfect Positive Relationships
- We can see that higher scores on X are generally, but not absolutely,
associated with higher scores on Y, and even when higher scores on one
variable are associated with higher scores on the other variable, there is
still some variability in the amount of such differences
- Thus, prediction cannot be perfect.
6A.2.4 Absence of Relationship With Variance on Both Variables
- When lower scores on X are not systematically associated with either
lower or higher scores on Y, there is no relationship at all between the
two variables
- The lack of any systematic covariation can be seen in the scatterplot. The
data points appear to be found all over the set of axes.
- When the slope of the ‘line of best fit’ is close to zero, this is a
mathematical indication that the Pearson r would yield a value very close
to zero too
- With a correlation of zero, there is no predictability
6A.2.5 Absence of Relationship with No Variance on One Variable
- When there is no variance on one variable, all data points on that variable
are stacked on a specific value. Without variation on both variables, there
can be no possibility of covariation, so the correlation (the covariation) is
zero
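This can be verified directly; note that because the Pearson formula divides by each variable's standard deviation, software such as numpy reports the coefficient as undefined (nan) rather than as a literal zero when one variable is constant, while the covariance itself is exactly zero:

```python
import numpy as np

x = np.array([3.0, 3.0, 3.0, 3.0, 3.0])   # no variance: every score is the same
y = np.array([7.0, 2.0, 9.0, 4.0, 6.0])

print(np.cov(x, y, ddof=1)[0, 1])   # 0.0 -- no variation in x, so nothing can covary
print(np.corrcoef(x, y)[0, 1])      # nan: the formula divides by SD(x), which is 0
```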
6A.2.6 Covariation Is Not the Same Thing as Mean Difference
- To say that two variables are significantly positively or negatively
correlated does not provide information concerning whether or not the
means of the variables are significantly different; that is, covariation and
mean difference are separate statistical matters
- Lessons
o Two variables can be highly correlated whether their means are of
equal magnitudes or are quite different in magnitude
o Correlation indexes the degree to which two variables covary or
are in synchrony with each other but does not speak to absolute
differences in magnitude of the values taken on by the variables
o Positive and negative correlations signify direct and inverse
relationships, respectively. The strength of the relationship is
indexed by r², and so direct and inverse relationships can indicate
relatively weaker or relatively stronger relationships depending
on the absolute value of r
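A minimal sketch (assuming numpy, with made-up scores) of the point that covariation and mean difference are separate matters: the variables below covary strongly whether their means are equal or 100 points apart:

```python
import numpy as np

x = np.array([10.0, 12.0, 14.0, 16.0, 18.0])

# Same means, strongly correlated: y1 tracks x closely around the same average
y1 = np.array([11.0, 11.5, 14.0, 16.5, 17.0])
print(np.corrcoef(x, y1)[0, 1], x.mean(), y1.mean())   # r ~ .97, both means = 14

# Very different means, still perfectly correlated: y2 sits 100 points above x
y2 = x + 100.0
print(np.corrcoef(x, y2)[0, 1], x.mean(), y2.mean())   # r = 1.0, means 14 vs 114
```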
6A.3 Statistical Significance of the Correlation Coefficient
6A.3.1 Interpretation of Statistical Significance
- After obtaining the value of the correlation coefficient from a computation
or a printout, the first thing that most researchers wish to determine is
whether or not the correlation significantly differs from zero
- The null hypothesis is that the correlation between the two variables is
zero in the population, implying that there is no relationship between
them in the population
- Statistical significance tells us how confident we can be that an obtained
correlation is different from zero.
6A.3.2 Statistical Significance and Sample Size
- The key element in determining whether the correlation is statistically
significant is the sample size on which the correlation is based. This is
because the sampling distribution of r changes with the size of the sample
contributing data to the analysis
- With a sample size as small as 9, we need a Pearson r of about .67 or
better to achieve statistical significance at the .05 level; with a sample
size in the upper 20s, a Pearson r of about .38 is the threshold value for
significance
- As we work with larger and larger samples, however, we do not need a
particularly high value of the correlation to reach significance, so
statistical significance testing becomes an increasingly less important
issue
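The dependence of the threshold on sample size can be made concrete with the usual t-based criterion for the Pearson r, r_crit = t_crit / sqrt(t_crit² + df) with df = N − 2. A small sketch, assuming scipy is available:

```python
from scipy import stats

def critical_r(n, alpha=0.05):
    """Smallest |r| significant at the two-tailed alpha level for sample size n."""
    df = n - 2
    t_crit = stats.t.ppf(1 - alpha / 2, df)
    return t_crit / (t_crit ** 2 + df) ** 0.5

for n in (9, 30, 100, 1000):
    print(n, round(critical_r(n), 3))
# 9 -> ~.67, 30 -> ~.36, 100 -> ~.20, 1000 -> ~.06:
# with large samples even trivially small correlations reach "significance"
```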
6A.4 Strength of Relationship
- Strength of relationship can be thought of in terms of predictive ability
6A.4.1 Guidelines for Assessing Relationship Strength
- Cohen suggested that in the absence of context, one might regard
correlations of .5, .3, and .1 as large, moderate, and small, respectively
6A.4.2 Relationship Strength Is Shared Variance
- To say that two variables are correlated or related is to say that they
covary. To say that two variables covary is also to say that they share
variance
- Variables can bear different strengths of relationship to each other.
6A.4.3 Indexing Relationship Strength by r²
- It is possible to quantify the strength of the relationship in a very
convenient way: It is the square of the correlation.
- For the Pearson r, the strength of the relationship is indexed by r², which
is often called the coefficient of determination. The squared correlation
value can be translated to a percentage
- Given an r² of .36, we could say that
o The two variables shared 36% of each other’s variance
o The X variable accounted for or explained 36% of the variance of
the Y variable
- The residual variance, indexed by the expression 1 − r², is called the
coefficient of nondetermination
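For instance, the r² of .36 above corresponds to a correlation of .60 (in absolute value); a one-line check:

```python
r = 0.60                # a correlation of .60 between X and Y
r2 = r ** 2             # coefficient of determination
print(r2)               # 0.36 -> the variables share 36% of their variance
print(1 - r2)           # 0.64 -> coefficient of nondetermination (residual variance)
```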
6A.4.4 Alternative Measures of Strength of Relationship
- Although we will use r² as our index of relationship strength for the
Pearson correlation, other indexes have been suggested. Cohen has
suggested using r itself as the gauge, and Rosenthal has proposed the
ratio r²/(1 − r²)
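To see how these three gauges compare on the same correlations, a small sketch using Cohen's benchmark values:

```python
def strength_indexes(r):
    r2 = r ** 2
    return {"r (Cohen)": r,
            "r^2": round(r2, 4),
            "r^2/(1 - r^2) (Rosenthal)": round(r2 / (1 - r2), 4)}

for r in (0.1, 0.3, 0.5):   # Cohen's small, moderate, and large benchmarks
    print(strength_indexes(r))
```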