Lecture 9: Association between interval and ordinal variables
When changes in one variable correspond to similar changes in another variable = positive
correlation → represented by correlation coefficient (r), which has a positive value up to a max of 1.
→ correlation doesn't imply causation (a correlation does not show that one variable causes change
in the other) → it just measures changes in variables that co-occur.
Correlation of zero when changes in one variable bear no relation to changes in another.
When changes in one variable correspond with opposite changes in another = negative
correlation → represented by correlation coefficient (r), which has a negative value down to a minimum
of -1 → fast vs slow, heavy vs light and reflected movements.
→ the size of the correlation coefficient indicates the strength of the
relationship between the two variables.
Association measures for interval and ordinal variables: see picture.
→ smallest correlation is -1 and the largest correlation is +1.
Covariance: example → 5 friends give a movie a score →
second variable is their age → what do we observe when we look at these 2 different
variables? (see pictures left).
- If one variable goes up (score) the other goes down (age) → in the
first graph you see that for the first friend the score is above average, but the
age is below average → this holds for all friends → can also put them in the
same graph (see picture right), with score on y-axis and age on x-axis →
age and score covary → says something about direction of the
association → is a negative association → covariation tells something
about direction, but not yet about the strength of an association.
Example 2: 3 lecturers (A, B and C) that all graded the same assignments.
- Lecturer A: 2, 7, 8, 8, 10 → average grade = 7, standard deviation = 3.
- Lecturer B: 1, 6, 7, 7, 9 → average grade = 6, standard deviation = 3.
- Lecturer C: 3, 7, 5, 7, 8 → average grade = 6, standard deviation = 2.
→ can compare lecturer A and B → B grades every assignment exactly one point lower than A → their
scores vary identically (SDs are identical) → means they covary fully and the grades correlate
maximally (r = 1).
→ can compare C and A → C grades assignments with less variance than A → not
identical positions (sometimes C grades higher, sometimes A) → means they do covary, but less
than B covaries with A → grades from C and A correlate, but not maximally.
Covariance for A and C: lecturer A on x-axis and C on y-axis → coordinate
system through centre of gravity: (x̄, ȳ) = (7, 6) → can calculate x-deviations
compared to average of x (7) → gives x-dev: dx = -5, 0, 1, 1, 3 → sx² = Σ(dx·dx)/(n-1)
= (25+0+1+1+9)/4 = 9 (variance of x).
→ covariance is similar, but with x- and y-deviations: cov = Σ(dx·dy)/(n-1) → so we also
need y-deviations: dy = -3, 1, -1, 1, 2 → Σ(dx·dy) = (-5 × -3) + (0 × 1) + (1 × -1) + (1 × 1) + (3 × 2)
= 21 → cov = 21/4 = 5.25
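The hand calculation above can be retraced with a short Python sketch. It uses only the grades of lecturers A and C from the notes. The variable names are chosen here for illustration and are not part of the lecture.

```python
# Grades from the lecturer example (A on the x-axis, C on the y-axis).
x = [2, 7, 8, 8, 10]   # lecturer A
y = [3, 7, 5, 7, 8]    # lecturer C
n = len(x)

mean_x = sum(x) / n    # 7.0
mean_y = sum(y) / n    # 6.0

# Deviations from the centre of gravity (mean_x, mean_y).
dx = [xi - mean_x for xi in x]   # -5, 0, 1, 1, 3
dy = [yi - mean_y for yi in y]   # -3, 1, -1, 1, 2

# Variance of x: sum of squared x-deviations divided by n - 1.
var_x = sum(d * d for d in dx) / (n - 1)

# Covariance: same idea, but x-deviations times y-deviations.
cov_xy = sum(a * b for a, b in zip(dx, dy)) / (n - 1)

print(var_x, cov_xy)   # 9.0 5.25
```

The positive sign of the covariance gives the direction of the association. Its size (5.25) depends on the grading scale, so it says nothing yet about strength.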
Covariance is combined variance:
→ can be positive or negative and is an indication of the correlation → covariance =
left graph = +5.25, right graph = -2.0 → scale-sensitive → the value of the covariance
depends on the measurement scale → covariance gives direction (negative or positive) →
r = cov/(sx·sy) = 5.25/(3 × 2) = 0.875 → r² = 0.77 (77% linearly explained) → r is not
scale-sensitive → this means you can compare the correlation coefficient r across different
studies → r also indicates whether a correlation is large or small →
summary r:
- r = coefficient of linear association (standardized covariance) → –1 ≤ r ≤ +1 → sign in
front of r shows whether it is a negative/positive correlation.
- r = standardized regression coefficient b in case of simple regression (when you have
only 1 independent variable).
- r² = proportion variation in Y linearly explained by X (covariance is not an association
measure, because it is scale-sensitive).
- Example if r = –0.5 → clearly negative correlation → a 1.0 sx increase in x associates with a
0.5 sy decrease in y → r² = 0.25, so 25% Y-variation linearly explained by X
- r² < 0.09: weak linear association
- 0.09 ≤ r² < 0.25: medium linear association
- r² ≥ 0.25: strong linear association
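A minimal Python sketch of the points in this summary, using the three lecturers from the example. The helper names `cov` and `pearson_r` and the rescaling to a 0-100 scale are assumptions made here for illustration. They do not come from the lecture.

```python
import math

def cov(x, y):
    """Sample covariance: sum of products of deviations, divided by n - 1."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / (n - 1)

def pearson_r(x, y):
    """Standardized covariance: cov / (sx * sy)."""
    return cov(x, y) / math.sqrt(cov(x, x) * cov(y, y))

a = [2, 7, 8, 8, 10]   # lecturer A
b = [1, 6, 7, 7, 9]    # lecturer B (always one point below A)
c = [3, 7, 5, 7, 8]    # lecturer C

r_ab = pearson_r(a, b)
r_ac = pearson_r(a, c)
print(r_ab, r_ac)              # 1.0 0.875
print(round(r_ac ** 2, 2))     # 0.77

# Hypothetical rescaling: the same grades expressed on a 0-100 scale.
a100 = [10 * v for v in a]
c100 = [10 * v for v in c]
print(cov(a, c), cov(a100, c100))   # 5.25 525.0  (scale-sensitive)
print(pearson_r(a100, c100))        # 0.875       (not scale-sensitive)
```

The covariance grows by a factor of 100 after rescaling while r stays at 0.875, which is why r can be compared across studies and covariance cannot.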
Eta vs r: eta = more general measure for dependency of Y on X → eta² = proportion variation Y
explained by X (see lecture 3) → eta² ≥ r² (because r² is the proportion linearly explained) →
advantage eta: (1) variable X can take on every measurement level and (2) it is a more general
association measure → disadvantage eta: (1) it is less specific than r (because it has no direction)
and (2) eta Y on X ≠ eta X on Y, so eta is not a symmetrical measure (r = symmetrical measure).
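The comparison between eta and r can be illustrated with the lecturer grades. The definition of eta is in lecture 3 and is not repeated in these notes, so this sketch assumes the usual one: eta² = between-groups sum of squares / total sum of squares, with every distinct X-value treated as its own group. The function name `eta_squared` is made up for this example.

```python
def eta_squared(x, y):
    """Proportion of the variation in y explained by the groups that x defines.

    Assumed definition: SS_between / SS_total, one group per distinct x-value.
    """
    n = len(y)
    mean_y = sum(y) / n
    ss_total = sum((v - mean_y) ** 2 for v in y)

    # Collect the y-values that belong to each distinct x-value.
    groups = {}
    for xi, yi in zip(x, y):
        groups.setdefault(xi, []).append(yi)

    ss_between = sum(
        len(g) * (sum(g) / len(g) - mean_y) ** 2 for g in groups.values()
    )
    return ss_between / ss_total

a = [2, 7, 8, 8, 10]   # lecturer A
c = [3, 7, 5, 7, 8]    # lecturer C

r_squared = 0.875 ** 2            # r for A and C as calculated in the notes
eta2_c_on_a = eta_squared(a, c)   # C dependent on A
eta2_a_on_c = eta_squared(c, a)   # A dependent on C

print(round(r_squared, 3))        # 0.766
print(round(eta2_c_on_a, 3))      # 0.875
print(round(eta2_a_on_c, 3))      # 0.986
```

Both eta² values are at least as large as r², and they differ from each other, which shows that eta is not symmetrical. With only five observations and almost one group per grade, eta² is close to 1 almost by construction, so the numbers only illustrate the two properties.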
Picture left gives covariance of 2 variables
(consumption of cheese vs number of people that died
by becoming tangled up in their bedsheets) → high
correlation: r = 0.95 → however, this correlation
doesn’t make any sense.
→ you can find correlation and association between
variables that is high, but that doesn’t make any sense → so you also have to base
the selection of variables on existing research and theories.
Rank correlation: use a rank correlation measure if: (a) one or both variables are of
ordinal measurement level or (b) with scale variables whereby the trend is not
linearly increasing or decreasing, but curved (see picture right).
→ advantage rank correlation: it is more generally applicable → disadvantage rank
correlation: it is less specific → there are 2 different rank correlation measures: (1)
Spearman’s rS and (2) Kendall's tau.
- Spearman's rank correlation coefficient
rho (rs): rs is similar to r, but now
we apply it to rank scores → have scores of
lecturer A and lecturer C (see graph left) with r
= 0.875 → to use rank scores, we rank the scores
→ ranking positions 1, 2, 3, 4 and 5 → because
positions 3 and 4 are tied (equal scores), both
get the average rank (3.5 twice) → have to rank
both on y-axis and on x-axis → these rank
scores are used in the calculation.
→ covariance: deviation of x (lecturer A) is multiplied by the deviation of y (lecturer C) → needs
to be divided by the 2 standard deviations → the rs appears to be a little lower than the Pearson
correlation that was calculated before → calculation is similar, but instead of using original
scores, we use the rank scores.
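The ranking step and the calculation of rs can be sketched in Python as follows. The helpers `average_ranks` and `pearson_r` are illustrative names. The approach (Pearson's r applied to rank scores, ties getting the average rank) is the one described above.

```python
import math

def average_ranks(values):
    """Rank from low to high; tied values share the average of their positions."""
    ordered = sorted(values)
    ranks = []
    for v in values:
        first = ordered.index(v) + 1   # first (1-based) position of this value
        count = ordered.count(v)       # how many times the value occurs
        ranks.append(first + (count - 1) / 2)
    return ranks

def pearson_r(x, y):
    """Covariance of x and y divided by both standard deviations."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((p - mx) * (q - my) for p, q in zip(x, y))
    sxx = sum((p - mx) ** 2 for p in x)
    syy = sum((q - my) ** 2 for q in y)
    return sxy / math.sqrt(sxx * syy)

a = [2, 7, 8, 8, 10]   # lecturer A
c = [3, 7, 5, 7, 8]    # lecturer C

rank_a = average_ranks(a)
rank_c = average_ranks(c)
print(rank_a)   # [1.0, 2.0, 3.5, 3.5, 5.0]
print(rank_c)   # [1.0, 3.5, 2.0, 3.5, 5.0]

r_s = pearson_r(rank_a, rank_c)   # Spearman's rs = Pearson's r on the ranks
print(round(r_s, 3))              # 0.763
```

The result (0.763) is a little lower than the Pearson correlation of 0.875, because ranking removes the extra weight of the very low first grade.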
Kendall's tau (τ): considers all the pairs of points → a pair of points is called concordant if
1 point in the pair has both a higher x- and a higher y-value than the other →
concordant when upward direction of arrows (see picture right).
- Number of concordant pairs k+ = 7 (number of arrows in upward direction).
- Number of discordant pairs k- = 1 (number of arrows in downward
direction) → x-value is larger for point to the right, but y-value is
larger for point to the left (for one point x-value is higher and for one
point the y-value is higher).
- Number of neutral pairs = 2 (one pair with same y-value and one with
same x-value).
→ when both the x- and the y-value are equal, the two points are on exactly the same spot.
tau-a = proportion of concordant pairs minus proportion of discordant pairs →
tau-a = (k+ - k-)/(total number of pairs) = (7 - 1)/10 = 0.6.
→ if we include the neutral pairs you get tau-b and tau-c → don’t calculate this by hand, but
through SPSS → gives tau-b = 0.67 and tau-c = 0.64.
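The pair counting can be sketched in Python for the grades of lecturers A and C. The notes leave tau-b and tau-c to SPSS. The formulas used below are the standard textbook ones (tau-b corrects the denominator for ties, Stuart's tau-c uses the smaller number of distinct values m) and are an assumption about what SPSS computes, although they reproduce the values in the notes.

```python
import math
from itertools import combinations

a = [2, 7, 8, 8, 10]   # lecturer A (x)
c = [3, 7, 5, 7, 8]    # lecturer C (y)

concordant = discordant = tied_x = tied_y = 0
for (x1, y1), (x2, y2) in combinations(zip(a, c), 2):
    direction = (x2 - x1) * (y2 - y1)
    if direction > 0:
        concordant += 1      # both x and y are higher for the same point
    elif direction < 0:
        discordant += 1      # one point has the higher x, the other the higher y
    if x1 == x2:
        tied_x += 1          # neutral pair: same x-value
    if y1 == y2:
        tied_y += 1          # neutral pair: same y-value

n = len(a)
n_pairs = n * (n - 1) // 2   # 10 pairs for 5 points
diff = concordant - discordant

tau_a = diff / n_pairs
tau_b = diff / math.sqrt((n_pairs - tied_x) * (n_pairs - tied_y))
m = min(len(set(a)), len(set(c)))          # smaller number of distinct values
tau_c = 2 * diff * m / (n ** 2 * (m - 1))

print(concordant, discordant, tied_x + tied_y)               # 7 1 2
print(round(tau_a, 2), round(tau_b, 2), round(tau_c, 2))     # 0.6 0.67 0.64
```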
Picture left shows 4 examples of correlations → top left: r = 0.9, so
positive correlation → top right: r = -0.3, so negative correlation and
association is less strong, because the points are more spread and the
slope is less steep → bottom left: rs is higher than r, because the
correlation is a bit curved → bottom right: it is also a curved pattern,
so eta is more suitable.
Correlation in SPSS: Menu <Analyze> <Correlate> <Bivariate...>;
- Tick: Pearson, Kendall’s tau-b, Spearman;
- Select the two variables;
- ‘Test of Significance’: ‘One-tailed’ or
‘Two-tailed’ (dependent on hypothesis);
<OK>
Output: pictures right → 1-tailed, so suitable for a
directed hypothesis, which is the case for the grades in the
example, because a positive correlation is expected → correlation is symmetrical,
because for C and A the Pearson correlation is 0.875 and for A and C it is
also 0.875 (is a symmetrical matrix) → for Kendall’s tau, the p-value is higher than for
Pearson's r, which suggests that this measure has less power.
- So for lecturer example: Pearson Correlation = 0.875 (r) and p1 = 0.026 → Spearman’s rho =
0.763 (rs) and p1 = 0.067 → Kendall’s tau_b = 0.667 (tau) and p1 = 0.059.
- r is most extreme (because of the outlier) → values rS and tau are smaller and just not
significant → p-values for rS and tau are almost equal.
3 correlation tests: to find out the statistical significance.
1. Pearson's r (Student distribution): testing H0: ρ = 0 → test statistic: t = r·√(n-2)/√(1-r²) →
Student t-distributed with n-2 degrees of freedom → SPSS calculates the t with the exceedance
probability p.
2. Spearman's rho: testing H0: ρs = 0 → identical formulation, but for r we use rs in the
formula for t.
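The two tests above can be checked against the SPSS output with a short Python sketch. It assumes the standard test statistic t = r·√(n-2)/√(1-r²) with n-2 degrees of freedom. The one-tailed p-value is obtained by numerically integrating the Student density with Simpson's rule, an approach chosen here to stay within the standard library and not necessarily how SPSS does it. All function names are illustrative.

```python
import math

def t_statistic(r, n):
    """Test statistic for H0: rho = 0, with n - 2 degrees of freedom."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r * r)

def t_pdf(x, df):
    """Density of the Student t-distribution with df degrees of freedom."""
    coef = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coef * (1 + x * x / df) ** (-(df + 1) / 2)

def one_tailed_p(t, df, steps=2000):
    """Exceedance probability P(T >= |t|), via Simpson's rule on [0, |t|]."""
    t = abs(t)
    h = t / steps                      # steps must be even for Simpson's rule
    total = t_pdf(0, df) + t_pdf(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3               # area under the density between 0 and |t|
    return 0.5 - area

n = 5                                  # five graded assignments
t_pearson = t_statistic(0.875, n)      # r for lecturers A and C
t_spearman = t_statistic(0.763, n)     # rs for lecturers A and C

p_pearson = one_tailed_p(t_pearson, n - 2)
p_spearman = one_tailed_p(t_spearman, n - 2)

print(round(t_pearson, 2), round(p_pearson, 3))     # 3.13 0.026
print(round(t_spearman, 2), round(p_spearman, 3))   # 2.04 0.067
```

The p-values match the one-tailed values reported for the lecturer example (0.026 for Pearson, 0.067 for Spearman).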