Here are the answers to all weekly assignments for the PhD course Statistics II at Maastricht University, which focuses on regression analysis. With these answers you can assess your own learning and check whether you understand the material covered so far.
1. Correlation
Data file:
Open the SPSS file: colton6re.sav. In this file, age in years (Age) and systolic blood pressure
in mmHg (SBP) are given for 33 women from a Canadian study. An interesting research
question in this study is whether there is an association between age and systolic blood
pressure. Moreover, how strong is this association?
Before you start with the following questions, make your own analysis strategy: what steps
should be taken to answer this research question?
Task 1.1
Make a scatterplot with systolic blood pressure on the y-axis and age on the x-axis to check
that there are no impossible values and that any association between systolic
blood pressure and age is linear.
Question 1a.
Are there any impossible values for systolic blood pressure or for age?
Systolic blood pressure ranges from about 100 to 220 mmHg (exact: 99 to 217), which are all
plausible values. Age ranges from about 20 to 80 years (exact: 22 to 81). Thus, there are also no
impossible values for age.
PhD course Statistics part 2: Regression analysis and SPSS 1
Question 1b.
Is the association positive or negative? Is there a straight-line association or does it deviate
substantially from a straight line? Why does the latter matter?
Positive association (as age increases, SBP also increases); there is no clear deviation from
linearity, so you may assume linearity. Whether the association is linear matters because the
Pearson correlation coefficient is a measure of linear association. If this assumption is clearly
violated, one should not use the Pearson correlation coefficient on the raw data. A
transformation to make the association linear, or the Spearman correlation (if the association
is monotonically increasing or decreasing), can then be considered.
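The difference between the two coefficients can be illustrated with a small Python sketch. It uses made-up data (not the colton6re.sav values), and the helper functions pearson, ranks and spearman are illustrative names written for this example, not part of the course material. For a monotone but clearly curved relation, the Spearman correlation equals 1, while the Pearson correlation stays well below 1.

```python
import math


def pearson(x, y):
    """Pearson correlation: co-variation scaled by both spreads."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def ranks(values):
    """Ranks starting at 1; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = avg
        i = j + 1
    return result


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


# Made-up illustration data, NOT the colton6re.sav values.
x = [1, 2, 3, 4, 5, 6, 7, 8]
y_curved = [math.exp(v) for v in x]  # monotone increasing, clearly non-linear

r_curved = pearson(x, y_curved)
rs_curved = spearman(x, y_curved)
print(f"Pearson  r  = {r_curved:.3f}")
print(f"Spearman rs = {rs_curved:.3f}")
```

Because Spearman only uses the ordering of the values, any monotone transformation of x or y leaves it unchanged, which is why it remains usable when the scatterplot shows a curved but monotone pattern.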
Question 1c.
Is the association strong or weak? Give an estimate of the correlation coefficient (do not
calculate it).
Whether the association is strong or weak, in other words whether the points lie close to a
straight line, is hard to tell by eye. I would say it is a medium to fairly strong correlation.
Estimate: points fairly close to a straight line → correlation coefficient between 0.5 and
0.8.
Task 1.2
Compute with SPSS the Pearson and Spearman correlation between age and systolic blood
pressure. Test whether these correlations are significant.
Correlations
                                                       Age in years   Systolic blood pressure in mmHg
Age in years                      Pearson Correlation  1              .718**
                                  Sig. (2-tailed)                     .000
                                  N                    33             33
Systolic blood pressure in mmHg   Pearson Correlation  .718**         1
                                  Sig. (2-tailed)      .000
                                  N                    33             33
**. Correlation is significant at the 0.01 level (2-tailed).
Correlations
                                                                            Age in years   Systolic blood pressure in mmHg
Spearman's rho   Age in years                      Correlation Coefficient  1.000          .659**
                                                   Sig. (2-tailed)          .              .000
                                                   N                        33             33
                 Systolic blood pressure in mmHg   Correlation Coefficient  .659**         1.000
                                                   Sig. (2-tailed)          .000           .
                                                   N                        33             33
**. Correlation is significant at the 0.01 level (2-tailed).
Question 2a.
How large are the Pearson and Spearman correlations? Are these correlations significant,
assuming a significance level (α) of 0.05? Which null hypothesis belongs to these tests?
Pearson: r = 0.718; SPSS reports p = .000, which means p < 0.001, thus significant (smaller
than 0.05). H0: ρ = 0 (correlation in population = 0).
Spearman: rs = 0.659; SPSS reports p = .000, which means p < 0.001, thus significant (smaller
than 0.05). H0: ρs = 0 (Spearman correlation in population = 0).
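As a cross-check on these p-values, the usual test statistic for H0: ρ = 0 is t = r·√(n − 2) / √(1 − r²) with n − 2 degrees of freedom. The Python sketch below applies it to the reported coefficients; the function name correlation_t is illustrative, and using the same formula for the Spearman coefficient is a common large-sample approximation that is assumed here rather than stated in the assignment.

```python
import math


def correlation_t(r, n):
    """t statistic for H0: population correlation = 0, with n - 2 df."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)


n = 33  # sample size from the SPSS output
t_pearson = correlation_t(0.718, n)
t_spearman = correlation_t(0.659, n)  # approximation when applied to ranks

print(f"df = {n - 2}")
print(f"Pearson:  t = {t_pearson:.2f}")
print(f"Spearman: t = {t_spearman:.2f}")
```

Both statistics lie far above the two-sided 5% critical value for 31 degrees of freedom (roughly 2.04), which agrees with SPSS reporting p < 0.001 for both correlations.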
Question 2b.
Which correlation (Pearson or Spearman) is preferred? What additional information is
required to answer this question? Make sure that you obtain this information using SPSS and
try to explain why the Pearson or Spearman correlation is preferred.
The Pearson correlation is preferred if both variables are (approximately) normally
distributed (and the relation is linear, which is checked in question 1). Thus, normality has to
be checked for both variables.
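One rough numerical screen for normality, next to histograms and Q-Q plots, is to look at the skewness and excess kurtosis of each variable, which are both close to 0 for normally distributed data. The Python sketch below is only an illustration under stated assumptions: the values are made up (not the colton6re.sav data), the function name is hypothetical, and it uses simple moment-based formulas, so the numbers can differ slightly from the bias-corrected statistics that statistical packages typically report.

```python
def skewness_and_excess_kurtosis(values):
    """Moment-based skewness and excess kurtosis (both near 0 if normal)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    m4 = sum((v - mean) ** 4 for v in values) / n
    return m3 / m2 ** 1.5, m4 / m2 ** 2 - 3


# Made-up values for illustration only, NOT the colton6re.sav data.
symmetric = [110, 120, 125, 130, 135, 140, 150]
right_skewed = [110, 112, 115, 118, 122, 130, 210]

skew_sym, kurt_sym = skewness_and_excess_kurtosis(symmetric)
skew_right, kurt_right = skewness_and_excess_kurtosis(right_skewed)
print(f"symmetric:    skewness = {skew_sym:.2f}")
print(f"right-skewed: skewness = {skew_right:.2f}")
```

A skewness far from 0, as in the second made-up sample with its single large value, would argue for the Spearman correlation or for transforming the variable first.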