Summary Inferential Statistics Test 2
Pre-master Communication Science University of Twente 2020
Britt Heuvel
Harry van der Kaap
Content Inferential Statistics
Descriptive statistics describe data; inferential statistics let you make predictions ("inferences") from
those data. With inferential statistics, you take data from samples and make generalizations about a population.
Assessment
The final mark for Inferential Statistics consists of three parts.
1. Part I: Introduction to Inferential Statistics (Lectures 1 to 7)
• Written exam (open questions)
2. Part II: More about Inferential Statistics (Lectures 8 to 14)
• Written exam (open questions)
3. Weekly assignments
The assignment for Inferential Statistics is divided into weekly assignments. An assignment only counts as a valid
attempt if it is a serious attempt: the effort, not the score, is what matters (a serious attempt can receive a
low rating and still be accepted). There is always a limited time available per assignment.
• For the assignments there is only a resit, for which you receive no points (zero points). This "resit"
takes place for each subpart via the second submission date. A retake is only assessed as sufficient if
it is a serious attempt.
• An important aspect of the assignments is to assess the practical part of Inferential Statistics. No
assignments: no exam. You must submit all parts of the assignment to gain access to the written exam.
For the complete assignment (parts A to E) you can receive a maximum of 100 points. Results for a
partial exam (assignments and/or exams) are not valid in a following period.
| Sub-part | Points | Available after lecture: | Hand-in (for points) before | Resit |
|---|---|---|---|---|
| Part A | 20 | 2 (Thursday 12-11-2020) | Friday 20-11-2020, 22:00 | 30-11-2020 |
| Part B | 20 | 4 (Thursday 26-11-2020) | Monday 30-11-2020, 10:00 | 7-12-2020 |
| Part C | 20 | 6 (Thursday 3-12-2020) | Monday 7-12-2020, 10:00 | 14-12-2020 |
| Part D | 20 | 10 (Thursday 7-1-2021) | Monday 11-1-2021, 10:00 | 18-1-2021 |
| Part E | 20 | 12 (Thursday 14-1-2021) | Monday 18-1-2021, 10:00 | 25-1-2021 |
You can receive 100 points for each of the three parts. Each exam counts for 40% of the final mark, and the total
of the assignments for 20%. The final mark for Inferential Statistics must be at least 55 points (= 5.5). The mark
for each written exam must be at least 50 points (= 5.0), and the assignment must in any case be handed in as a
serious attempt. The individual assignment is mandatory.
Literature:
De Veaux, R.D., Velleman, P.F., & Bock, D.E. (2016). Stats: Data and Models.
At the exam you receive a formula sheet plus some tables. For the exam you need a calculator; for the
assignments you need the program SPSS.
Planning
Summary Inferential Statistics
Test 2: Lectures 8 through 13
Main subjects exam (5 questions, total time 2.5 hours):
• Non-parametric tests: Wilcoxon Rank Sum test + Wilcoxon Signed Rank test
(by hand, not via SPSS).
• More than two means: One-Way ANOVA: assumptions + interpretation +
evaluation (via SPSS output) + ANOVA table. The non-parametric alternative
for more than two means (Kruskal-Wallis test).
• Regression, using the model: confidence intervals (CI) versus prediction
intervals (PI) for a specific value of the independent variable(s). Only via SPSS
(= scatterplot and/or saved values for CI and PI).
• Two-Way ANOVA: ANOVA table, interpretation of main effects and interaction
effects, investigating assumptions based on SPSS output.
• Multiple regression (via output from SPSS): interpreting and evaluating the
model, test + CI for the different independent variables, interpreting the
residual analysis and the ANOVA table for multiple regression.
• Assumed knowledge from Part I (Type I, Type II and power of the test /
critical value versus P-value / Levene's test / Shapiro-Wilk test).
Lecture 8: Non-parametric tests
When do we use a non-parametric test? When the assumptions of the parametric test are not fulfilled (very skewed distribution, low n, etc.).
There are many non-parametric tests; we focus on two important ones:
• Wilcoxon Signed Rank test (= related/paired samples)
• Wilcoxon Rank Sum test (= two independent samples)
• Kruskal-Wallis test (more in lecture 11)
What do the non-parametric alternatives do in general? They rank the data (low to high). Advantage:
extreme values don't influence the average ranking. Disadvantage: you throw away detailed information
about the real value of the scores. Non-parametric alternatives are a limited method, and the power of the test is
lower, but they are more valid when the t-test assumptions are not fulfilled. We distinguish four
measurement levels (nominal, ordinal, interval & ratio); if you only have an ordinal measurement level (already a
ranking), you can also use these non-parametric alternatives. On the exam, the non-parametric tests are done by hand, not via SPSS.
Quantitative data (I), N(μ, σ): one mean, two means, or more than two means
• One mean (= one sample problem): One Sample t-test. Paired samples are also a one sample problem (via the differences).
• Two independent samples (= two samples problem):
  - σ1 = σ2: Pooled t-test
  - σ1 ≠ σ2: Two Sample t-test
• More than two samples: ANOVA, one factor (one-way) & multifactor ANOVA + interaction effects
If assumptions are not fulfilled → non-parametric alternative:
• For paired samples: Wilcoxon Signed Rank test (W+) (or the Sign test)
• For two independent samples: Wilcoxon Rank Sum test
• For more than two samples: Kruskal-Wallis test
Overview of tests and procedures (till now): single-variable and two-variable (bivariate) procedures

| Measurement level of first variable (dependent) | Single variable procedures | Second variable (independent): Dichotomy | Nominal | Ordinal | Interval and Ratio |
|---|---|---|---|---|---|
| Dichotomy | Proportions, Percentages; CI one proportion | Epsilon | | | |
| Nominal | Mode, number of categories | Cramer's V | Cramer's V | Cramer's V | |
| Ordinal | Median, Quartiles, Deciles, IQR, Range; Wilcoxon Signed Rank test | Wilcoxon Rank Sum test | Cramer's V | Kendall's tau-b, Kendall's tau-c, Spearman's rho | Spearman's rho |
| Interval and Ratio | Mean, Skewness, Standard Deviation; One sample t-test; Paired t-test + CI one mean; Shapiro-Wilk test | General two samples t-test; Pooled t-test; Levene's test; CI difference | Cramer's V | Spearman's rho | Correlation; Regression; Spearman's rho; Pearson's R |
Non-parametric test (distribution-free tests) 1:
Wilcoxon Signed Rank test (W+) (for paired samples)
• Test statistic: W+ = sum of ranks for positive differences
• For situations in which the sample size is at least 5 for the paired samples [n paired ≥ 5]
  → normal approximation: W+ ~ N(μW+, σW+)
• z = (W+ − μW+) / σW+ → into z-score
Method:
1. Calculate the differences for all the pairs.
2. Order (rank) the differences in absolute values (absolute value means that we only consider the
distance to zero, so |−4.3| = 4.3). For ties use the average rank (see example).
3. Test statistic W+ = sum of ranks for positive differences. Differences equal to zero are excluded.
4. H0: Median group 1 = Median group 2 → H0: ME1 = ME2
(H0: median of the positive differences = median of the negative differences)
5. If the null hypothesis is true, the test statistic W+ has mean μW+ and standard deviation σW+.
For situations in which the sample size is at least 5 for the paired samples [n paired ≥ 5]
→ normal approximation: W+ ~ N(μW+, σW+)
→ W+ has expectation μW+ = n(n+1)/4 and standard deviation σW+ = √( n(n+1)(2n+1) / 24 )
→ z = (W+ − μW+) / σW+, which follows the N(0,1)-distribution
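The method above can be sketched in a few lines of Python. This is a minimal illustration on made-up paired scores (not course data), using only the standard library; `statistics.NormalDist` supplies the normal CDF for the approximation.

```python
from statistics import NormalDist

def signed_rank_test(before, after):
    """Wilcoxon Signed Rank test via the normal approximation (n paired >= 5).

    Follows the steps above: difference per pair, rank the absolute
    differences (average rank for ties, zero differences excluded),
    W+ = sum of ranks of the positive differences, then a z-score.
    """
    # Step 1: differences per pair (differences equal to zero are excluded).
    diffs = [b - a for a, b in zip(before, after) if b - a != 0]
    n = len(diffs)
    # Step 2: rank the absolute differences; ties get the average rank.
    abs_sorted = sorted(abs(d) for d in diffs)
    def rank(value):
        positions = [i + 1 for i, v in enumerate(abs_sorted) if v == value]
        return sum(positions) / len(positions)
    # Step 3: W+ = sum of ranks for the positive differences.
    w_plus = sum(rank(abs(d)) for d in diffs if d > 0)
    # Step 5: normal approximation under H0.
    mu = n * (n + 1) / 4
    sigma = (n * (n + 1) * (2 * n + 1) / 24) ** 0.5
    z = (w_plus - mu) / sigma
    p_right = 1 - NormalDist().cdf(z)  # right-sided P-value
    return w_plus, z, p_right

# Hypothetical paired scores, for illustration only.
before = [12, 15, 9, 14, 11, 16, 10, 13]
after  = [14, 15, 13, 17, 10, 18, 15, 16]
w_plus, z, p = signed_rank_test(before, after)
```

Note that the pair with difference 0 drops out, so the effective n here is 7, still above the n ≥ 5 needed for the normal approximation.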
Example Wilcoxon Signed Rank test (W+): Retelling two fairytales (n = 10)
# Pre-school children (n = 10) are asked to retell two fairytales that were read aloud to them earlier in the week.
# Each child is told two stories: story 1 is only read aloud; story 2 is read aloud + illustrated with pictures.
# Observations: the retellings of the children were recorded. An expert assigned a score to each child for each story.
1. Rethink the problem: paired samples situation → Wilcoxon Signed Rank test (W+)
(possible argument: problems with the distribution, given the sample size)
2. Formulate H0 and HA and define α
H0: Median Story1 = Median Story2 (H0: MS1 = MS2)
HA: Median Story1 > Median Story2 (HA: MS1 > MS2)
3. Give the test statistic plus the distribution of the test statistic:
z = (W+ − μW+) / σW+, which follows the N(0,1)-distribution
4. Calculate the test statistic and carry out the test (using the P-value or the critical value)
Under H0, W+ has mean μW+ and standard deviation σW+; n paired ≥ 5 → normal approximation
μW+ = n(n+1)/4 = 10(10+1)/4 = 27.5 and σW+ = √( n(n+1)(2n+1)/24 ) = √( 10(10+1)(2·10+1)/24 ) = 9.811
z = (W+ − μW+) / σW+ = (38 − 27.5) / 9.811 = 1.071 → Table A: P = 0.1423
5. Draw a conclusion (short formal report plus interpretation)
(Wilcoxon W+ = 38, N = 10, z = 1.071 (right-sided), P = 0.1423)
Based on this research we do not have enough evidence to reject the null hypothesis: we did not find enough
evidence to say that story 1 without pictures leads to a lower score than story 2 with pictures.
1. Calculate the differences for all the pairs.
2. Rank the differences in absolute values.
(Rule for double observations (a tie): each gets the average rank of that group (7.5).)
3. W+ = sum of ranks for positive differences.
4. Fill in and calculate the test statistic.
If you sum up all the ranks for a sample of 10, the maximum is 55, and 55/2 = 27.5:
if there is no difference between story 1 and story 2, W+ will be near 27.5.
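As a check, step 4 of the fairytale example can be reproduced from the summary values alone (W+ = 38, n = 10), since the z-score only needs those two numbers; a short stdlib-only sketch:

```python
from statistics import NormalDist

# Reproduce step 4 of the fairytale example from the reported values
# alone: W+ = 38 (sum of positive ranks), n = 10 pairs.
n, w_plus = 10, 38
mu = n * (n + 1) / 4                             # 27.5
sigma = (n * (n + 1) * (2 * n + 1) / 24) ** 0.5  # ≈ 9.811
z = (w_plus - mu) / sigma                        # ≈ 1.07
p_right = 1 - NormalDist().cdf(z)                # ≈ 0.142 (right-sided)
```

This matches the table value P = 0.1423 up to rounding.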
Non-parametric test (distribution-free tests) 2:
Wilcoxon Rank Sum Test (W) (for two independent samples)
• Test statistic: W = sum of ranks for the data from the first group
• For situations in which the sample size is at least 5 for both samples [n1 ≥ 5 and n2 ≥ 5]
  → normal approximation: W ~ N(μW, σW)
• z = (W − μW) / σW → into z-score
Two independent samples → Wilcoxon Rank Sum Test.
Arguments could be: no normal distribution, or the measurement level of the (dependent) variable is ordinal.
Method:
1. Rank the combined data for two samples from smallest to largest.
2. For ties use the average rank.
3. Calculate the test statistic W. W = the sum of ranks for the data from the first group.
4. H0: Median 1 = Median 2 → H0: ME1 = ME2
5. If the null hypothesis is true, the test statistic W has mean μW and standard deviation σW.
For situations in which the sample size is at least 5 for both samples [n1 ≥ 5 and n2 ≥ 5]
→ normal approximation: W ~ N(μW, σW)
→ W has expectation μW = n1(N+1)/2 and standard deviation σW = √( n1 n2 (N+1) / 12 ), with N = n1 + n2
→ z = (W − μW) / σW, which follows the N(0,1)-distribution
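The same steps can be sketched for two independent groups. A minimal stdlib-only illustration on made-up ordinal scores (not course data):

```python
from statistics import NormalDist

def rank_sum_test(group1, group2):
    """Wilcoxon Rank Sum test via the normal approximation (n1, n2 >= 5).

    Ranks the combined data (average rank for ties); W is the sum of
    ranks for the first group.
    """
    combined = sorted(group1 + group2)
    def rank(value):
        positions = [i + 1 for i, v in enumerate(combined) if v == value]
        return sum(positions) / len(positions)
    w = sum(rank(v) for v in group1)          # step 3
    n1, n2 = len(group1), len(group2)
    N = n1 + n2
    mu = n1 * (N + 1) / 2                     # step 5: mean under H0
    sigma = (n1 * n2 * (N + 1) / 12) ** 0.5   # step 5: SD under H0
    z = (w - mu) / sigma
    p_two_sided = 2 * (1 - NormalDist().cdf(abs(z)))
    return w, z, p_two_sided

# Hypothetical scores for two independent groups, for illustration only.
g1 = [23, 27, 25, 30, 28, 26]
g2 = [20, 22, 24, 21, 19, 23]
w, z, p = rank_sum_test(g1, g2)
```

The tied value 23 appears once in each group and gets the average rank 5.5 in both, exactly as in the hand method.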
Example Wilcoxon Rank Sum Test (W), from Chapter 22: Buying a camera from a friend or a stranger
# See first your study book for a detailed answer of the two-sample t-test for the difference between means for this data.
# The sample size is small, and the distributions of the two groups are not equal.
H0: ME Friends = ME Strangers and HA: ME Friends > ME Strangers
Now we add up all the ranks of the friends' group: W = 7 + 8.5 + 10.5 + 10.5 + 12 + 14 + 14 + 14 = 90.5
The mean of W when the null hypothesis is true is: μW = n1(N+1)/2 = 8(15+1)/2 = 64
The standard deviation is SD(W) = √Var(W) = σW = √( n1 n2 (N+1)/12 ) = √( 8 × 7 × (15+1)/12 ) = 8.64
So z = (W − μW) / σW = (90.5 − 64) / 8.64 = 3.07
With a two-sided P-value of 0.0021. As with both other tests we looked at, this P-value is between 0.01 and 0.001.
We reach the same conclusion as we did with the two-sample t-test. Of course, the rank sum test has the advantage
that it doesn't depend on the Nearly Normal Condition. And it will be less powerful than the two-sample t-test when
that condition is satisfied, because it doesn't use all the information in the data.
(Wilcoxon Rank Sum test, W = 90.5, n1 = 8, n2 = 7, z = 3.07 (two-sided), P = 0.0021)
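The camera example can also be checked from the reported values alone. Note that the computation of μW = 64 implies the friend group (the first group, whose eight ranks are summed) has n1 = 8, so that is used here:

```python
from statistics import NormalDist

# Reproduce the camera example from the reported values alone:
# W = 90.5 is the rank sum of the friend group (n1 = 8), n2 = 7 strangers.
n1, n2, w = 8, 7, 90.5
N = n1 + n2
mu = n1 * (N + 1) / 2                        # 64
sigma = (n1 * n2 * (N + 1) / 12) ** 0.5      # ≈ 8.64
z = (w - mu) / sigma                         # ≈ 3.07
p_two_sided = 2 * (1 - NormalDist().cdf(z))  # ≈ 0.0021
```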
* It is almost certain that there is a question about either the Wilcoxon Signed Rank test (= related/paired samples)
or the Wilcoxon Rank Sum test (= two independent samples) on exam 2!
Lecture 9: Regression: Assumptions and Using the model
Assumptions and conditions
Statistical inference via regression rests on assumptions about the data behind the model y = b0 + b1x.
These model assumptions can be summarized into conditions that can be checked via
residual analysis (do we have a representative sample?). The residuals (εi) must be independent and normally
distributed, N(0, σ), with homoscedasticity (equal spread) around the regression line.
Assumptions of the linear model → investigated via residual analysis:
1. Linear relationship assumption → Straight Enough Condition.
• Check via a scatterplot (the shape must be linear, or we can't use regression at all).
2. Independence assumption → Randomization Condition.
• Check via the research design (research methodology) or a residual plot: the residuals should be
randomly scattered (against the independent variable or the predicted values). If there is a pattern:
failure of independence?
3. Equal variance assumption → Homoscedasticity ("Does the Plot Thicken?" Condition).
• Check via a residual plot against the values of the independent variable(s) or the predicted variable
(same plots as for 2). The spread of the residuals should be uniform.
4. Normal population assumption → Nearly Normal Condition & Outlier Condition.
• Check via a histogram of the residuals (should be unimodal and symmetric) and check for outliers.
(Examples of residual plots and scatter plots were shown alongside these points.)
Note on residual analysis: this analysis may seem a little overdone for a "simple regression", but you need
this type of analysis for multiple regression. (It is relatively easy to understand via simple regression.)
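A small numeric sketch of what the residual plots are built from, on made-up (x, y) data: fit the least squares line, compute the residuals, and note which properties come for free versus which must be judged from the plots.

```python
# Minimal residual-analysis sketch on hypothetical data (not course data):
# fit y = b0 + b1*x by least squares, then inspect the residuals that
# the residual plot and histogram above would display.
xs = [1, 2, 3, 4, 5, 6, 7, 8]
ys = [2.1, 2.9, 4.2, 4.8, 6.1, 6.9, 8.2, 8.8]

n = len(xs)
x_bar, y_bar = sum(xs) / n, sum(ys) / n
sxx = sum((x - x_bar) ** 2 for x in xs)
sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
b1 = sxy / sxx
b0 = y_bar - b1 * x_bar
residuals = [y - (b0 + b1 * x) for x, y in zip(xs, ys)]

# Least squares guarantees the residuals average to zero and are
# uncorrelated with x; homoscedasticity and normality still have to
# be judged from the residual plot and the histogram of residuals.
mean_res = sum(residuals) / n
res_x_cov = sum((x - x_bar) * e for x, e in zip(xs, residuals))
```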
Example residual analysis: SAT scores
Is there a positive linear relation between the Verbal score and the Math score? (n = 162)
Hypotheses: H0: β1 = 0 and HA: β1 > 0. Linear model: predicted Math-SAT = 209.6 + 0.675 (Verbal-SAT), with s = 71.75
Standard deviation of the residuals: s = √( Σ(yi − ŷi)² / (n − 2) )
SE_b1 = s / √( Σ(xi − x̄)² ) and t = b1 / SE_b1 = 0.675 / 0.057 = 11.84, with df = 162 − 2 = 160
(Scatterplot of MathSAT against VerbalSAT with the fitted line; R Sq Linear = 0.469.)

SPSS Coefficients table (dependent variable: MathSAT):
| Model | B | Std. Error | Beta | t | Sig. |
|---|---|---|---|---|---|
| (Constant) | 209.554 | 34.349 | | 6.101 | .000 |
| VerbalSAT | .675 | .057 | .685 | 11.880 | .000 |

Short report: (b1 = .675, t(160) = 11.88, p < .001)
P < .001, so we can conclude (with α = 5%) that there is a positive linear relation between the Verbal score and the Math score.

SPSS Model Summary (predictors: (Constant), VerbalSAT):
| Model | R | R Square | Adjusted R Square | Std. Error of the Estimate |
|---|---|---|---|---|
| 1 | .685 | .469 | .465 | 71.75461 |

For the linear relation between the Verbal score and the Math score we found:
predicted Math-SAT = 209.6 + 0.675 (Verbal-SAT). Assumptions? → residual analysis in SPSS.
For multiple regression a more general method can be used → via the predicted variable.
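The slope test used in the SAT example can be sketched end-to-end. The real data set of n = 162 students is not reproduced here, so this runs on a small hypothetical data set; the formulas are exactly those above (s, SE_b1, t with df = n − 2).

```python
# Sketch of the t-test for the slope in simple regression, on
# hypothetical SAT-like data (illustration only, not the course data):
# s = sqrt( SSE / (n - 2) ), SE_b1 = s / sqrt( Sxx ), t = b1 / SE_b1.
xs = [400, 450, 500, 550, 600, 650, 700]   # Verbal scores (made up)
ys = [430, 500, 520, 590, 600, 680, 700]   # Math scores (made up)

n = len(xs)
x_bar, y_bar = sum(xs) / n, sum(ys) / n
sxx = sum((x - x_bar) ** 2 for x in xs)
b1 = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sxx
b0 = y_bar - b1 * x_bar
sse = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))
s = (sse / (n - 2)) ** 0.5      # standard deviation of the residuals
se_b1 = s / sxx ** 0.5          # standard error of the slope
t = b1 / se_b1                  # compare to a t-distribution, df = n - 2
```

For these made-up numbers b1 ≈ 0.893 and t ≈ 14.3 with df = 5; with SPSS output you would read b1, SE_b1, t and Sig. straight from the Coefficients table instead.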
Confidence intervals and prediction intervals for predicted values at x*
We have already calculated the confidence interval for the slope β1: CI via b1 ± t* SE_b1 with df = (n − 2),
i.e. of the form Estimate ± t* SE. The sampling distribution of a predicted (estimated) value ŷ via regression
also has a standard deviation (or standard error).
Two types of questions (examples):
1. Predict the mean Math score for all students with a Verbal score of 400 (confidence interval).
2. Predict the Math score for a specific student with a Verbal score of 400 (prediction interval).
• The point estimate for both questions is exactly the same, but it is still important to make a distinction because
the intervals differ: predicting for one person is more difficult, predicting a mean is more reliable.
• We start with the same prediction in both cases. For a certain value of X, we call this x-value x*.
• The regression predicts: ŷ = b0 + b1 x*
• Both intervals take the form: Estimate ± t* SE with df = (n − 2)
The standard error of the mean predicted value at x* is (calculation only via SPSS, not by hand):
• Standard error for the mean response at x*: SE_μ̂ = s √( 1/n + (x* − x̄)² / Σ(xi − x̄)² )
• And the confidence interval via: μ̂y ± t* SE_μ̂ with df = (n − 2)
Individuals vary more than means. So, the standard error for a single predicted value is larger than the standard
error for the mean (calculation only via SPSS, not by hand):
• Standard error for an individual at x*: SE_ŷ = s √( 1 + 1/n + (x* − x̄)² / Σ(xi − x̄)² )
• And the prediction interval via: ŷ ± t* SE_ŷ with df = (n − 2)
Example: confidence intervals versus prediction intervals for predicted values
Using the model for an individual case is mostly not valid; using it for groups in general can be valid.
Confidence interval for the mean response via: μ̂y ± t* SE_μ̂ with df = (n − 2)
Prediction interval for an individual response via: ŷ ± t* SE_ŷ with df = (n − 2)
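Although the course computes these intervals only via SPSS saved values, the two formulas above can be applied directly to see why the PI is always wider than the CI. A sketch on the same hypothetical data as before (n = 7, so df = 5; the 95% value t* = 2.571 is taken from a t-table):

```python
# CI vs PI at x*, on hypothetical data (illustration only).
xs = [400, 450, 500, 550, 600, 650, 700]
ys = [430, 500, 520, 590, 600, 680, 700]
n = len(xs)
x_bar = sum(xs) / n
sxx = sum((x - x_bar) ** 2 for x in xs)
b1 = sum((x - x_bar) * (y - sum(ys) / n) for x, y in zip(xs, ys)) / sxx
b0 = sum(ys) / n - b1 * x_bar
s = (sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys)) / (n - 2)) ** 0.5

x_star, t_crit = 500, 2.571        # t* for 95%, df = n - 2 = 5 (t-table)
y_hat = b0 + b1 * x_star           # same point estimate for both intervals
se_mean = s * (1 / n + (x_star - x_bar) ** 2 / sxx) ** 0.5       # mean response
se_ind  = s * (1 + 1 / n + (x_star - x_bar) ** 2 / sxx) ** 0.5   # individual
ci = (y_hat - t_crit * se_mean, y_hat + t_crit * se_mean)
pi = (y_hat - t_crit * se_ind,  y_hat + t_crit * se_ind)
```

The extra "1 +" under the square root for an individual makes SE_ŷ > SE_μ̂, so the prediction interval always contains the confidence interval.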
Lecture 10: Introduction to Multiple Regression
Multiple (linear) regression
Multiple linear regression (in theory): a regression model with two or more (k) predictor variables:
• yi = β0 + β1 xi1 + β2 xi2 + ... + βk xik + εi (calculation not by hand)
Sample data, so the least squares method is used for estimating the regression equation:
• ŷ = b0 + b1 x1 + b2 x2 + ... (calculation not by hand)
• With the estimate s² for σ²: s² = Σ(yi − ŷi)² / (n − k − 1) (= MSE) (calculation not by hand)
Hypotheses: H0: β1 = β2 = ... = βk = 0 and HA: at least one βj is not 0
t-test statistic for βj: t = bj / SE_bj with df = (n − k − 1)
CI for βj via: bj ± t* SE_bj with df = (n − k − 1) (give the CI, given that all other variables are in the model)
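Although the course does these calculations only via SPSS, the least squares machinery can be sketched in pure Python for k = 2 predictors on made-up data: solve the normal equations (X'X)b = X'y, estimate s² = MSE with df = n − k − 1, and form t = bj / SE_bj, where SE_bj comes from the diagonal of s²(X'X)⁻¹.

```python
# Multiple regression sketch (k = 2 predictors) on hypothetical data.
def solve(A, rhs):
    """Solve A x = rhs by Gauss-Jordan elimination with partial pivoting."""
    m = len(A)
    M = [row[:] + [r] for row, r in zip(A, rhs)]
    for col in range(m):
        piv = max(range(col, m), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(m):
            if r != col:
                f = M[r][col] / M[col][col]
                M[r] = [a - f * c for a, c in zip(M[r], M[col])]
    return [M[i][m] / M[i][i] for i in range(m)]

# Made-up data generated near y = 1 + 2*x1 + 0.5*x2 (small noise added).
xs1 = [1, 2, 3, 4, 5, 6, 7, 8]
xs2 = [5, 3, 8, 1, 7, 2, 6, 4]
ys  = [5.6, 6.4, 11.1, 9.4, 14.6, 13.9, 18.1, 18.9]

n, k = len(ys), 2
X = [[1.0, a, b] for a, b in zip(xs1, xs2)]     # design matrix with intercept
XtX = [[sum(X[i][r] * X[i][c] for i in range(n)) for c in range(k + 1)]
       for r in range(k + 1)]
Xty = [sum(X[i][r] * ys[i] for i in range(n)) for r in range(k + 1)]
b = solve(XtX, Xty)                             # b0, b1, b2

yhat = [sum(bc * xc for bc, xc in zip(b, row)) for row in X]
s2 = sum((y - f) ** 2 for y, f in zip(ys, yhat)) / (n - k - 1)   # = MSE

# SE_bj = sqrt( s^2 * [(X'X)^-1]_jj ); each diagonal entry of the
# inverse is obtained by solving (X'X) v = e_j.
se = []
for j in range(k + 1):
    e = [1.0 if i == j else 0.0 for i in range(k + 1)]
    se.append((s2 * solve(XtX, e)[j]) ** 0.5)
t_stats = [bj / sj for bj, sj in zip(b, se)]    # df = n - k - 1 = 5
```

With this nearly noise-free toy data the estimates land close to the true values (1, 2, 0.5); SPSS reports exactly these b, SE, t and Sig. columns in its Coefficients table.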
Example 1 Multiple regression: Results for the exam Statistics II, part 1
It is expected that there is a positive effect of Assignments and Attendance on the exam. Exam statistics: 64% sufficient.
First concentrate on the effect of Assignments. It is expected that there is a positive effect of Assignments on the exam.
H0: β1 = β2 = β3 = β4 = 0 and HA: some βj's are positive (> 0)
F-test (ANOVA): (F = 23.803, df = 4, 103, P < .01) → Conclusion?
General outcome of the test → the model is significant.
Spread around the four-dimensional regression line: s = 15.569.
More interested in the details → multiple regression equation:
The model: ŷ = 14.741 + 0.711x1 + 1.352x2 + 0.508x3 + 0.134x4
Some coefficients are significant and some are not, but there is a strong relationship between the different assignments.
Can we make the equation simpler? Must we bring in all these independent variables separately, or can we skip a few,
or maybe combine some? → look at model criteria. For now, we work with all independent variables.
Recode/Compute into one variable: Assignments = Ass_A_20 + Ass_B_25 + Ass_C_25 + Ass_D_30.
There seems to be a clear positive relationship between the mark for the exam and the assignments
(r = .66, n = 108, P < .001). It can be stated, based on the (simple) regression of Grade for the Exam on
Results Assignments, that 44% of the variation in the Exam can be "explained" by the results for the assignments.
We can look at model criteria to see if an equation can be simpler:
• Significant effects
• R² change (Adjusted R Square is most of the time the safer choice)
• Few predictors / simple models (parsimonious models) → preferably keep it simple
• Total explained variance (small s)
• Unrelated predictors / no (multi)collinearity
What does it mean in multiple regression analysis if a specific regression coefficient is not significant?
It does not mean that the specific predictor variable (variable X) has no linear relationship with the dependent
variable (variable Y). It means that the specific predictor variable contributes (almost) nothing to the
modelling of Y (the dependent variable) after all other predictor variables are taken into account.
Example 1 Multiple regression: Results for the exam Statistics II, part 2
It is expected that there is a positive effect of Assignments and Attendance on the exam. Exam statistics: 64% sufficient.
H0: β_Ass = β_Att = 0 and HA: β_Ass > 0, β_Att > 0
The model: ŷ = 8.963 + 1.953x1 − 0.520x2
s = 15.157
Example 2 Multiple regression: Regression of Height of Students on Height of Parents, part 1
Regression of Height of Students (Q14) on Height of Mothers (Q15): 1 (simple regression)
ŷ = 64.64 + 0.672x1 and s = 9.013

SPSS Model Summary (predictors: (Constant), Q15_lengthmum):
| Model | R | R Square | Adjusted R Square | Std. Error of the Estimate |
|---|---|---|---|---|
| 1 | .434 | .188 | .183 | 9.013 |

SPSS Coefficients table (dependent variable: Q14_length):
| Model | B | Std. Error | Beta | t | Sig. |
|---|---|---|---|---|---|
| (Constant) | 64.641 | 19.224 | | 3.362 | .001 |
| Q15_lengthmum | .672 | .114 | .434 | 5.874 | .000 |

Regression of Height of Students (Q14) on Height of Mothers (Q15) and Height of Fathers (Q16): 2 (multiple regression)
ŷ = 44.54 + 0.577x1 + 0.199x2 and s = 8.971

SPSS Model Summary (predictors: (Constant), Q16_lengthdad, Q15_lengthmum):
| Model | R | R Square | Adjusted R Square | Std. Error of the Estimate |
|---|---|---|---|---|
| 1 | .453 | .205 | .194 | 8.971 |

SPSS Coefficients table (dependent variable: Q14_length):
| Model | B | Std. Error | Beta | t | Sig. |
|---|---|---|---|---|---|
| (Constant) | 44.544 | 22.221 | | 2.005 | .047 |
| Q15_lengthmum | .577 | .125 | .373 | 4.602 | .000 |
| Q16_lengthdad | .199 | .111 | .145 | 1.791 | .075 |