Week 1
Causality: Is there an effect?
• Covariance: Variables have an association
• Directionality: Cause precedes effect (in time)
• Internal validity: Eliminate alternative explanations
Correlation
Correlation: Is there an association between two variables?
Scatterplots:
• Direction (positive/negative)
• Strength (how closely the points follow the line)
• Shape (linear/nonlinear, homo-/heterogeneous)
• Outliers
Phi Coefficient (φ): dichotomous + dichotomous
𝜑 = (𝐴 × 𝐷 − 𝐵 × 𝐶) ÷ √((𝐴 + 𝐵) × (𝐶 + 𝐷) × (𝐴 + 𝐶) × (𝐵 + 𝐷)), with A, B (first row) and C, D (second row) the cell counts of the 2 × 2 table
𝜑 = √(𝜒² ÷ 𝑁) → 𝜒² = 𝜑² × 𝑁
Testing significance
H0: ρ = 0, Ha: ρ ≠ 0, ρ < 0 or ρ > 0
r = r, 𝑟𝑠, 𝑟𝑝𝑏, or 𝜑
𝑡 = 𝑟 × √(𝑁 − 2) ÷ √(1 − 𝑟²), with 𝑑𝑓 = 𝑁 − 2
APA citing: t(df) = …, P = … → t with 2 decimals and P with 3. If P is smaller than 0.001, write P < 0.001
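The phi formula and the t test above can be sketched in Python as follows. This is a minimal illustration, not a reference implementation: the function names and the 2 × 2 counts are made up for the example, and the table is assumed to have rows (A, B) and (C, D).

```python
import math

def phi_coefficient(a, b, c, d):
    """Phi for a 2x2 table with rows (a, b) and (c, d)."""
    numerator = a * d - b * c
    denominator = math.sqrt((a + b) * (c + d) * (a + c) * (b + d))
    return numerator / denominator

def t_from_r(r, n):
    """t statistic and degrees of freedom for H0: rho = 0."""
    t = r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)
    return t, n - 2

# Hypothetical cell counts, chosen only for illustration.
a, b, c, d = 30, 10, 10, 30
n = a + b + c + d

phi = phi_coefficient(a, b, c, d)
chi_square = phi ** 2 * n          # chi-square recovered from phi
t, df = t_from_r(phi, n)           # same t formula as for r, r_s and r_pb

print(f"phi = {phi:.3f}, chi-square = {chi_square:.2f}, t({df}) = {t:.2f}")
```

The P value still has to be looked up for the resulting t and df, for example in a t table.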
Effect size
Statistical significance depends on N, r and α, therefore it’s important to calculate the effect size.
The effect size r = r, 𝑟𝑠, 𝑟𝑝𝑏, or 𝜑, but they are hard to interpret.
𝑟² is the coefficient of determination: the proportion of variance accounted for.
[Table: rule of thumb for interpreting the effect size]
Calculating the Correlation
Covariance: The degree to which 2 variables covary, but depends on the unit of measure.
𝑆𝑥𝑦 = ∑((𝑥𝑖 − 𝑥̅) × (𝑦𝑖 − 𝑦̅)) ÷ (𝑁 − 1), with 𝑥̅ = mean of 𝑥𝑖
Correlation coefficients, a standardized measure
Pearson r: quantitative + quantitative → linear relationship between 2 variables, between +1 and -1.
𝑟 = 𝑆𝑥𝑦 ÷ (𝑆𝑥 × 𝑆𝑦) or
𝑟 = (1 ÷ (𝑁 − 1)) × ∑(((𝑥𝑖 − 𝑥̅) ÷ 𝑆𝑥) × ((𝑦𝑖 − 𝑦̅) ÷ 𝑆𝑦)) = ∑ 𝑍𝑥 𝑍𝑦 ÷ (𝑁 − 1)
1. Write down x and y
2. Calculate 𝑍𝑥 and 𝑍𝑦 : 𝑍𝑥 = (𝑥𝑖 − 𝑥̅ ) ÷ 𝑆𝑥
3. For every row, do 𝑍𝑥 × 𝑍𝑦
4. Calculate r with 𝑟 = ∑ 𝑍𝑥 𝑍𝑦 ÷ (𝑁 − 1)
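The covariance formula and steps 1 to 4 can be sketched in Python like this. It is a minimal illustration with made-up data; the function names are hypothetical, and `statistics.stdev` is used because it divides by N − 1, matching the formulas above.

```python
import statistics

def covariance(x, y):
    """Sample covariance S_xy with N - 1 in the denominator."""
    mx, my = statistics.mean(x), statistics.mean(y)
    return sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / (len(x) - 1)

def pearson_r(x, y):
    """Pearson r via z-scores, following steps 2 to 4."""
    mx, my = statistics.mean(x), statistics.mean(y)
    sx, sy = statistics.stdev(x), statistics.stdev(y)
    zx = [(xi - mx) / sx for xi in x]                              # step 2
    zy = [(yi - my) / sy for yi in y]
    products = [a * b for a, b in zip(zx, zy)]                     # step 3
    return sum(products) / (len(x) - 1)                            # step 4

# Step 1: hypothetical scores, chosen only for illustration.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

s_xy = covariance(x, y)
r = pearson_r(x, y)
r_alt = s_xy / (statistics.stdev(x) * statistics.stdev(y))        # r = S_xy / (S_x * S_y)

print(f"S_xy = {s_xy:.3f}, r = {r:.3f}, r via covariance = {r_alt:.3f}")
```

Both routes give the same r, which is a quick check that the z-score table was filled in correctly.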
Spearman’s rho (𝑟𝑠): ordinal + ordinal → robust to outliers, because the scores are converted into ranks.
1. Write down x and y
2. Rank x and y in 𝑟𝑥 and 𝑟𝑦
3. Calculate 𝑍𝑟𝑥 and 𝑍𝑟𝑦 : 𝑍𝑟𝑥 = (𝑟𝑥 − 𝑥̅𝑟 ) ÷ 𝑆𝑥𝑟
4. For every row, do 𝑍𝑟𝑥 × 𝑍𝑟𝑦
5. Calculate 𝑟𝑠 with 𝑟𝑠 = ∑ 𝑍𝑟𝑥 𝑍𝑟𝑦 ÷ (𝑁 − 1)
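The Spearman steps can be sketched in Python as below. This is an illustration only: the notes do not say how ties are ranked, so giving tied values the mean of their ranks is an assumption, and the data and function names are made up.

```python
import statistics

def rank(values):
    """Ranks starting at 1; tied values share the mean of their ranks (assumption)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks

def pearson_from_z(x, y):
    """Sum of z-score products divided by N - 1."""
    mx, my = statistics.mean(x), statistics.mean(y)
    sx, sy = statistics.stdev(x), statistics.stdev(y)
    return sum(((a - mx) / sx) * ((b - my) / sy) for a, b in zip(x, y)) / (len(x) - 1)

def spearman_rho(x, y):
    """Steps 2 to 5: rank both variables, then correlate the ranks."""
    return pearson_from_z(rank(x), rank(y))

# Hypothetical data with one extreme x value (1000).
x = [10, 20, 30, 40, 1000]
y = [1, 3, 2, 5, 4]
r_s = spearman_rho(x, y)
print(f"r_s = {r_s:.3f}")
```

Because only the rank of 1000 enters the calculation, replacing it with 50 would leave the result unchanged.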
Point-biserial correlation (𝑟𝑝𝑏): dichotomous + quantitative. Use same calculation as Pearson r.
From the t value of an independent-samples t-test on the two groups: 𝑟𝑝𝑏 = √(𝑡² ÷ (𝑡² + 𝑑𝑓)), with 𝑑𝑓 = 𝑁 − 2
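The two routes to the point-biserial correlation can be compared in a short Python sketch. The data are made up, the dichotomous variable is assumed to be coded 0/1, and the t value is assumed to come from a pooled-variance independent-samples t-test; none of these details are spelled out in the notes.

```python
import math
import statistics

def pearson_r(x, y):
    """Pearson r as the sum of z-score products divided by N - 1."""
    mx, my = statistics.mean(x), statistics.mean(y)
    sx, sy = statistics.stdev(x), statistics.stdev(y)
    return sum(((a - mx) / sx) * ((b - my) / sy) for a, b in zip(x, y)) / (len(x) - 1)

def pooled_t(group0, group1):
    """Independent-samples t with pooled variance; returns (t, df)."""
    n0, n1 = len(group0), len(group1)
    df = n0 + n1 - 2
    pooled_var = ((n0 - 1) * statistics.variance(group0)
                  + (n1 - 1) * statistics.variance(group1)) / df
    t = (statistics.mean(group1) - statistics.mean(group0)) / math.sqrt(
        pooled_var * (1 / n0 + 1 / n1))
    return t, df

# Hypothetical data: group membership coded 0/1 and a quantitative score.
group = [0, 0, 0, 0, 1, 1, 1, 1]
score = [2, 3, 4, 3, 5, 6, 7, 6]

r_pb = pearson_r(group, score)                      # route 1: plain Pearson r

t, df = pooled_t([s for g, s in zip(group, score) if g == 0],
                 [s for g, s in zip(group, score) if g == 1])
r_pb_from_t = math.sqrt(t ** 2 / (t ** 2 + df))     # route 2: from t and df

print(f"r_pb = {r_pb:.3f}, from t: {r_pb_from_t:.3f}")
```

The formula with t always returns a non-negative value, so the sign of 𝑟𝑝𝑏 has to be read from which group has the higher mean.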
Week 2
Regression enables you to predict one interval variable from one or more other variables.
The predictor is the independent variable, x
The response/criterion is the dependent variable, y
[Table: use of symbols]
Simple linear regression
Regression model is based on a sample with a certain range of x scores.
A simple linear regression: One predictor variable
A multiple linear regression: 2+ predictor variables
Interpolation: make a prediction within the range
Extrapolation: make a prediction outside the range
The regression line always passes through (0, 𝑏0) and (𝑥̅, 𝑦̅):
1. Calculate slope, the size of the difference in 𝑦̂ if x increases by 1 unit: 𝑏1 = 𝑟 × (𝑠𝑦 ÷ 𝑠𝑥)
2. Calculate intercept/constant, the predicted value of y when x = 0: 𝑏0 = 𝑦̅ − (𝑏1 × 𝑥̅)
3. Calculate the prediction: 𝑦̂ = 𝑏0 + 𝑏1 × 𝑥
Standardized regression equation
Problem: When the unit of measurement changes, the regression equation also changes.
Solution: Use the standardized regression equation
𝑍̂𝑦 = 𝑟 × 𝑧𝑥
Standard error of the estimate
1. Write down observed values: x and y
2. Calculate predicted values: 𝑦̂ = 𝑏0 + 𝑏1 × 𝑥
3. Calculate error/residual: 𝑒𝑖 = 𝑦 − 𝑦̂ → the best-fitting line minimizes the sum of squared residuals
4. Calculate (𝑦 − 𝑦̂)² = 𝑒𝑖²
5. Calculate the sum: 𝑆𝑆𝑒 = ∑(𝑦 − 𝑦̂)² = ∑ 𝑒𝑖²
6. Calculate the mean squared error: 𝑀𝑆𝑒 = ∑(𝑦 − 𝑦̂)² ÷ (𝑁 − 𝑃 − 1) = 𝑆𝑆𝑒 ÷ 𝑑𝑓𝑒
With P = number of predictors → In simple linear regression P = 1
7. Calculate the standard error of the estimate: 𝑆𝑒 = √𝑀𝑆𝑒
From sample to population
Statistical model
𝜇𝑦 = 𝛽0 + 𝛽1 × 𝑥, with standard deviation 𝜎
𝛽0 is estimated with 𝑏0
𝛽1 is estimated with 𝑏1
𝜎 is estimated with 𝑆𝑒
Significance testing
Hypothesis testing
H0: 𝛽1 = 0, Ha: 𝛽1 ≠ 0, 𝛽1 < 0 or 𝛽1 > 0
𝑡 = 𝑏1 ÷ 𝑆𝐸𝑏1, with 𝑑𝑓 = 𝑁 − 𝑃 − 1
𝑆𝐸𝑏1 = 𝑆𝑒 ÷ (𝑆𝑥 × √(𝑁 − 1))
Testing H0: 𝛽1 = 0 in simple regression = testing H0: ρ = 0
Testing H0: 𝛽0 = 0 is done the same way, but is less useful.
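The slope, intercept, standard error of the estimate and the t test for the slope can be put together in one Python sketch. This is an illustration with made-up data and hypothetical function names; it covers simple linear regression only (P = 1) and leaves the P value to a t table.

```python
import math
import statistics

def pearson_r(x, y):
    """Pearson r as the sum of z-score products divided by N - 1."""
    mx, my = statistics.mean(x), statistics.mean(y)
    sx, sy = statistics.stdev(x), statistics.stdev(y)
    return sum(((a - mx) / sx) * ((b - my) / sy) for a, b in zip(x, y)) / (len(x) - 1)

def fit_simple_regression(x, y):
    """Return (b0, b1) using b1 = r * (s_y / s_x) and b0 = mean(y) - b1 * mean(x)."""
    b1 = pearson_r(x, y) * statistics.stdev(y) / statistics.stdev(x)
    b0 = statistics.mean(y) - b1 * statistics.mean(x)
    return b0, b1

def slope_test(x, y, b0, b1, p=1):
    """Standard error of the estimate and t test for the slope."""
    n = len(x)
    residuals = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]    # e_i = y - y_hat
    ss_e = sum(e ** 2 for e in residuals)                         # SS_e
    df_e = n - p - 1
    s_e = math.sqrt(ss_e / df_e)                                  # S_e = sqrt(MS_e)
    se_b1 = s_e / (statistics.stdev(x) * math.sqrt(n - 1))        # SE of b1
    return {"SSe": ss_e, "Se": s_e, "SEb1": se_b1, "t": b1 / se_b1, "df": df_e}

# Hypothetical scores, chosen only for illustration.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

b0, b1 = fit_simple_regression(x, y)
result = slope_test(x, y, b0, b1)

# In simple regression the slope test equals the test of H0: rho = 0.
r = pearson_r(x, y)
t_from_r = r * math.sqrt(len(x) - 2) / math.sqrt(1 - r ** 2)

print(f"y_hat = {b0:.2f} + {b1:.2f} * x")
print(f"Se = {result['Se']:.3f}, t({result['df']}) = {result['t']:.2f}, t from r = {t_from_r:.2f}")
```

The last two printed t values agree, which illustrates why testing 𝛽1 = 0 and testing ρ = 0 lead to the same conclusion when there is one predictor.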
Confidence interval
𝑏1 ± 𝑡* × 𝑆𝐸𝑏1
𝑡* is the two-tailed critical t value with 𝑑𝑓 = 𝑁 − 𝑃 − 1
Use the website for the critical value of 𝑡*, with significance 0.025 (i.e. 0.025 in each tail for a 95% interval)
Accuracy of prediction
Variance: Total = Model (explained) + error (unexplained)
Sums of squares: 𝑆𝑆𝑇 = 𝑆𝑆𝑦 = 𝑆𝑆𝑦̂ + 𝑆𝑆𝑒
Proportion explained variance is the part of the variance that is systematic variance (VAF)
Total sum of squares: 𝑆𝑆𝑦 = 𝑆𝑦² × (𝑁 − 1)
𝑉𝐴𝐹 = 𝑆𝑆𝑚𝑜𝑑𝑒𝑙 ÷ 𝑆𝑆𝑡𝑜𝑡𝑎𝑙 = 𝑆𝑆𝑦̂ ÷ 𝑆𝑆𝑦
𝑉𝐴𝐹 = 1 − 𝑆𝑆𝑒𝑟𝑟𝑜𝑟 ÷ 𝑆𝑆𝑡𝑜𝑡𝑎𝑙 = 1 − (𝑆𝑆𝑒 ÷ 𝑆𝑆𝑦)
For simple linear regression: 𝑉𝐴𝐹 = 𝑟²
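The sums-of-squares decomposition and both VAF formulas can be checked numerically with a small Python sketch. The data and variable names are made up for illustration, and the line is fitted with 𝑏1 = 𝑆𝑥𝑦 ÷ 𝑆𝑥², which is algebraically the same as 𝑟 × (𝑠𝑦 ÷ 𝑠𝑥).

```python
import statistics

# Hypothetical scores, chosen only for illustration.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
n = len(x)

mean_x, mean_y = statistics.mean(x), statistics.mean(y)
s_xy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y)) / (n - 1)

# Fit the regression line.
b1 = s_xy / statistics.variance(x)
b0 = mean_y - b1 * mean_x
y_hat = [b0 + b1 * xi for xi in x]

# Sums of squares: total, model (explained) and error (unexplained).
ss_total = statistics.variance(y) * (n - 1)                  # SS_y = S_y^2 * (N - 1)
ss_model = sum((yh - mean_y) ** 2 for yh in y_hat)           # SS_y_hat
ss_error = sum((yi - yh) ** 2 for yi, yh in zip(y, y_hat))   # SS_e

vaf_model = ss_model / ss_total
vaf_error = 1 - ss_error / ss_total
r = s_xy / (statistics.stdev(x) * statistics.stdev(y))

print(f"SS_total = {ss_total:.2f} = {ss_model:.2f} + {ss_error:.2f}")
print(f"VAF = {vaf_model:.3f} (model), {vaf_error:.3f} (1 - error), r^2 = {r ** 2:.3f}")
```

All three routes to VAF should agree up to rounding; a mismatch usually points to an arithmetic slip in the residuals.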