Wednesday, 11 November 2020
Statistics
Chi-square
Outline:
1. Goodness-of-fit (GOF),
2. Test of Contingency,
3. Effect size (how different are the observed and expected values?) and analysis of residuals
(where do the significant associations lie?),
4. Step-by-step guide.
What is chi-square:
• A test that is able to analyse frequencies,
• It deals with categorical variables (e.g. gender, sex) and sometimes ordinal ones.
Types of chi-square tests:
1. Goodness-of-fit
- Used when you would like to ask if what you observe ‘fits’ with what you expected to see —
concerned with unidimensional data,
- E.g. are as many people right-handed as we expect?
2. Test of contingency
- Used when you would like to ask if one variable is associated with another (or more).
Concerned with multi-dimensional data,
- E.g. is handedness associated with creativity?
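To make the goodness-of-fit idea concrete, here is a minimal sketch of the calculation for the handedness example. The counts and the expected 90/10 split are hypothetical values chosen for illustration, and the helper name `gof_chi_square` is not from the lecture. The p-value shortcut used here only holds when there is 1 degree of freedom.

```python
import math

def gof_chi_square(observed, expected):
    """Chi-square statistic: sum of (O - E)^2 / E over all categories."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# Hypothetical sample of 100 people: 82 right-handed, 18 left-handed.
observed = [82, 18]
# Hypothetical expectation: 90% right-handed, 10% left-handed (as counts).
expected = [90.0, 10.0]

chi2 = gof_chi_square(observed, expected)
df = len(observed) - 1  # degrees of freedom = number of categories - 1

# For df = 1 only, the chi-square p-value has a closed form via erfc.
p_value = math.erfc(math.sqrt(chi2 / 2))

print(f"chi2 = {chi2:.3f}, df = {df}, p = {p_value:.4f}")
```

With more than two categories the p-value would normally come from a statistics package or a chi-square table rather than this shortcut.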
Assumptions for a goodness-of-fit test:
1. Categories should be exhaustive and mutually exclusive,
2. Observations must be independent — data must come from different participants.
Assumptions for test of contingency:
1. Categories should be exhaustive and mutually exclusive,
2. Observations must be independent — data must come from different participants,
3. You must have at least 20 data points and an expected value of at least 5 in every
cell: you need a sufficiently big sample size.
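The sample-size assumption can be checked before running the test. The sketch below computes the expected value for each cell as (row total × column total) / grand total, then applies the two thresholds from the list above. The 2 × 2 table (handedness by creativity) is hypothetical, and the function names are illustrative only.

```python
def expected_frequencies(table):
    """Expected count per cell: row total * column total / grand total."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]

def check_sample_size(table, min_total=20, min_expected=5):
    """True if there are enough data points and every expected value is large enough."""
    n = sum(sum(row) for row in table)
    smallest = min(min(row) for row in expected_frequencies(table))
    return n >= min_total and smallest >= min_expected

# Hypothetical counts: rows = right/left-handed, columns = creative/not creative.
table = [[30, 20],
         [10, 40]]

expected = expected_frequencies(table)

# Chi-square statistic for the test of contingency.
chi2 = sum(
    (o - e) ** 2 / e
    for obs_row, exp_row in zip(table, expected)
    for o, e in zip(obs_row, exp_row)
)

print(expected, check_sample_size(table), round(chi2, 3))
```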
Important to note:
- P-values are not enough to tell you about significance: you need to look at them in conjunction
with the effect size (Cramer’s V),
- P-values cannot tell you where the association is — you need to do this through analysis of
residuals.
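As a rough illustration of the effect size, Cramer's V is usually computed as the square root of chi-square divided by n × (k - 1), where n is the grand total and k is the smaller of the number of rows and columns. The values passed in below are hypothetical (a 2 × 2 table with 100 observations and a chi-square of about 16.67), and `cramers_v` is an illustrative name.

```python
import math

def cramers_v(chi2, n, n_rows, n_cols):
    """Cramer's V = sqrt(chi2 / (n * (min(rows, cols) - 1)))."""
    k = min(n_rows, n_cols)
    if k < 2:
        raise ValueError("the table needs at least 2 rows and 2 columns")
    return math.sqrt(chi2 / (n * (k - 1)))

# Hypothetical inputs: chi-square of 50/3 from a 2 x 2 table with n = 100.
v = cramers_v(chi2=50 / 3, n=100, n_rows=2, n_cols=2)

print(f"Cramer's V = {v:.3f}")
```

V ranges from 0 (no association) to 1 (perfect association), so it can be compared across tables with different sample sizes.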
What is a residual?
- It is the deviation of the observed from the expected frequency,
- Residual = O - E.
However, the size of the deviation is related to the size of the sample. Therefore, cells with
larger E values tend to have larger residuals. To account for this, we standardise the residuals:
standardised residual = (O - E) / sqrt(E).
Standardised residuals indicate each cell’s relative contribution to the significance of the
chi-square value.
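Following the above, we need to calculate the adjusted residual. This is because the
standardised residual does not have unit variance (its variance is below 1), so it understates
how far a cell deviates. The adjusted residual corrects for this using the row and column totals:
adjusted residual = e / sqrt((1 - nrow/ntotal) × (1 - ncol/ntotal))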
- e = standard residual
- nrow = row total
- ntotal = grand total
- ncol = column total
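The three kinds of residual can be sketched together, using the symbols defined above (e, nrow, ncol, ntotal). The 2 × 2 table is hypothetical and `residual_tables` is an illustrative name. The adjusted residual is taken to be the standardised residual divided by sqrt((1 - nrow/ntotal) × (1 - ncol/ntotal)), which is the usual definition.

```python
import math

def residual_tables(table):
    """Return (raw, standardised, adjusted) residuals, one value per cell."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n_total = sum(row_totals)

    raw, standardised, adjusted = [], [], []
    for i, row in enumerate(table):
        raw_row, std_row, adj_row = [], [], []
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n_total
            r = observed - expected            # raw residual: O - E
            e = r / math.sqrt(expected)        # standardised residual
            # The adjustment uses the row and column proportions of the cell.
            scale = math.sqrt(
                (1 - row_totals[i] / n_total) * (1 - col_totals[j] / n_total)
            )
            raw_row.append(r)
            std_row.append(e)
            adj_row.append(e / scale)
        raw.append(raw_row)
        standardised.append(std_row)
        adjusted.append(adj_row)
    return raw, standardised, adjusted

# Hypothetical counts: rows = right/left-handed, columns = creative/not creative.
table = [[30, 20],
         [10, 40]]

raw, standardised, adjusted = residual_tables(table)
for name, values in (("raw", raw), ("standardised", standardised), ("adjusted", adjusted)):
    print(name, [[round(x, 3) for x in row] for row in values])
```

In a 2 × 2 table every adjusted residual has the same absolute value, so this analysis is most informative for larger tables.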