Prediction models in health science
Predictors should be measured as they are measured in clinical practice. Outcomes, however, should be measured as accurately as possible, which is not necessarily how they are measured in practice.
In prognostic research, the interest lies more in the cumulative risk over time than in the instantaneous risk (hazard). The cumulative risk can be translated into an absolute risk, which is most useful for patient management.
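For reference, the standard survival-analysis identity that links the two (the notation F, S, h is mine, not from the notes): with hazard h(u) and survival S(t), the cumulative risk by time t is

```latex
% Cumulative risk by time t, given hazard h(u) and survival S(t):
F(t) = 1 - S(t) = 1 - \exp\!\left( -\int_0^t h(u)\, du \right)
```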
Subgroups formed after randomization (e.g., according to treatment compliance) are unlikely to be a random subset of the study population, so analyses of the treatment effect in such subgroups are likely to be biased, whereas analyses in subgroups formed before randomization (e.g., based on disease severity) are not. The reason is that the factor defining the subgroup (e.g., high compliance) is often associated with prognosis. If you stratify on compliance after randomization, patients with severe disease may cluster in the high-compliance group and patients with milder disease in the low-compliance group. The comparability created by randomization is then lost, because prognostic factors are no longer evenly distributed across the groups being compared, without you knowing it.
Prediction comprises diagnostic models (what is wrong at this moment) and prognostic models (what will happen in the future). Motivations for prediction in medicine:
- Patient’s interest: do I have the disease, and what is my prognosis?
- Doctor (clinical management): informing patient about diagnosis/prognosis, decisions
on diagnostic testing, selection of treatment based on diagnosis/prognosis/predicted
treatment effects/costs.
- Research: selection of patients for research
- Adjustment for initial prognosis in comparative research on hospital performance
- Prediction of the likelihood of receiving treatment (propensity score)
Prediction almost always involves a multivariable model, which combines multiple predictors.
The motive is to predict rather than to explain (etiology). Confounding is therefore not an
issue, because no single determinant is central; a ‘confounder’ may simply serve as another potential predictor.
However, effect modification is often relevant, because diagnostic/prognostic determinants
may have different effects in subgroups. Outcomes are expressed as absolute risks (or
means) according to predictor values (personalized risks).
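As a minimal sketch of such a multivariable model producing absolute (personalized) risks, here with simulated data and invented predictors (age and systolic blood pressure); this is an illustration, not a real clinical model:

```python
# Sketch: a multivariable logistic model that outputs an absolute risk
# per patient. All data and predictor names are made up for illustration.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(42)
n = 500
age = rng.normal(60, 10, n)           # hypothetical predictor: age in years
sbp = rng.normal(140, 20, n)          # hypothetical predictor: systolic BP
X = np.column_stack([age, sbp])
# Simulate an outcome whose risk depends on both predictors
logit = -17 + 0.15 * age + 0.05 * sbp
y = rng.binomial(1, 1 / (1 + np.exp(-logit)))

model = LogisticRegression().fit(X, y)
# Absolute (personalized) risk for a new patient: age 70, SBP 160
risk = model.predict_proba([[70, 160]])[0, 1]
print(f"Predicted absolute risk: {risk:.2f}")
```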
A perfect risk model is small, easy to use, easily available (e.g., on the internet), generalizable, based on the best combination of risk factors, gives precise predictions, includes time and the most important interactions, and is cheap. Decisions should also be based on the patient’s preferences (shared decision making).
Calibration: Do the predicted probabilities of the model match the observed
probabilities? → Hosmer-Lemeshow test in logistic regression (does not take
overfitting into account), calibration plot in Cox regression. The p-value of the HL
goodness-of-fit test is the probability of a similar or more extreme deviation from
perfect calibration if we were to repeat the study in a similar sample, assuming that
predicted and observed risks truly agree. For example, a p-value of 0.06 means that, if
calibration were in fact perfect, we would find a deviation this large or larger in 6% of
repeated samples. A p-value > 0.05 indicates acceptable calibration, because H0 (no
difference between observed and predicted values) is not rejected.
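A minimal sketch of how the HL statistic can be computed, assuming arrays of observed 0/1 outcomes and predicted probabilities; the function name and the split into ten risk groups follow the usual convention, but this is an illustration, not a validated implementation:

```python
import numpy as np
from scipy.stats import chi2

def hosmer_lemeshow(y_obs, p_pred, groups=10):
    """Hosmer-Lemeshow chi-square statistic and p-value (sketch)."""
    order = np.argsort(p_pred)                 # sort patients by predicted risk
    y_obs, p_pred = y_obs[order], p_pred[order]
    stat = 0.0
    for idx in np.array_split(np.arange(len(p_pred)), groups):
        n = len(idx)
        observed = y_obs[idx].sum()            # observed events in this risk group
        expected = p_pred[idx].sum()           # expected events = sum of predicted risks
        p_bar = expected / n
        stat += (observed - expected) ** 2 / (n * p_bar * (1 - p_bar))
    return stat, chi2.sf(stat, df=groups - 2)  # conventional df = groups - 2
```

The returned p-value is interpreted exactly as described above: small values signal a deviation from perfect calibration.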
Discrimination: Is the model able to distinguish between high-risk and low-risk
patients? If the model is presented with two persons, one with and one without the
outcome, the AUC is the percentage of such pairs in which the model correctly assigns
the higher risk → AUC in logistic regression, c-statistic/box plot in Cox regression.
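A hedged illustration of that pairwise interpretation, with invented outcomes and risks:

```python
import numpy as np
from sklearn.metrics import roc_auc_score

y = np.array([0, 0, 1, 1, 0, 1])                # observed outcomes (1 = event)
p = np.array([0.1, 0.4, 0.35, 0.8, 0.2, 0.7])   # predicted risks
# Of the 3 x 3 = 9 (case, non-case) pairs, the case has the higher
# predicted risk in 8, so the AUC is 8/9 ~ 0.89.
print(roc_auc_score(y, p))
```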
Overall performance: Accuracy per patient → distance between predicted and
observed outcomes → explained variation (R²) quantifies the correspondence between
predicted and observed outcomes. It is used to compare two models.
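One way to compute this mean-squared-distance flavour of R² and use it to compare two models (a sketch with invented data; other definitions of explained variation exist):

```python
# Sketch: explained variation as 1 - SS_res / SS_tot on the probability scale.
import numpy as np

def r_squared(y, p):
    ss_res = np.sum((y - p) ** 2)           # distance predicted vs observed
    ss_tot = np.sum((y - y.mean()) ** 2)    # distance from predicting the mean
    return 1 - ss_res / ss_tot

y = np.array([0, 1, 1, 0, 1])
p_model_a = np.array([0.2, 0.7, 0.9, 0.1, 0.6])
p_model_b = np.array([0.4, 0.5, 0.6, 0.5, 0.5])
print(r_squared(y, p_model_a), r_squared(y, p_model_b))  # A scores higher
```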
Sensitivity: the chance that an ill person gets a positive test result. If sensitivity is high, the test correctly detects ill persons, so if a screening test has a high sensitivity, you can be confident it will detect the disease. If the test is negative, you can be nearly certain that the person does not have the disease, because the chance of a false negative is very low. A test used to exclude a condition should therefore have a high sensitivity: a highly sensitive test is good at ruling out the disease you are screening for (SnNout).
Specificity: the chance that a healthy person gets a negative test result. If a test has a high specificity and the result is positive, you can be nearly certain that the person does have the disease you screened for, because the chance of a false positive is low. (A negative result, by contrast, does not rule the disease out; that requires high sensitivity.) A highly specific test is good at ruling in the disease you are screening for (SpPin).
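A minimal sketch with made-up 2x2 counts, just to fix the definitions:

```python
# Sketch: sensitivity and specificity from a 2x2 table (counts invented).
tp, fn = 90, 10    # diseased persons: true positives, false negatives
tn, fp = 80, 20    # healthy persons: true negatives, false positives

sensitivity = tp / (tp + fn)   # P(test positive | diseased) = 0.90
specificity = tn / (tn + fp)   # P(test negative | healthy)  = 0.80
print(sensitivity, specificity)
```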
Parallel testing: the combined test is positive when test A or B or C is positive →
sensitivity↑, specificity↓. One positive test is enough to get a combined positive
end result, so the chance that an ill person will get a positive result increases.
Specificity decreases, because all tests have to be negative to get a negative
end result.
Sequential testing: the combined test is positive when test A and B and C are positive
→ sensitivity↓, specificity↑. Sensitivity decreases, because the chance of detecting
an ill person is smaller (tests A, B and C all have to be positive). Specificity
increases, because only one test needs to be negative to get a negative end
result.
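The direction of these effects can be made concrete under the (strong) assumption that the tests are conditionally independent given disease status; the numbers below are invented:

```python
# Sketch: combined sensitivity/specificity for two tests, assuming the
# tests are independent given disease status (an idealization).
sens_a, spec_a = 0.80, 0.90
sens_b, spec_b = 0.70, 0.85

# Parallel: combined test positive if A OR B is positive
sens_par = 1 - (1 - sens_a) * (1 - sens_b)   # 0.94  -> sensitivity up
spec_par = spec_a * spec_b                   # 0.765 -> specificity down

# Sequential: combined test positive only if A AND B are positive
sens_seq = sens_a * sens_b                   # 0.56  -> sensitivity down
spec_seq = 1 - (1 - spec_a) * (1 - spec_b)   # 0.985 -> specificity up
print(sens_par, spec_par, sens_seq, spec_seq)
```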
Each test result feeds into the likelihood that the disease is present (see the sketch after this list):
- Positive likelihood ratio: ratio between the occurrence of a positive result if
someone is ill and the occurrence of a positive result in healthy people.
LR+ = (TP/D+) / (FP/D-) = sensitivity/(1-specificity).
- Negative likelihood ratio: ratio between the occurrence of a negative test if
someone is ill and the occurrence of a negative result in healthy people.
LR- = (FN/D+) / (TN/D-) = (1-sensitivity)/specificity
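A short sketch tying the ratios to the updating step implied above, via pre-test odds × LR = post-test odds (Bayes’ theorem); the prevalence and test characteristics are invented:

```python
# Sketch: likelihood ratios and updating the probability of disease.
sensitivity, specificity = 0.90, 0.80
lr_pos = sensitivity / (1 - specificity)          # LR+ = 4.5
lr_neg = (1 - sensitivity) / specificity          # LR- = 0.125

pretest_prob = 0.20                               # illustrative prevalence
pretest_odds = pretest_prob / (1 - pretest_prob)  # 0.25
posttest_odds = pretest_odds * lr_pos             # 1.125 after a positive test
posttest_prob = posttest_odds / (1 + posttest_odds)
print(f"Post-test probability after a positive test: {posttest_prob:.2f}")  # 0.53
```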