PUBH5010 · Epidemiology Methods And Uses
Selection Bias
Selection bias arises when the people you actually analysed are not a fair window on the study base — the source population the question is really about — so the association you measure is distorted before any calculation. It enters through how people get into, or stay in, the study: choosing controls who differ systematically from the population that produced the cases; the healthy-worker effect, where employed people are healthier than the general population used as a comparison; loss to follow-up that is related to both exposure and outcome; volunteer and referral effects; and survivor effects in cross-sectional or prevalence data. The damage is specific, not vague: depending on the pattern, selection bias can push the measure of association toward or away from the null, or even reverse it, and — unlike confounding — you usually cannot fix it in the analysis, because the information you would need was never sampled. The exam test is to name where in the selection process the distortion entered and argue which way it likely moved the estimate, then say what design or sampling choice would have prevented it.
What this chapter covers
- 01 What selection bias is: the analysed sample vs the study base
- 02 Selection of inappropriate controls in case-control studies
- 03 The healthy-worker effect
- 04 Loss to follow-up related to exposure and outcome
- 05 Volunteer, referral and survivor effects
- 06 Predicting the direction of the distortion
- 07 Why selection bias usually can't be fixed in the analysis
Worked example: differential loss to follow-up in a cohort
- (a) [+1] Name it. Outcome-related dropout that differs by exposure is selection bias from differential loss to follow-up.
- (b) [+2] Direction. Cases are selectively removed from the exposed arm, so the exposed group looks healthier than it is. The true RR is understated: the measure is pulled toward the null (here, masking a real effect as RR = 1.0).
- (c) [+1] Fixable? No. The outcomes of the workers who left were never observed, so no adjustment can recover them. The remedy is prevention by design: minimise and track dropout, and compare characteristics of those lost vs retained.
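The direction argument in (b) can be checked with a little arithmetic. The sketch below is illustrative only: the cohort sizes, case counts and dropout figure are invented for this example and are not the numbers from the worked scenario. It assumes a harmful exposure and that only exposed workers who were going to become cases drop out.

```python
def risk_ratio(cases_exposed, n_exposed, cases_unexposed, n_unexposed):
    """Risk ratio: risk in the exposed divided by risk in the unexposed."""
    return (cases_exposed / n_exposed) / (cases_unexposed / n_unexposed)


# Hypothetical cohort as it would look with complete follow-up.
n_exposed, n_unexposed = 1000, 1000
cases_exposed, cases_unexposed = 190, 100

true_rr = risk_ratio(cases_exposed, n_exposed, cases_unexposed, n_unexposed)

# Assumed dropout pattern: 100 exposed workers who were going to become
# cases leave before their outcome is recorded. Nobody else is lost.
lost_exposed_cases = 100

observed_rr = risk_ratio(
    cases_exposed - lost_exposed_cases,   # cases still visible in the exposed arm
    n_exposed - lost_exposed_cases,       # exposed people still under follow-up
    cases_unexposed,
    n_unexposed,
)

print(f"RR with complete follow-up: {true_rr:.2f}")
print(f"RR after differential loss: {observed_rr:.2f}")
```

Under these assumed figures a true RR of 1.9 is observed as 1.0: the exposed arm's visible risk falls to 90/900, the same as the unexposed arm's 100/1000. The analysed data contain no trace of the 100 missing cases, which is why no adjustment can restore the true value.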
Key terms
- Selection bias
- A systematic error arising when the people included in (or retained by) a study differ from the study base in a way related to both exposure and outcome, distorting the measure of association. Unlike confounding, it usually cannot be corrected in the analysis.
- Study base
- The source population and time window that generates the cases. Selection bias is fundamentally a mismatch between the analysed sample and this base — most often a control group that does not represent the population the cases came from.
- Healthy-worker effect
- A selection bias in occupational studies: employed people are on average healthier than the general population, so using the general population as the comparison understates an occupational hazard. The fix is an internal comparison group within the workforce.
- Loss to follow-up
- Participants leaving a cohort before the outcome is observed. It biases the result only when the loss is related to both exposure and outcome (differential loss); random loss reduces precision but not validity.
- Survivor / prevalence bias
- In cross-sectional or prevalence-based studies, only those who survived with the disease long enough to be sampled are captured, so determinants of survival get confused with determinants of disease.
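Survivor bias can be made concrete with the standard steady-state approximation for a rare disease, prevalence ≈ incidence × mean duration. The sketch below uses made-up rates and durations, and assumes a hypothetical factor that has no effect on who gets the disease but doubles how long people live with it.

```python
def approx_prevalence(incidence_rate, mean_duration_years):
    """Steady-state approximation for a rare disease: prevalence ~ incidence x duration."""
    return incidence_rate * mean_duration_years


# Hypothetical values: the factor does NOT change incidence, only survival.
incidence = 0.002               # new cases per person-year, same in both groups
duration_with_factor = 6.0      # mean years lived with the disease
duration_without_factor = 3.0

prev_with = approx_prevalence(incidence, duration_with_factor)
prev_without = approx_prevalence(incidence, duration_without_factor)

incidence_rate_ratio = incidence / incidence    # what a cohort study would estimate
prevalence_ratio = prev_with / prev_without     # what a cross-sectional survey sees

print(f"Incidence rate ratio: {incidence_rate_ratio:.2f}")
print(f"Prevalence ratio:     {prevalence_ratio:.2f}")
```

With these assumed inputs the factor looks like a risk factor for disease in prevalence data (ratio about 2) even though it has no effect on incidence (ratio 1). The survey has picked up a determinant of survival, not of disease.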
Selection Bias FAQ
How is selection bias different from confounding?
Selection bias comes from how people entered or stayed in the study: the analysed sample is not a fair window on the study base. Confounding comes from a third variable mixed into a correctly sampled comparison. The practical difference is fixability: confounding can often be adjusted for in the analysis if you measured the confounder; selection bias usually cannot, because the needed information was never sampled.
Which way does selection bias push the estimate?
It depends on the pattern — that is the whole exam skill. It can bias toward the null (e.g. cases selectively lost from the exposed arm masking a real effect), away from the null (e.g. controls chosen to be unusually unexposed inflating an OR), or even reverse the direction. You argue the direction from who was selectively included or excluded and how that group sits on exposure and outcome.
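The "away from the null" case can be sketched with two 2×2 tables. All counts below are hypothetical. The cases are the same in both tables; only the control group changes, from one that mirrors exposure in the study base to one drawn from an unusually unexposed source.

```python
def odds_ratio(exposed_cases, unexposed_cases, exposed_controls, unexposed_controls):
    """Cross-product odds ratio from a 2x2 case-control table."""
    return (exposed_cases * unexposed_controls) / (unexposed_cases * exposed_controls)


# Hypothetical cases: 80 exposed, 120 unexposed.
exposed_cases, unexposed_cases = 80, 120

# Controls that mirror exposure in the study base (assumed 50 of 200 exposed).
or_representative = odds_ratio(exposed_cases, unexposed_cases, 50, 150)

# Controls from a source that is unusually unexposed (assumed 25 of 200 exposed).
or_biased = odds_ratio(exposed_cases, unexposed_cases, 25, 175)

print(f"OR with representative controls: {or_representative:.2f}")
print(f"OR with under-exposed controls:  {or_biased:.2f}")
```

With these invented counts the OR rises from 2.0 to roughly 4.7 purely because of who was picked as a control. Reversing the pattern, with controls who are unusually exposed, would pull the OR toward or even below 1.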
What is the healthy-worker effect?
Working people are healthier than the general population because illness keeps people out of work, so comparing a workforce to the general population understates occupational risk — the exposed group is artificially healthy. The standard remedy is an internal comparison: contrast more-exposed with less-exposed workers, not workers with the public.
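A small numeric sketch shows why the internal comparison matters. The risks below are invented for illustration, and the true effect of the hazard is simply assumed to be a doubling of risk.

```python
# Hypothetical annual mortality risks (per person).
risk_general_population = 0.010   # includes people too ill to work
risk_unexposed_workers = 0.007    # healthier baseline among the employed
assumed_true_rr = 2.0             # assumed true effect of the occupational hazard

risk_exposed_workers = risk_unexposed_workers * assumed_true_rr

# External comparison: exposed workers vs the general population.
rr_vs_general_population = risk_exposed_workers / risk_general_population

# Internal comparison: exposed workers vs unexposed workers.
rr_internal = risk_exposed_workers / risk_unexposed_workers

print(f"RR vs general population: {rr_vs_general_population:.2f}")
print(f"RR vs unexposed workers:  {rr_internal:.2f}")
```

Under these assumptions the external comparison gives an RR of about 1.4 while the internal comparison recovers the assumed 2.0. The gap comes entirely from the workforce's healthier baseline, not from anything about the hazard.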
Can you adjust for selection bias afterwards?
Generally no. The information you would need — the outcomes of people who were never sampled or who dropped out — simply was not collected, so there is nothing to adjust. The defence is in the design: representative control selection, internal comparison groups, and minimising and characterising loss to follow-up. This non-fixability is exactly why examiners stress getting selection right up front.
Exam move
Answer every selection-bias prompt in three beats: (1) where in the selection or retention process the distortion entered, (2) which direction it likely moved the estimate (toward, away from, or across the null), and (3) why the analysis usually cannot undo it. Memorise the named patterns — inappropriate controls, healthy-worker effect, differential loss to follow-up, volunteer/referral/survivor effects — and for each, the design choice that prevents it. The contrast with confounding (fixable vs not) is a favourite discriminating question.