MAST20034 · Critical Thinking With Data
Observational Studies and Confounding
Weeks 4–5 are the other highest-value block, and the home of the subject's signature exam move: find the confounder. When you can only watch, you must know exactly what watching lets you say. The chapter lays out the four observational designs — cohort, case-control, cross-sectional, ecological — classified by what you group on (exposure, outcome, a snapshot, or a population), each with its own strengths and traps (recall bias in case-control, the ecological fallacy in ecological). The heart of it is confounding: why association is not causation, drawn as the confounding triangle (a third variable Z driving both X and Y), and the four levers to fight it — restriction, matching, stratification and adjustment. You then separate bias from precision on the dartboard image (bias = systematic, off-centre, unfixable by size; imprecision = scatter, shrinks with size), and close with the critique skeleton for assessing any data-based claim. Master the confounder hunt and you have the most reusable answer in the exam.
What this chapter covers
- 4.1 Observational vs experimental: the dividing line, revisited
- 4.2 The four observational designs: cohort / case-control / cross-sectional / ecological
- 4.3 The confounding triangle (a third variable drives both exposure and outcome)
- 4.4 Controlling confounding: restriction, matching, stratification, adjustment
- 4.5 Bias vs precision: the dartboard
- 4.6 Assessing a data-based claim: the critique skeleton
Bias vs precision — why a bigger sample won't help, mark by mark
- +1 Name the concepts: distinguish bias (systematic error, the estimate centred on the wrong value) from precision (random scatter, which shrinks as n grows).
- +1 Identify the bias: a self-selected online poll suffers voluntary-response / selection bias. People with strong opinions and site visitors are over-represented, so the sample is centred away from the population.
- +1 Explain why size can't fix it: increasing n to 50,000 only tightens a biased estimate. It shrinks the random scatter around the wrong centre, giving a precise but inaccurate number.
- +1 State the fix: accuracy needs a probability sample of the target population (random selection, follow-up of non-responders), not a larger convenience sample.
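The marking points above can be checked with a small simulation. This is a minimal sketch, not part of the subject materials: the true support of 0.40 and the reply probabilities (0.6 for supporters, 0.2 for everyone else) are invented for illustration, and `poll_estimate` is a hypothetical helper.

```python
import random

def poll_estimate(n, true_support, p_reply_if_yes, p_reply_if_no, rng):
    """Estimate support from the first n people who choose to reply."""
    replies = 0
    yes_replies = 0
    while replies < n:
        supports = rng.random() < true_support        # draw one person
        p_reply = p_reply_if_yes if supports else p_reply_if_no
        if rng.random() < p_reply:                    # do they respond?
            replies += 1
            yes_replies += supports
    return yes_replies / n

rng = random.Random(0)    # fixed seed so the run is repeatable
TRUE_SUPPORT = 0.40       # assumed population value (made up)

# Self-selected poll: supporters are three times as likely to reply.
big_biased = poll_estimate(50_000, TRUE_SUPPORT, 0.6, 0.2, rng)
# Probability sample: everyone selected replies, whatever their view.
small_random = poll_estimate(500, TRUE_SUPPORT, 1.0, 1.0, rng)

print(f"n=50,000 self-selected: {big_biased:.3f}")
print(f"n=500 random sample:    {small_random:.3f}")
```

Under these assumptions the self-selected estimate settles near 0.24 / (0.24 + 0.12) = 2/3 however large n gets, while the much smaller random sample scatters around the true 0.40.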
Key terms
- Cohort study: Group subjects by exposure and follow forward to the outcome. Good for incidence and rare exposures, and respects time order; expensive and slow, and confounding still threatens causal reads.
- Case-control study: Start from the outcome (cases vs controls) and look back at exposure. Efficient for rare diseases, but prone to recall and selection bias and cannot give incidence directly.
- Confounding triangle: The diagram of a confounder Z that causes both the exposure X and the outcome Y, generating a non-causal X–Y association. Naming Z and drawing the triangle is the signature exam answer.
- Ecological fallacy: Inferring about individuals from group-level (aggregate) data. An association seen across populations need not hold within them, which is the classic trap of ecological studies.
- Bias vs precision: Bias is systematic error (wrong centre, unfixable by sample size); precision is the inverse of random scatter (tightens as n grows). The dartboard image: accurate = unbiased AND precise.
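The ecological fallacy can be made concrete with a toy data set. The sketch below uses invented numbers for three hypothetical populations: across the population averages the association is perfectly positive, yet inside every population it is perfectly negative. The `pearson` helper is written out by hand so the example needs no libraries.

```python
def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5

# Invented individual-level data: (exposure values, outcome values).
populations = {
    "A": ([1, 2, 3], [5, 4, 3]),
    "B": ([4, 5, 6], [8, 7, 6]),
    "C": ([7, 8, 9], [11, 10, 9]),
}

# Ecological view: one point per population (its averages).
mean_x = [sum(xs) / len(xs) for xs, _ in populations.values()]
mean_y = [sum(ys) / len(ys) for _, ys in populations.values()]
between = pearson(mean_x, mean_y)

# Individual view: correlation inside each population.
within = {name: pearson(xs, ys) for name, (xs, ys) in populations.items()}

print("across population averages:", round(between, 2))
print("within each population:", {k: round(v, 2) for k, v in within.items()})
```

An ecological study sees only the three averages, so it would report a strong positive link that holds for no individual in this data.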
Observational Studies and Confounding FAQ
How do I spot a confounder fast?
Ask for a third variable that plausibly causes BOTH the exposure and the outcome. If removing the exposure wouldn't change the outcome because the real driver is elsewhere, you've found it — name it and draw the Z→X, Z→Y, X–Y triangle.
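To see the triangle produce an association with no causal X→Y arrow at all, here is a minimal simulation sketch. All probabilities are invented for illustration: Z is present in half the subjects, it raises the chance of exposure X from 0.2 to 0.8, and it raises the chance of outcome Y from 0.1 to 0.6. X never enters the line that generates Y.

```python
import random

def simulate(n, rng):
    """Generate (z, x, y) rows where Z drives both X and Y, and X does nothing."""
    rows = []
    for _ in range(n):
        z = rng.random() < 0.5                     # confounder
        x = rng.random() < (0.8 if z else 0.2)     # exposure depends on Z only
        y = rng.random() < (0.6 if z else 0.1)     # outcome depends on Z only
        rows.append((z, x, y))
    return rows

def risk(rows, exposed, z=None):
    """Proportion with the outcome among exposed/unexposed, optionally within one Z level."""
    group = [y for zi, x, y in rows if x == exposed and (z is None or zi == z)]
    return sum(group) / len(group)

rng = random.Random(1)   # fixed seed so the run is repeatable
rows = simulate(20_000, rng)

crude_diff = risk(rows, True) - risk(rows, False)
diff_z1 = risk(rows, True, z=True) - risk(rows, False, z=True)
diff_z0 = risk(rows, True, z=False) - risk(rows, False, z=False)

print(f"crude risk difference:   {crude_diff:.3f}")
print(f"risk difference, Z = 1:  {diff_z1:.3f}")
print(f"risk difference, Z = 0:  {diff_z0:.3f}")
```

With these assumed probabilities the crude comparison shows the exposed at roughly 0.30 higher risk, while the difference inside each level of Z hovers around zero, which is what removing the exposure and seeing no change in outcome looks like in data.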
What are the four ways to control confounding?
Restriction (study only one level of the confounder), matching (pair subjects on it), stratification (analyse within bands of it) and statistical adjustment (include it in a model). Each reduces the influence of a confounder you have identified and measured; randomisation, available only in experiments, also balances unknown confounders on average.
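Stratification can be worked through on a small table of counts. The figures below are invented for illustration, and the pooled summary uses the Mantel-Haenszel risk ratio, which is one common way to combine strata and is an assumption here rather than something the chapter prescribes.

```python
# Invented counts per stratum of the confounder Z:
# (exposed cases, exposed total, unexposed cases, unexposed total)
strata = {
    "Z present": (480, 800, 120, 200),
    "Z absent": (20, 200, 80, 800),
}

def risk_ratio(a, n1, c, n0):
    """Risk in the exposed divided by risk in the unexposed."""
    return (a / n1) / (c / n0)

def mantel_haenszel_rr(tables):
    """Pooled risk ratio across strata (Mantel-Haenszel weights)."""
    num = sum(a * n0 / (n1 + n0) for a, n1, c, n0 in tables)
    den = sum(c * n1 / (n1 + n0) for a, n1, c, n0 in tables)
    return num / den

# Crude analysis: collapse the strata, ignoring Z.
totals = [sum(col) for col in zip(*strata.values())]
crude_rr = risk_ratio(*totals)

# Stratified analysis: compare exposed with unexposed inside each level of Z.
stratum_rr = {name: risk_ratio(*t) for name, t in strata.items()}
pooled_rr = mantel_haenszel_rr(strata.values())

print("crude RR:", round(crude_rr, 2))        # about 2.5
print("stratum RRs:", {k: round(v, 2) for k, v in stratum_rr.items()})
print("pooled (MH) RR:", round(pooled_rr, 2))
```

In this made-up table the crude risk ratio of about 2.5 collapses to 1.0 inside each stratum and in the pooled estimate, so the whole crude association was produced by Z. Restriction corresponds to reading off just one stratum.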
Why won't a larger sample fix bias?
Because bias shifts the centre of the estimate, and sample size only reduces the scatter around that centre. A bigger biased sample is a more precisely wrong answer — accuracy needs an unbiased design, not more data.
Exam move
This block carries the most exam weight, so over-rehearse the confounder hunt until naming the third variable and drawing the triangle is automatic on any association prompt. Put the four observational designs (with one strength and one trap each) and the four control levers on your notes sheet as tables. Drill the bias-vs-precision one-liner for every ‘big sample = accurate’ question. Finish every claim critique with the skeleton: name the design → state the conclusion the design permits → name a confounder/bias → give the fix.