PUBH5010 · Epidemiology Methods And Uses
Study Types and the Design Tree
This is the decision that opens almost every exam prompt: what kind of study is this? Epidemiological designs split first into descriptive (who, where, when; no comparison group) and analytic (a comparison that lets you test an exposure–outcome link), and the analytic branch then splits by who decided who got the exposure. In an experiment (the RCT) the investigator allocates the exposure, usually at random; in observational studies nature and circumstance allocate it, and you choose your sampling: cohort (group by exposure, follow forward to the outcome), case-control (group by outcome, look back at exposure), cross-sectional (measure both at one moment), or ecological (compare populations, not individuals).
The design you name controls which measure of association you are allowed to compute and how strongly you can argue causation, so getting the design right is the first mark and the gate to the rest. A clean answer also checks the study base: a well-defined base means cases and the comparison group come from the same source population, which is exactly what selection bias threatens.
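The design tree can be written out as a short decision function. This is only a sketch of the splits described above; the function name, parameter names, and string labels are hypothetical choices made for illustration, not part of the course material.

```python
def classify_design(has_comparison_group: bool,
                    investigator_allocates: bool = False,
                    unit: str = "individual",
                    grouped_by: str = "exposure") -> str:
    """Walk the design tree from the first split down to a named design.

    All parameter names and labels are illustrative assumptions:
      unit       -- "individual" or "population"
      grouped_by -- "exposure", "outcome", or "neither" (both measured at once)
    """
    # First split: descriptive studies have no comparison group.
    if not has_comparison_group:
        return "descriptive"
    # Second split: who allocated the exposure?
    if investigator_allocates:
        return "experimental (RCT)"
    # Observational branch: populations rather than individuals.
    if unit == "population":
        return "ecological"
    # Observational branch: what defined the groups?
    if grouped_by == "exposure":
        return "cohort"
    if grouped_by == "outcome":
        return "case-control"
    if grouped_by == "neither":
        return "cross-sectional"
    raise ValueError(f"unrecognised grouping: {grouped_by!r}")


# Example cue: subjects selected on the outcome, exposure asked about afterwards.
design = classify_design(has_comparison_group=True, grouped_by="outcome")
```

Writing the tree this way makes the order of the questions explicit: comparison group first, then allocation, then unit of analysis, then the direction of sampling.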
What this chapter covers
- 01 Descriptive vs analytic: when there is a comparison group, and when there isn't
- 02 Experimental vs observational: who allocated the exposure
- 03 Cohort studies: group by exposure, follow forward to the outcome
- 04 Case-control studies: group by outcome, look back at exposure
- 05 Cross-sectional and ecological designs, and the ecological fallacy
- 06 The study base, and why a well-defined base matters
- 07 Naming the design from a one-line cue, and the measure it licenses
Worked example: name the design from the cue, and the measure it licenses
- +1 (a) Cue → design. Subjects were selected on the outcome (cases who already have the cancer, plus controls without it), then exposure was asked about retrospectively. That is a case-control study.
- +2 (b) Measure. Because sampling is on the outcome, you cannot read the risk of cancer in the exposed and unexposed directly, so the risk ratio (RR) and risk difference (RD) are off the table. The case-control design licenses the odds ratio (OR), ad/bc, where in the usual 2×2 layout a and b are the exposed cases and exposed controls, and c and d are the unexposed cases and unexposed controls.
- +1 (c) Threat. Recall bias (cases may remember past solvent exposure more keenly than controls) and selection of an inappropriate control group are the classic case-control vulnerabilities.
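The odds ratio in part (b) can be computed as in the sketch below. The counts are hypothetical, chosen only to show the arithmetic, and the cell labels assume the usual 2×2 layout (a = exposed cases, b = exposed controls, c = unexposed cases, d = unexposed controls).

```python
def odds_ratio(a: int, b: int, c: int, d: int) -> float:
    """Cross-product odds ratio ad/bc for a 2x2 case-control table.

    a = exposed cases,   b = exposed controls,
    c = unexposed cases, d = unexposed controls.
    """
    if b == 0 or c == 0:
        raise ZeroDivisionError("odds ratio is undefined when b or c is zero")
    return (a * d) / (b * c)


# Hypothetical counts (not taken from any real study):
# 40 of 100 cases and 20 of 100 controls report the exposure.
example_or = odds_ratio(a=40, b=20, c=60, d=80)
print(f"OR = {example_or:.2f}")  # (40 * 80) / (20 * 60) = 2.67
```

With these made-up counts the odds of exposure among cases are about 2.7 times the odds among controls.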
Key terms
- Cohort study: An observational design that groups people by exposure status and follows them forward in time to see who develops the outcome. Because risk is observed directly, it licenses the risk ratio and risk difference; it suits common outcomes and lets you study several outcomes of one exposure.
- Case-control study: An observational design that selects people on the outcome (cases vs controls) and looks back at past exposure. It is efficient for rare outcomes and licenses the odds ratio (risk cannot be read off because sampling is on the outcome), but is prone to recall and control-selection bias.
- Cross-sectional study: A snapshot that measures exposure and outcome at a single point in time. It estimates prevalence and can show associations, but cannot establish whether exposure preceded outcome, so it is weak for causation.
- Ecological study: A study whose unit of analysis is a population or group, not an individual, e.g. comparing average exposure and disease rates across countries. It is cheap and hypothesis-generating but risks the ecological fallacy: a group-level association need not hold for individuals.
- Study base: The source population and time period that gives rise to the cases. A well-defined study base means the comparison group is drawn from the same population that produced the cases; a poorly defined base is the root of selection bias.
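Because a cohort study observes risk directly in each exposure group, the risk ratio and risk difference follow from two simple proportions. The sketch below uses hypothetical follow-up counts; the function names are illustrative only.

```python
def risk(events: int, total: int) -> float:
    """Proportion of a group that developed the outcome during follow-up."""
    return events / total


def risk_ratio(risk_exposed: float, risk_unexposed: float) -> float:
    """Relative measure: how many times higher the risk is in the exposed."""
    return risk_exposed / risk_unexposed


def risk_difference(risk_exposed: float, risk_unexposed: float) -> float:
    """Absolute measure: excess risk in the exposed."""
    return risk_exposed - risk_unexposed


# Hypothetical cohort (not real data): 200 exposed and 200 unexposed people,
# with 30 and 10 of them respectively developing the outcome.
r1 = risk(30, 200)   # risk in the exposed, 0.15
r0 = risk(10, 200)   # risk in the unexposed, 0.05
rr = risk_ratio(r1, r0)
rd = risk_difference(r1, r0)
print(f"RR = {rr:.2f}, RD = {rd:.2f}")
```

The same two risks give both measures: the ratio expresses relative strength of association, the difference expresses excess risk in absolute terms.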
Study Types and the Design Tree FAQ
How do I tell a cohort from a case-control study?
Ask what defined the groups. If people were grouped by their exposure and then followed forward to see who got the outcome, it is a cohort study (group by exposure, measure RR). If people were chosen because they already had the outcome (cases) or did not (controls), and exposure was asked about afterwards, it is a case-control study (group by outcome, measure OR). The direction of sampling, not the timing of data collection, is the tell.
Why can't a case-control study give a risk ratio?
Because you fixed the number of cases and controls by design, the proportion with the outcome in your sample is an artefact of your sampling ratio, not the true risk in the population. So you cannot compute a risk or a risk ratio. The odds ratio survives this because the odds of exposure can be compared between cases and controls regardless of how many of each you sampled.
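The point can be checked numerically. The sketch below assumes a hypothetical source population, keeps every case, and samples controls at two different fractions. Expected counts are used in place of random draws so the arithmetic is exact; all counts and names are made up for illustration.

```python
from fractions import Fraction

# Hypothetical source population (assumed counts, not real data).
POPULATION = {
    "exposed":   {"cases": 100, "noncases": 4900},
    "unexposed": {"cases": 100, "noncases": 9900},
}


def sample_case_control(control_fraction: Fraction):
    """Keep every case; keep a fixed fraction of non-cases as controls.

    Returns the 2x2 cells (a, b, c, d) using expected counts.
    """
    a = POPULATION["exposed"]["cases"]
    c = POPULATION["unexposed"]["cases"]
    b = POPULATION["exposed"]["noncases"] * control_fraction
    d = POPULATION["unexposed"]["noncases"] * control_fraction
    return a, b, c, d


def odds_ratio(a, b, c, d):
    return (a * d) / (b * c)


def naive_risk_ratio(a, b, c, d):
    """The 'risk ratio' you would get by wrongly treating the sample as a cohort."""
    return (a / (a + b)) / (c / (c + d))


for fraction in (Fraction(1, 100), Fraction(2, 100)):
    cells = sample_case_control(fraction)
    print(fraction, float(odds_ratio(*cells)), float(naive_risk_ratio(*cells)))
```

In this constructed population the true risk ratio is 2 (risks of 100/5000 and 100/10000). The odds ratio comes out as 99/49 at both sampling fractions, while the naive "risk ratio" changes with the number of controls sampled, which is why it cannot be interpreted.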
What is the ecological fallacy?
Inferring something about individuals from a relationship seen only at the group level. A country with higher average fat intake may have higher heart-disease rates, but that does not show the individuals eating more fat are the ones with disease — confounding and aggregation can manufacture or hide associations at the group level.
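A small constructed example shows how the group-level and individual-level pictures can disagree. The counts below are hypothetical and deliberately rigged to make the point: across the three invented countries, disease rates rise with the share of high-fat eaters, yet within every country the high-fat eaters have the lower risk.

```python
# Hypothetical, deliberately constructed counts per 100 people in each country:
# (high_fat_total, high_fat_diseased, low_fat_total, low_fat_diseased)
COUNTRIES = {
    "A": (20, 1, 80, 9),
    "B": (50, 5, 50, 15),
    "C": (80, 16, 20, 14),
}


def group_level(country: str):
    """What an ecological study sees: share exposed and overall disease rate."""
    hf_n, hf_d, lf_n, lf_d = COUNTRIES[country]
    total = hf_n + lf_n
    return hf_n / total, (hf_d + lf_d) / total


def individual_risk_ratio(country: str) -> float:
    """What individual-level data would show: risk in high-fat vs low-fat eaters."""
    hf_n, hf_d, lf_n, lf_d = COUNTRIES[country]
    return (hf_d / hf_n) / (lf_d / lf_n)


for name in COUNTRIES:
    share, rate = group_level(name)
    print(name, f"share high-fat={share:.2f}",
          f"disease rate={rate:.2f}",
          f"within-country RR={individual_risk_ratio(name):.2f}")
```

An ecological comparison of these three countries would suggest that more fat means more disease, while the within-country risk ratios are all below 1. The data are artificial, but they show why a group-level association cannot simply be read as an individual-level one.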
Which design is 'best'?
There is no universally best design — the right one depends on the question. RCTs give the strongest causal evidence but are not always ethical or feasible; cohorts suit common outcomes and multiple outcomes; case-control studies suit rare outcomes; cross-sectional and ecological designs are quick for prevalence and hypothesis generation. The exam rewards matching design to question and knowing each one's characteristic weakness.
Exam move
Build a reflex for the one-line cue: "selected on the outcome" → case-control → OR; "grouped by exposure, followed forward" → cohort → RR/RD; "measured both at once" → cross-sectional → prevalence; "populations, not individuals" → ecological → beware the fallacy; "investigator allocated, at random" → RCT. Section A almost always asks you to name several designs from short vignettes, so drill the cue→design→measure chain until it is automatic, and always pair the design with the measure it licenses and its signature weakness.