MAST20034 · Critical Thinking With Data
Accumulating Research
Week 10 zooms out from one study to the weight of evidence: no single study settles a question, so this chapter is about how science accumulates and how to judge a body of work. It starts with peer review as the first quality filter (and its limits), then sets out the synthesis methods — narrative, systematic and meta-analysis — and what each adds, including how to read a forest plot (each study's effect and CI, the pooled estimate, heterogeneity). The chapter then confronts the reproducibility crisis and its named mechanisms — p-hacking, HARKing, publication bias and the garden of forking paths — and the fixes (preregistration, registered reports). Its flagship is the Bradford Hill criteria for arguing causation from observational evidence (strength, consistency, temporality, dose–response, plausibility and the rest) — the structured way to make a causal case when you cannot run an experiment. Exam prompts ask you to weigh evidence, read a forest plot, spot a reproducibility failure, or build a Hill-style causal argument.
What this chapter covers
- 10.1 One study vs the weight of evidence
- 10.2 Peer review: the first filter, and its limits
- 10.3 Meta-analysis vs systematic vs narrative review
- 10.4 Reading a forest plot
- 10.5 The reproducibility crisis: p-hacking, HARKing, publication bias
- 10.6 The Bradford Hill criteria: arguing causation
Arguing causation with Bradford Hill, mark by mark
- +1 Frame the move: with no experiment possible, use the Bradford Hill criteria to build a structured causal argument from observational evidence (they guide judgement; no single one proves causation).
- +1 Criterion 1, temporality: exposure must precede the outcome, so show that pesticide exposure came before the cancer diagnosis (the one near-essential criterion).
- +1 Criterion 2, dose–response: more exposure → more disease; a monotonic gradient strengthens the causal read over a flat association.
- +1 Criterion 3, consistency (plus a caveat): the same link recurs across different populations, designs and labs; pair it with biological plausibility to round out the case, while conceding that confounding can never be fully excluded without an experiment.
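The dose–response step above can be checked numerically. The sketch below is only an illustration: the exposure groups and counts are made up, the function names are hypothetical, and it assumes the lowest-exposure group has at least one case so it can serve as the baseline. It computes the risk in each group, the risk ratio against the baseline, and whether the gradient rises at every step.

```python
def dose_response_table(groups):
    """groups: list of (label, cases, total), ordered from lowest to highest exposure.

    Returns a list of (label, risk, risk_ratio), using the first group as baseline.
    Assumes the baseline group has at least one case (otherwise the ratio is undefined).
    """
    baseline_risk = groups[0][1] / groups[0][2]
    rows = []
    for label, cases, total in groups:
        risk = cases / total
        rows.append((label, risk, risk / baseline_risk))
    return rows


def is_monotonic_increasing(values):
    """True if every value is strictly larger than the one before it."""
    return all(a < b for a, b in zip(values, values[1:]))


# Made-up counts for illustration only (not real data).
groups = [
    ("none", 10, 1000),
    ("low", 15, 1000),
    ("medium", 24, 1000),
    ("high", 40, 1000),
]

rows = dose_response_table(groups)
for label, risk, rr in rows:
    print(f"{label:<8} risk={risk:.3f}  RR={rr:.2f}")
print("monotonic gradient:", is_monotonic_increasing([rr for _, _, rr in rows]))
```

A rising gradient supports the causal reading but does not rule out a confounder that also rises with exposure, which is why the criteria are used together.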
Key terms
- Meta-analysis: A quantitative synthesis that statistically pools effect estimates from multiple studies into one overall estimate (with a CI), usually displayed as a forest plot. More precise than any single study, but only as good as the studies and the publication record behind it.
- Systematic review: A review that follows a pre-specified, reproducible protocol (typically reported to a guideline such as PRISMA) to find, appraise and summarise all relevant studies, minimising the cherry-picking that weakens a narrative review. May or may not include a meta-analysis.
- Forest plot: The standard meta-analysis graphic: each study is shown as a point estimate with a CI, with the marker sized by the study's weight, plus a pooled diamond. You read direction, precision, heterogeneity (do CIs overlap?) and whether the pooled CI crosses the no-effect line.
- Publication bias: The tendency for positive, significant results to be published more than null ones, so the literature overstates effects. Also called the ‘file-drawer problem’; funnel-plot asymmetry is a warning sign in a meta-analysis.
- Bradford Hill criteria: A set of viewpoints (strength, consistency, temporality, dose–response, plausibility, coherence, specificity, analogy, experiment) for judging whether an observed association is causal. Guides, not a checklist; temporality is the closest to essential.
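To make the "statistically pools" step of a meta-analysis concrete, here is a minimal sketch of fixed-effect inverse-variance pooling, with Cochran's Q and the I² statistic as a heterogeneity summary. The choice of a fixed-effect model is an assumption for illustration (random-effects models are also widely used), the function name is hypothetical, and the four effect sizes and standard errors are made up.

```python
import math


def pool_fixed_effect(effects, std_errors, z=1.96):
    """Fixed-effect inverse-variance pooling (illustrative sketch).

    effects:    per-study effect estimates on an additive scale
                (e.g. mean differences or log risk ratios)
    std_errors: per-study standard errors of those estimates
    """
    # More precise studies (smaller SE) receive more weight.
    weights = [1.0 / se ** 2 for se in std_errors]
    total_w = sum(weights)

    pooled = sum(w * y for w, y in zip(weights, effects)) / total_w
    pooled_se = math.sqrt(1.0 / total_w)

    # Cochran's Q: weighted squared deviations from the pooled estimate.
    q = sum(w * (y - pooled) ** 2 for w, y in zip(weights, effects))
    df = len(effects) - 1
    # I^2: share of variation beyond what chance alone would predict, floored at 0.
    i2 = max(0.0, (q - df) / q) * 100 if q > 0 else 0.0

    return {
        "pooled": pooled,
        "se": pooled_se,
        "ci": (pooled - z * pooled_se, pooled + z * pooled_se),
        "Q": q,
        "I2_percent": i2,
    }


# Made-up log risk ratios and standard errors, for illustration only.
effects = [0.10, 0.30, 0.25, 0.45]
std_errors = [0.20, 0.15, 0.10, 0.25]

result = pool_fixed_effect(effects, std_errors)
print(f"pooled = {result['pooled']:.3f}, "
      f"95% CI = ({result['ci'][0]:.3f}, {result['ci'][1]:.3f}), "
      f"I2 = {result['I2_percent']:.1f}%")
```

The pooled standard error is smaller than any single study's, which is the sense in which the pooled estimate is "more precise than any single study"; it does nothing to correct bias in the studies themselves.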
Accumulating Research FAQ
Why isn't one study enough?
Because any single study can be a fluke, a biased sample or a false positive. Confidence comes from replication and synthesis — consistent results across independent studies, pooled where appropriate, weighed for quality and publication bias.
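A small simulation can show why a single published study may mislead. The sketch below is built on stated assumptions: a true effect of 0.1, a standard error of 0.2 for every study, and a crude publication rule under which only positive, significant results reach print. All numbers and the function name are hypothetical.

```python
import random
import statistics


def simulate_literature(true_effect=0.1, se=0.2, n_studies=5000, seed=1):
    """Simulate many small studies of the same true effect.

    Returns (mean of all estimates, mean of 'published' estimates,
    share of studies published), where a study is 'published' only if
    its estimate is positive and significant (z > 1.96).
    """
    rng = random.Random(seed)
    estimates = [rng.gauss(true_effect, se) for _ in range(n_studies)]
    published = [e for e in estimates if e / se > 1.96]
    return (
        statistics.mean(estimates),
        statistics.mean(published),
        len(published) / n_studies,
    )


all_mean, published_mean, share = simulate_literature()
print(f"mean of all studies:       {all_mean:.3f}")
print(f"mean of published studies: {published_mean:.3f}")
print(f"share published:           {share:.1%}")
```

Under this rule every published estimate must exceed 1.96 × 0.2 = 0.392, so the published literature overstates the assumed true effect of 0.1 even though each individual study is unbiased.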
How do I read a forest plot?
Read each study's point and CI (effect and precision), check whether the CIs overlap (a rough guide to heterogeneity), and look at the pooled diamond: its position relative to the no-effect line gives the overall effect, and whether its CI crosses that line tells you whether the effect is statistically significant.
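The reading steps above can be practised on a text-only forest plot. The sketch below is an illustration with made-up studies and hypothetical function names; it draws each CI as a run of dashes, marks the point estimate with `o` and the no-effect line with `|`, and flags whether each interval crosses that line. It assumes effects on a scale where no effect is 0 (for ratio measures the no-effect value is 1, or 0 on the log scale).

```python
def forest_row(label, est, low, high, lo=-1.0, hi=1.0, width=41, null=0.0):
    """Render one study as a text row: dashes for the CI, 'o' for the estimate."""
    def col(x):
        # Clamp to the plotting range, then map to a column index.
        x = min(max(x, lo), hi)
        return round((x - lo) / (hi - lo) * (width - 1))

    cells = [" "] * width
    for i in range(col(low), col(high) + 1):
        cells[i] = "-"
    cells[col(null)] = "|"   # no-effect line
    cells[col(est)] = "o"    # point estimate (drawn last so it stays visible)
    return f"{label:<10}" + "".join(cells)


def crosses_null(low, high, null=0.0):
    """True if the interval includes the no-effect value."""
    return low <= null <= high


# Made-up studies: (label, estimate, CI low, CI high). Not real data.
studies = [
    ("Study A", 0.10, -0.29, 0.49),
    ("Study B", 0.30, 0.01, 0.59),
    ("Study C", 0.25, 0.05, 0.45),
    ("Pooled", 0.26, 0.11, 0.41),
]

for label, est, low, high in studies:
    flag = "crosses no-effect line" if crosses_null(low, high) else "excludes no effect"
    print(forest_row(label, est, low, high), flag)
```

A real forest plot also sizes each marker by study weight and draws the pooled estimate as a diamond; this sketch omits both for brevity.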
What causes the reproducibility crisis?
Questionable research practices such as p-hacking (trying analyses until something is significant), HARKing (hypothesising after results are known) and the garden of forking paths (the many defensible analysis choices, each a fresh chance of a spurious result), compounded by publication bias. Fixes include preregistration, registered reports and replication.
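The cost of "trying analyses until something is significant" can be shown with a short simulation. The sketch below assumes there is no real effect and that each extra analysis is an independent test at the 5% level (real analyses of one dataset are usually correlated, so this is a simplification); the function names are hypothetical. It estimates how often at least one test comes out significant and compares that with the formula 1 − (1 − α)^k.

```python
import random


def any_significant_rate(n_tests, n_sims=10000, z_crit=1.96, seed=0):
    """Share of simulated null 'studies' with at least one significant test.

    Each test statistic is drawn from a standard normal, i.e. there is no true effect.
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_sims):
        if any(abs(rng.gauss(0.0, 1.0)) > z_crit for _ in range(n_tests)):
            hits += 1
    return hits / n_sims


def family_wise_error(n_tests, alpha=0.05):
    """Chance of at least one false positive across independent tests."""
    return 1 - (1 - alpha) ** n_tests


for k in (1, 5, 20):
    print(f"{k:>2} tests: simulated = {any_significant_rate(k):.3f}, "
          f"formula = {family_wise_error(k):.3f}")
```

Preregistration addresses this by fixing the analysis in advance, so that the reported test is the one planned and not the best of many.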
Exam move
Memorise the Bradford Hill criteria — at least temporality, dose–response, consistency, strength and plausibility — and rehearse building a causal argument from them, leading with temporality. Keep a tidy table separating narrative / systematic / meta-analysis (what each adds) on your notes sheet, plus a one-line forest-plot reading guide. Learn the reproducibility failures by name (p-hacking, HARKing, publication bias) so you can label them, and pair each with its fix (preregistration). The recurring exam ask is ‘weigh this body of evidence’ — answer with synthesis logic, not a single study.