POPH90111 · Genetic Epidemiology
Mendelian Randomisation
Observational epidemiology is haunted by two ghosts: confounding (a third factor drives both exposure and outcome) and reverse causation (the disease changes the exposure, not the other way round). Mendelian randomisation (MR) exorcises both by swapping a modifiable exposure for a genetic variant that proxies it. Because alleles are dealt out randomly at conception — independent of the lifestyle and social factors that confound ordinary studies, and fixed long before disease onset — the genotype behaves like the randomisation arm of a trial you never ran: “nature’s randomised trial.” MR is an instrumental-variable method, and it lives or dies by three assumptions — relevance, independence and exclusion restriction — that you can argue for but, for two of them, never prove. The estimate is a simple ratio of two regressions, the Wald ratio βG→Y / βG→X; the #1 threat is horizontal pleiotropy; and the conclusion is always pitched on a continuum — X likely causes Y, never “proves.”
What this chapter covers
- 4.1 The problem MR solves: confounding & reverse causation
- 4.2 The instrumental-variable idea (G → X → Y, G acts on Y only through X)
- 4.3 The MR DAG and the three IV assumptions (Relevance, Independence, Exclusion)
- 4.4 The Wald (ratio) estimate βX→Y = βG→Y / βG→X; IVW for many SNPs
- 4.5 How MR avoids confounding and reverse causation
- 4.6 Threats to MR: horizontal pleiotropy (#1), weak instruments, stratification, canalisation
- 4.7 Appraising an MR study: a 6-step scaffold
Worked example: a Wald ratio and the weak-instrument trap
- +1 Identify the two regressions. Numerator = gene→outcome = 0.10; denominator = gene→exposure = 0.40.
- +2 (a) Divide (Wald ratio). βX→Y = βG→Y / βG→X = 0.10 / 0.40 = 0.25 log-odds per unit of exposure.
- +1 (b) Odds ratio. OR = e^0.25 ≈ 1.28 per unit of exposure.
- +1 (c) Check relevance first. Because you divide by βG→X, a weak instrument (small denominator, F < 10) inflates the estimate’s variance and biases it toward the confounded observational estimate. ‘The gene barely moves the exposure but strongly tracks the outcome’ is a red flag for pleiotropy, not a strong causal signal.
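
The arithmetic in steps (a) and (b) can be checked with a few lines of Python. This is only a sketch of the calculation using the two coefficients given in the worked example; the variable names are illustrative, not part of the course material.

```python
import math

# Coefficients taken from the worked example above.
beta_gy = 0.10  # gene -> outcome regression coefficient (log-odds scale)
beta_gx = 0.40  # gene -> exposure regression coefficient

# (a) Wald ratio: rescale the gene-outcome effect to "per unit of exposure".
wald_ratio = beta_gy / beta_gx

# (b) Exponentiate the log-odds to get an odds ratio per unit of exposure.
odds_ratio = math.exp(wald_ratio)

print(f"Wald ratio (log-odds per unit exposure): {wald_ratio:.2f}")
print(f"Odds ratio per unit exposure: {odds_ratio:.2f}")
```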
Key terms
- Instrumental variable (instrument)
- A variable that influences the outcome only by way of the exposure, letting you recover the causal exposure→outcome effect even under unmeasured confounding. In MR a genetic variant G is the instrument for a biological exposure X: G → X → Y, with G acting on Y only through X.
- The three IV assumptions (R-I-E)
- Relevance: G is robustly (strongly) associated with the exposure X; this is the only directly testable assumption (demand F > 10). Independence: G is independent of the X–Y confounders U (no G–U path); plausible from random allocation at conception, but untestable. Exclusion restriction: G affects Y only through X, with no other pathway; broken by horizontal pleiotropy, and untestable. Only relevance can be checked empirically; the other two can only be argued for.
- Wald ratio
- The MR causal estimate: βX→Y = βG→Y / βG→X, the gene–outcome regression divided by the gene–exposure regression, which rescales the genetic effect into ‘per unit of exposure’. Multiple SNPs are combined by inverse-variance weighting (IVW). A weak (small) denominator both inflates the variance and biases the estimate.
- Horizontal pleiotropy
- When the genetic instrument affects the outcome by a biological route other than the exposure, bypassing X — this violates the exclusion restriction and biases the estimate. It is the #1 threat to MR. (Vertical pleiotropy, G→X→…→Y all through the exposure, is harmless and is exactly what MR exploits.) Probe it with the MR-Egger intercept, the weighted median, and multiple independent SNPs.
- Canalisation
- Developmental compensation: an organism may buffer a lifelong genetic perturbation in ways that a short-term intervention would not, so MR estimates the effect of a lifelong average exposure, not the effect of changing the exposure for six months in an adult. It is an interpretation caveat rather than a DAG-path violation.
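
The Wald ratio entry mentions combining several SNPs by inverse-variance weighting. One common fixed-effect form, sketched below, weights each SNP's Wald ratio by βG→X² / se(βG→Y)², which assumes the gene–exposure estimates are measured without error. The function name and the three SNPs' summary statistics are hypothetical, chosen only to illustrate the weighting.

```python
def ivw_estimate(beta_gx, beta_gy, se_gy):
    """Fixed-effect IVW estimate and its standard error.

    Equivalent to a weighted average of per-SNP Wald ratios with
    weights beta_gx**2 / se_gy**2 (first-order weights; uncertainty
    in beta_gx is ignored, which is an assumption of this sketch).
    """
    numerator = sum(bx * by / se**2 for bx, by, se in zip(beta_gx, beta_gy, se_gy))
    denominator = sum(bx**2 / se**2 for bx, se in zip(beta_gx, se_gy))
    return numerator / denominator, denominator ** -0.5


# Hypothetical summary statistics for three independent SNPs.
beta_gx = [0.40, 0.20, 0.30]  # gene -> exposure
beta_gy = [0.10, 0.06, 0.06]  # gene -> outcome (log-odds)
se_gy = [0.02, 0.02, 0.02]    # standard errors of gene -> outcome

per_snp_ratios = [by / bx for bx, by in zip(beta_gx, beta_gy)]
ivw, ivw_se = ivw_estimate(beta_gx, beta_gy, se_gy)

print("Per-SNP Wald ratios:", [round(r, 3) for r in per_snp_ratios])
print(f"IVW estimate: {ivw:.3f} (SE {ivw_se:.3f})")
```

The combined estimate sits between the smallest and largest per-SNP ratios, pulled toward the SNPs with the strongest gene–exposure association.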
Mendelian Randomisation FAQ
How does a genetic variant escape confounding and reverse causation at once?
A germline variant is allocated at conception by meiotic chance, so it is essentially independent of the later lifestyle and social factors that confound ordinary studies (the independence assumption) — and it is fixed before any disease exists, so the disease cannot have caused it (no reverse causation). Random assignment of alleles plays the role that random assignment of treatment plays in a randomised controlled trial, which is why MR is called ‘nature’s randomised trial’.
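A small simulation can make this concrete. The sketch below generates entirely simulated data in which an unmeasured confounder U drives both X and Y, while the genotype G is drawn independently of U. The effect sizes, allele frequency and sample size are arbitrary assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200_000
true_effect = 0.5  # assumed causal effect of X on Y

g = rng.binomial(2, 0.3, size=n)                 # allele count, independent of U
u = rng.normal(size=n)                           # unmeasured confounder
x = 0.4 * g + u + rng.normal(size=n)             # exposure: depends on G and U
y = true_effect * x + u + rng.normal(size=n)     # outcome: depends on X and U


def slope(a, b):
    """OLS slope from regressing b on a."""
    return np.cov(a, b)[0, 1] / np.var(a, ddof=1)


observational = slope(x, y)           # confounded by U
wald = slope(g, y) / slope(g, x)      # MR estimate using G as the instrument

print(f"True effect:            {true_effect:.2f}")
print(f"Observational estimate: {observational:.2f}")
print(f"Wald ratio estimate:    {wald:.2f}")
```

Under these assumptions the ordinary regression of Y on X is pulled well above the true effect by U, while the Wald ratio lands close to it, because G shares no path with U.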
Which of the three IV assumptions can you actually test?
Only relevance. You can regress the exposure on the genotype and demand a strong association — conventionally an F-statistic above 10 — to show the instrument is not weak. Independence (no G–U path) and exclusion restriction (no G→Y route other than through X) are argued from biology, never proven directly; sensitivity analyses only probe them. That asymmetry is why an MR conclusion is always ‘X likely causes Y’, pitched on a continuum from convincing to non-convincing.
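For a single variant, the F-statistic from regressing the exposure on the genotype equals the squared t-statistic, so it can be approximated from summary statistics as (βG→X / SE)². The sketch below applies that to two hypothetical instruments; the function name and the standard errors are assumptions for illustration.

```python
def f_statistic(beta_gx: float, se_gx: float) -> float:
    """Single-instrument F-statistic: the squared t-statistic of G -> X."""
    return (beta_gx / se_gx) ** 2


# Hypothetical gene -> exposure estimates and standard errors.
strong = f_statistic(beta_gx=0.40, se_gx=0.05)
weak = f_statistic(beta_gx=0.04, se_gx=0.02)

for label, f in (("strong instrument", strong), ("weak instrument", weak)):
    verdict = "passes" if f > 10 else "fails"
    print(f"{label}: F = {f:.1f} ({verdict} the F > 10 convention)")
```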
What is the difference between horizontal and vertical pleiotropy?
Vertical pleiotropy is G→X→…→Y — the gene’s downstream effects all run through the exposure — and it is harmless, indeed it is exactly what MR exploits. Horizontal pleiotropy is when the gene reaches Y by a separate biological route that bypasses X; this violates the exclusion restriction and biases the estimate. When a question says ‘pleiotropy’, assume the dangerous horizontal kind and reach for MR-Egger or the weighted median.
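The MR-Egger idea can be sketched as a weighted regression of the SNP–outcome estimates on the SNP–exposure estimates that allows a non-zero intercept, whereas IVW forces the line through the origin. The four SNPs below are hypothetical and constructed so that every SNP carries the same direct (horizontal) effect of 0.02 on the outcome on top of a true causal slope of 0.25. Real analyses would normally use a dedicated MR package rather than this minimal version.

```python
import numpy as np

# Hypothetical summary statistics, oriented so beta_gx is positive.
beta_gx = np.array([0.10, 0.20, 0.30, 0.40])
pleiotropic_offset = 0.02                      # direct G -> Y effect bypassing X
beta_gy = pleiotropic_offset + 0.25 * beta_gx  # true causal slope assumed 0.25
se_gy = np.full(4, 0.02)

# IVW: weighted regression through the origin (no intercept).
ivw_slope = np.sum(beta_gx * beta_gy / se_gy**2) / np.sum(beta_gx**2 / se_gy**2)

# MR-Egger: same weighted regression, but with an intercept.
# np.polyfit applies w to unsquared residuals, so w = 1/se gives 1/se^2 weights.
egger_slope, egger_intercept = np.polyfit(beta_gx, beta_gy, deg=1, w=1 / se_gy)

print(f"IVW slope:         {ivw_slope:.3f}")
print(f"MR-Egger slope:    {egger_slope:.3f}")
print(f"MR-Egger intercept:{egger_intercept:.3f}")
```

In this constructed case the intercept recovers the shared pleiotropic effect and the Egger slope recovers the causal effect, while the IVW slope is inflated. An intercept clearly away from zero is the signal of directional horizontal pleiotropy.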
Why is a small βG→X (weak instrument) dangerous rather than impressive?
Because the Wald ratio divides by βG→X. A weak instrument both inflates the estimate’s variance (so the confidence interval balloons) and biases it: toward the confounded observational estimate when exposure and outcome are measured in the same sample, and toward the null in a two-sample design with non-overlapping samples. ‘The gene barely moves the exposure but strongly tracks the outcome’ is therefore a red flag for horizontal pleiotropy (the G→Y effect is leaking through a route other than X), not a strong causal signal. Always report the F-statistic to show the instrument is strong.
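How quickly the confidence interval balloons can be seen with a first-order approximation, SE(Wald) ≈ se(βG→Y) / |βG→X|, which ignores uncertainty in the denominator. The sketch below holds the causal effect at 0.25 and the gene–outcome standard error at 0.02 (both hypothetical) while shrinking βG→X.

```python
def wald_ci(beta_gx, beta_gy, se_gy, z=1.96):
    """Wald ratio with an approximate 95% CI (first-order delta method,
    treating beta_gx as known; an assumption of this sketch)."""
    estimate = beta_gy / beta_gx
    se = se_gy / abs(beta_gx)
    return estimate, estimate - z * se, estimate + z * se


se_gy = 0.02  # hypothetical standard error of the gene -> outcome estimate
for beta_gx in (0.40, 0.10, 0.02):
    beta_gy = 0.25 * beta_gx  # same underlying causal effect in every row
    est, lower, upper = wald_ci(beta_gx, beta_gy, se_gy)
    print(f"beta_gx = {beta_gx:.2f}: estimate {est:.2f}, "
          f"95% CI width {upper - lower:.2f}")
```

The point estimate is the same in every row, but the interval widens in inverse proportion to βG→X, so the weakest instrument says almost nothing about the causal effect.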
Exam move
This chapter is interpretation-heavy, so the centrepiece is the DAG and the three IV assumptions: be able to draw G→X→Y from memory with the two forbidden dashed-red arrows (no G–U, no direct G→Y) and recite R-I-E — Relevance (strong G→X, testable), Independence (no G–U), Exclusion (no G→Y except via X). Practise the Wald ratio βG→Y/βG→X and the weak-instrument trap. For the guaranteed appraisal task, work threats assumption by assumption — horizontal pleiotropy (probe with MR-Egger / weighted median), weak instruments (F > 10), stratification (principal components / within-family), canalisation (a lifelong, not short-term, effect) — demand replication and triangulation, and conclude on the continuum: X likely causes Y, never ‘proven.’