University of Melbourne · S1 2026 · FACULTY OF HEALTH & MEDICINE

POPH90111 · Genetic Epidemiology

50% final exam · hurdle · 14 chapters · 6-page Bible

Our own words - no uploaded lecturer files

Built to mirror S1 2026 · updated this semester

Chapter 3 of 7 · POPH90111

Genetic Association Studies

Heritability said genes matter; association asks which ones. At its core a genetic-association study is a case-control study whose exposure is a genetic marker — usually a SNP — comparing how often the variant appears in cases versus controls. Scale that from a handful of candidate genes to millions of SNPs across the whole genome and you have a GWAS. The price of testing millions of hypotheses is a flood of false positives, which forces the famously strict genome-wide threshold p < 5×10⁻⁸ and two diagnostic plots: the Manhattan (where are the hits?) and the QQ (is the whole study inflated?). The chapter is calculation plus interpretation: build the 2×2 and compute the per-allele OR = ad/bc and an allelic χ² (df = 1), read a logistic-regression row (OR = e^β), then read the plots and rule out the key confounder — population stratification, fixed by adjusting for ancestry principal components.

In this chapter

What this chapter covers

01 · 3.1 The idea: exposure = a SNP; three reasons for a signal (causal / LD / stratification)
02 · Candidate-gene vs genome-wide design
03 · 3.2 The 2×2: per-allele OR = ad/bc and the allelic χ² (df = 1); genotypic df = 2
04 · 3.3 Multiple testing and the 5×10⁻⁸ genome-wide threshold (Bonferroni)
05 · 3.4 Reading the Manhattan plot (peaks above ≈7.3 = loci, not causal genes)
06 · 3.5 The QQ plot & genomic inflation (λ_GC); tail-lift vs whole-line lift
07 · 3.6 Population stratification: the key confounder, and how to fix it
08 · 3.7 The GWAS read-out drill: effect → test → multiple testing → calibration → replication

Worked example · free

Worked example: per-allele OR and the allelic χ²

Q [6 marks]. An allelic count table: among cases the risk allele A = 120 and the other allele = 80; among controls A = 90 and the other = 110. (a) Compute the per-allele odds ratio. (b) Test it with an allelic χ². (c) Would it be genome-wide significant in a GWAS of one million SNPs?

+1 Label the 2×2. Cases: a = 120 (A), b = 80 (other); controls: c = 90 (A), d = 110 (other); N = 400 alleles.
+1 (a) Odds ratio. OR = ad/bc = (120 × 110) / (80 × 90) = 13200 / 7200 = 1.83, so each risk allele raises the odds by about 83%.
+1 (b) Expected counts. Row totals are 200 (cases) and 200 (controls); column totals are A = 210 and other = 190. E(case, A) = 200 × 210 / 400 = 105; likewise E(case, other) = 95, E(control, A) = 105 and E(control, other) = 95.
+1 (b) Chi-square. Every cell has |O − E| = 15, so χ² = Σ(O−E)²/E = (15²/105) + (15²/95) + (15²/105) + (15²/95) ≈ 2.14 + 2.37 + 2.14 + 2.37 ≈ 9.0 on df = 1. Since 9.0 > 3.84, p < 0.05 (more precisely, p ≈ 0.003).
+1 (c) Genome-wide? The Bonferroni threshold is 0.05 / 1,000,000 = 5×10⁻⁸. Here p ≈ 0.003, which is far larger, so the SNP is nominally significant but not genome-wide significant; a GWAS would not declare it a hit.
+1 Appraise. Even a genome-wide hit must replicate in an independent sample and come from a study with a clean QQ plot (λ ≈ 1); the top SNP marks a locus in LD with the causal variant, not ‘the disease gene’.

OR = 1.83 per risk allele; χ² ≈ 9.0 on 1 df (p ≈ 0.003) is significant at the nominal level but nowhere near the 5×10⁻⁸ genome-wide threshold, so a GWAS would treat it as noise. Even a true genome-wide hit must replicate, in a study with a clean QQ plot, before it counts.
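The arithmetic in steps (a) to (c) can be checked with a short script. This is a minimal sketch, assuming the allele-count layout used above (a, b for cases; c, d for controls) and no continuity correction; `allelic_test` is a hypothetical helper, not a function from any library. For df = 1 the χ² statistic equals z², so the p-value comes from the complementary error function instead of a χ² table.

```python
import math


def allelic_test(a, b, c, d):
    """Per-allele OR and allelic chi-square (df = 1) from a 2x2 allele-count table.

    Assumed layout (as in the worked example):
        cases:    a = risk allele, b = other allele
        controls: c = risk allele, d = other allele
    """
    odds_ratio = (a * d) / (b * c)

    n = a + b + c + d
    observed = [[a, b], [c, d]]
    row_totals = [a + b, c + d]
    col_totals = [a + c, b + d]

    # chi2 = sum of (O - E)^2 / E, with E = row total x column total / N (no continuity correction)
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed[i][j] - expected) ** 2 / expected

    # With df = 1, chi2 = z^2, so P(chi2 > x) = P(|Z| > sqrt(x)) = erfc(sqrt(x / 2))
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return odds_ratio, chi2, p_value


odds_ratio, chi2, p_value = allelic_test(120, 80, 90, 110)
print(f"OR = {odds_ratio:.2f}, chi2 = {chi2:.2f}, p = {p_value:.4f}")
print("genome-wide significant:", p_value < 5e-8)
```

Running it should reproduce OR ≈ 1.83 and χ² ≈ 9.02 with p ≈ 0.003, and report that the SNP misses the 5×10⁻⁸ line.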

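Sia tip: Draw an ‘×’ across the 2×2: top-left × bottom-right over top-right × bottom-left gives OR = ad/bc. And remember that the GWAS bar (p < 5×10⁻⁸, ≈7.3 on the −log₁₀ scale of the plot, equivalent to χ² ≈ 29.7 on 1 df) is far stricter than the textbook χ² critical value of 3.84 (p < 0.05).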

Glossary

Key terms

Allelic χ² test: A test of genetic association comparing the observed allele counts in cases versus controls against those expected under no association, χ² = Σ(O−E)²/E with E = row total × column total / N. The allelic test has df = 1 (significant if χ² > 3.84 at α = 0.05); the genotypic test (three genotypes) has df = 2.
Per-allele (additive) odds ratio: The odds ratio per extra copy of the risk allele: OR = ad/bc from the allelic 2×2, or (closely approximating it) OR = e^β from a logistic regression that codes genotype 0/1/2 by the number of risk alleles. It is the standard GWAS effect measure, reported with a 95% CI and usually adjusted for ancestry principal components, age and sex.
Genome-wide significance (5×10⁻⁸): The strict significance threshold for a GWAS: the Bonferroni correction for roughly one million effectively independent common-variant tests (0.05 / 10⁶). On a Manhattan plot it sits at −log₁₀(5×10⁻⁸) ≈ 7.3. A signal must clear this line and then replicate in an independent sample.
Manhattan plot: The GWAS summary plot: each dot is one SNP, with genomic position (chromosome 1→22) on the x-axis and −log₁₀(p) on the y-axis. Genuine associations rise as ‘skyscrapers’ above the genome-wide line at ≈7.3. One peak is one locus — a cluster of correlated SNPs in LD — and the tallest SNP is the best tag, not necessarily the causal variant.
Population stratification: Confounding by ancestry: if cases and controls differ in ancestry and both allele frequencies and disease rates vary by ancestry, then any ancestry-marking SNP looks associated — a spurious hit driven by structure, not biology. It is the usual cause of a whole-line QQ lift (genomic inflation λ_GC > 1) and is fixed by adjusting for ancestry principal components, matching, or genomic control.
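To see how stratification manufactures a hit, here is a hypothetical two-ancestry example; the allele counts are invented for illustration. Within each ancestry group the risk-allele frequency is identical in cases and controls, so the true per-allele OR is 1, but ancestry 1 carries the allele more often and supplies most of the cases. Pooling the groups, as an unadjusted analysis effectively does, produces a spurious OR. `per_allele_or` is a hypothetical helper.

```python
def per_allele_or(a, b, c, d):
    """OR = ad/bc for (case A, case other, control A, control other) allele counts."""
    return (a * d) / (b * c)


# Hypothetical allele counts (a, b, c, d) = (case A, case other, control A, control other)
strata = {
    "ancestry 1": (210, 90, 70, 30),   # A frequency 0.7 in cases AND controls; 300 of 400 alleles are from cases
    "ancestry 2": (30, 70, 90, 210),   # A frequency 0.3 in cases AND controls; 300 of 400 alleles are from controls
}

for name, counts in strata.items():
    # Within one ancestry group there is no association
    print(name, "OR =", round(per_allele_or(*counts), 2))

# Ignoring ancestry: add the tables cell by cell
pooled = tuple(sum(cell) for cell in zip(*strata.values()))
print("pooled", pooled, "OR =", round(per_allele_or(*pooled), 2))
```

Each ancestry group gives OR = 1.0, while the pooled table (240, 160, 160, 240) gives OR = 2.25: a hit driven entirely by structure. Analysing within ancestry groups, matching on ancestry or adjusting for principal components removes it.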

FAQ

Genetic Association Studies FAQ

Why does a GWAS need a threshold as strict as 5×10⁻⁸?

Because of multiple testing. Test one SNP at α = 0.05 and you accept a 1-in-20 false-positive risk; test a million SNPs with no true associations and you expect about 50,000 ‘significant’ results purely by chance (0.05 × 10⁶). The Bonferroni correction divides 0.05 by the number of effectively independent tests (about a million common-variant tests across the genome), giving 5×10⁻⁸. Memorise both the p-value and its −log₁₀ ≈ 7.3.
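The three numbers in this answer follow from simple arithmetic. A minimal sketch, assuming one million effectively independent tests as stated above:

```python
import math

alpha = 0.05
n_tests = 1_000_000  # assumed: roughly one million effectively independent common-variant tests

# Bonferroni: divide the family-wise alpha by the number of tests
threshold = alpha / n_tests
# The same threshold on the Manhattan plot's -log10(p) axis
manhattan_line = -math.log10(threshold)
# Expected chance "hits" at p < 0.05 if every SNP is truly null
expected_false_positives = alpha * n_tests

print(f"Bonferroni threshold: {threshold:.0e}")
print(f"-log10 threshold:     {manhattan_line:.2f}")
print(f"Expected false positives at p < 0.05: {expected_false_positives:,.0f}")
```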

What is the difference between a Manhattan plot and a QQ plot?

The Manhattan plot shows where the hits are: −log₁₀(p) for every SNP across the chromosomes, with peaks above ≈7.3 marking associated loci. The QQ plot is a calibration check to read before trusting any peak: it plots observed −log₁₀(p) against the values expected under the null. Points on the diagonal with only a late upward tail indicate genuine associations (the good GWAS); a whole-line lift starting from the origin indicates genomic inflation (λ > 1), the signature of population stratification or another artefact.
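A QQ plot is built by sorting the observed p-values and pairing each with the quantile expected under the null, where p-values are uniform on (0, 1). A minimal sketch, assuming the common plotting position i/(n + 1) for the i-th smallest of n p-values (some software uses (i − 0.5)/n instead); `qq_points` is a hypothetical helper, and the p-values are simulated, not real data. Drawing the plot itself is left out to keep the block dependency-free.

```python
import math
import random


def qq_points(p_values):
    """Return (expected, observed) -log10(p) pairs, most significant first."""
    n = len(p_values)
    # Observed: -log10(p), sorted so the smallest p-value comes first
    observed = sorted((-math.log10(p) for p in p_values), reverse=True)
    # Expected under the null: the i-th smallest of n uniform p-values sits near i / (n + 1)
    expected = [-math.log10(i / (n + 1)) for i in range(1, n + 1)]
    return list(zip(expected, observed))


random.seed(1)
# Simulated null study: 10,000 uniform p-values (1 - random() lies in (0, 1], so log10 is safe)
null_p = [1 - random.random() for _ in range(10_000)]
# Five hypothetical strong hits: they should lift only the top of the plot (a late tail)
study_p = null_p + [1e-12, 1e-10, 1e-9, 5e-9, 1e-8]

points = qq_points(study_p)
for expected, observed in points[:6]:
    print(f"expected {expected:5.2f}   observed {observed:5.2f}")
```

The bulk of the pairs should lie close to the line observed = expected; only the five added hits stand well above it. A study with λ > 1 would instead show observed values sitting above expected all the way from the origin.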

Why can’t I call the top Manhattan SNP ‘the disease gene’?

Because it is almost always just the best tag in LD with the true causal variant, not the cause itself, and a single peak can span several genes. The honest statement is ‘a locus at chromosome X is associated; fine-mapping is needed to identify the causal variant’. Reporting the top SNP as the disease gene is a classic association-chapter error.

How do you detect and fix population stratification?

Detect it from a whole-line QQ lift and λ_GC > 1. Fix it by (1) adjusting for ancestry principal components in the logistic model — the standard solution; (2) matching cases and controls on ancestry; (3) genomic control (divide χ² by λ); or (4) family-based designs. A Hardy–Weinberg deviation in controls is also a useful QC flag for genotyping error or structure.
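Genomic control turns the whole-line lift into a single number. λ_GC is conventionally the median observed 1-df χ² divided by the median expected under the null (about 0.455), and the correction divides every χ² by λ_GC. A minimal sketch using only the Python standard library; `genomic_control` is a hypothetical helper, and the inflated "study" is simulated (null z-scores with standard deviation 1.1, an arbitrary choice that gives λ_GC ≈ 1.21), not real data.

```python
import random
import statistics
from statistics import NormalDist

# Median of a chi-square(1) variable: P(Z^2 <= m) = 0.5 means sqrt(m) is the 75th-percentile z
CHI2_1_MEDIAN = NormalDist().inv_cdf(0.75) ** 2  # about 0.455


def genomic_control(chi2_stats):
    """Return lambda_GC and the chi-square statistics divided by it."""
    lam = statistics.median(chi2_stats) / CHI2_1_MEDIAN
    return lam, [x / lam for x in chi2_stats]


random.seed(2)
# Hypothetical inflated study: z-scores have SD 1.1 instead of 1, so chi2 = z^2 is inflated about 1.21-fold
inflated = [random.gauss(0, 1.1) ** 2 for _ in range(20_000)]

lam, corrected = genomic_control(inflated)
print(f"lambda_GC before: {lam:.2f}")
print(f"lambda_GC after:  {statistics.median(corrected) / CHI2_1_MEDIAN:.2f}")
```

The "after" value is 1 by construction. That is also the method's limitation: it rescales every SNP by the same factor rather than removing confounding at the SNPs that actually differ by ancestry, which is one reason adjusting for principal components is the standard fix.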

Study strategy

Exam move

Anchor the chapter on the one calculation that recurs: 2×2 → OR = ad/bc and the allelic χ² with E = row×col/N (df = 1, critical value 3.84; genotypic df = 2). Then memorise the GWAS gates in order — effect size (OR away from 1, CI excludes 1), test (small p), multiple testing (p < 5×10⁻⁸), calibration (QQ diagonal except a tail; λ ≈ 1), confounding (adjust for principal components), confirmation (replication), localisation (fine-mapping). Practise reading both plots: Manhattan peaks above ≈7.3 are loci not causal genes; a QQ tail-lift is real signal but a whole-line lift is inflation → population stratification. The appraisal line to write: a genome-wide hit marks a locus in LD with the truth and must replicate.
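The first gates of the drill (effect, test, multiple testing) can be read straight off one logistic-regression row. A minimal sketch, assuming an additive model with genotype coded 0/1/2, a Wald test (z = β/SE) and a normal-approximation 95% CI; `read_logistic_row` and the β and SE values are hypothetical, not output from any real study or package. Calibration, confounding, replication and fine-mapping cannot be judged from a single row.

```python
import math


def read_logistic_row(beta, se, threshold=5e-8):
    """Read one additive (0/1/2) logistic-regression row through the first GWAS gates."""
    odds_ratio = math.exp(beta)                      # OR = e^beta per risk allele
    lower = math.exp(beta - 1.96 * se)               # 95% CI computed on the log-odds scale,
    upper = math.exp(beta + 1.96 * se)               # then exponentiated
    z = beta / se                                    # Wald statistic
    p = math.erfc(abs(z) / math.sqrt(2))             # two-sided p-value from the normal distribution
    return {
        "OR": odds_ratio,
        "CI": (lower, upper),
        "p": p,
        "CI excludes 1": lower > 1 or upper < 1,     # gate 1: effect size
        "genome-wide": p < threshold,                # gates 2 and 3: test and multiple testing
    }


# Hypothetical row: beta = 0.12 per risk allele, SE = 0.02
row = read_logistic_row(0.12, 0.02)
print(f"OR = {row['OR']:.3f}, 95% CI {row['CI'][0]:.3f} to {row['CI'][1]:.3f}, p = {row['p']:.1e}")
print("CI excludes 1:", row["CI excludes 1"], "| genome-wide:", row["genome-wide"])
```

Here z = 6, so the row clears the 5×10⁻⁸ line with a modest OR of about 1.13; typical of GWAS hits, a large p-value margin does not imply a large effect.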
