University of Melbourne · S1 2026 · FACULTY OF HEALTH & MEDICINE

POPH90111 · Genetic Epidemiology

50% final exam · hurdle · 14 Chapters · 4-page Bible
Chapter 5 of 7 · POPH90111

Penetrance and Expressivity

Discovering that a variant is associated with disease is only half the story. The clinical question is sharper: if I carry this variant, what is my probability of actually getting the disease, and by what age? That probability is penetrance, Pr(disease | genotype, age): a conditional, age-specific risk that bridges a statistical association to an individual’s absolute risk. Penetrance is usually incomplete (below 1, so carriers are not certain to be affected). It climbs with age as a cumulative-risk curve estimated by survival analysis with censoring (the course’s worked fact: MSH6 ≈ 50% colorectal-cancer risk by age 70 in males), and the estimate depends on the study design used. The guaranteed trap is ascertainment bias: clinic-recruited carrier families are selected on disease and so over-estimate penetrance, which is corrected with a probability-weighted synthetic cohort. The chapter then separates penetrance (whether) from expressivity (how badly) and introduces modifiers: genetic and environmental factors that shift risk within carriers of the same variant.

In this chapter

What this chapter covers

  • 5.1 Complete vs incomplete penetrance; penetrance by genotype-dose (the inheritance signatures)
  • 5.2 Phenocopies: carriers can have risk < 1 and non-carriers risk > 0
  • 5.3 Age-specific (cumulative) penetrance: survival analysis with censoring (MSH6 ~50% by 70)
  • 5.4 Estimating penetrance: the design determines the answer
  • 5.5 The penetrance trap: ascertainment bias & the weighted synthetic cohort
  • 6.1 Expressivity: whether vs how badly
  • 6.2 Modifiers of penetrance (genetic & environmental); modifier (M6) vs G×E (M7)

Worked example: spotting and correcting ascertainment bias

Q [5 marks]. A paper estimates the penetrance of a cancer gene from 200 carriers recruited at a family-cancer clinic and reports an 80% lifetime risk. A registry-based estimate is about 45%. (a) Name the bias. (b) State its direction. (c) What analysis would correct it, and how should the penetrance be interpreted?
  • [+1] (a) Name the bias. Clinic carriers were tested because of heavy family history, young-onset or multi-case disease, so the sample is selected on the outcome itself. This is ascertainment bias.
  • [+1] (b) State the direction. Sampling correlated with disease inflates the affected proportion, so the bias is upward: 80% is an over-estimate and the ~45% registry figure is closer to the population truth.
  • [+2] (c) Prescribe the fix. Ask for a probability-weighted analysis, i.e. a synthetic cohort that mimics carriers drawn at random: carrier incidence per age = population incidence × carrier RR, with weights chosen so that the affected proportion in each age stratum matches the population proportion.
  • [+1] Interpret cautiously. Conclude that penetrance is incomplete and likely below the clinic figure; an unweighted clinic penetrance should not be used to answer a population-screening question.
In summary: the bias is ascertainment bias (sampling correlated with the outcome); it is upward (80% is an over-estimate); and it is corrected with a probability-weighted synthetic cohort that reweights strata to population incidence × RR, after which penetrance should be read as incomplete and below the clinic figure.
Sia tip — Keep the pair straight: penetrance = whether a carrier is affected, expressivity = how badly; a variant can be incompletely penetrant and variably expressive at the same time.
Glossary

Key terms

Penetrance
The probability that a person with a given genotype develops the disease by a specified age (possibly conditional on sex), Pr(disease | genotype, age) — a conditional, age-specific risk, not the frequency of the variant nor the proportion of patients who carry it. Usually incomplete (below 1).
Age-specific (cumulative) penetrance
The cumulative risk-versus-age curve: the probability a carrier has been diagnosed by each age. It rises monotonically and is estimated by survival analysis (Kaplan–Meier) with unaffected carriers censored at their current age — a simple affected/all-carriers proportion under-counts because young unaffected carriers have not yet passed through their risk window.
Ascertainment bias
The penetrance trap: carriers found through genetics clinics are tested because their families have strong family history, young onset or multiple cases, so the sample is selected on disease itself. The naïve estimate then over-estimates penetrance (the bias is upward). It is corrected by a probability-weighted ‘synthetic cohort’.
Synthetic cohort (probability weighting)
The fix for ascertainment bias: build a cohort that mimics carriers drawn at random from the population by probability weighting. Set carrier incidence per age = population incidence × carrier relative risk, derive weights so the affected proportion per age-stratum matches population proportions, and analyse the weighted data to get an unbiased, generalisable penetrance.
Expressivity (and modifiers)
Variable expressivity is how badly affected carriers are — among carriers who do develop disease, severity, age of onset and the range of features vary; it is orthogonal to penetrance (whether). A modifier is a genetic or environmental factor that alters disease risk among carriers of the same pathogenic variant, explaining why two carriers of one mutation differ; environmental modifiers are the actionable handles for risk reduction.
FAQ

Penetrance and Expressivity FAQ

Why isn’t a carrier of a pathogenic variant certain to get the disease?

Because penetrance is usually incomplete — below 1 — so some carriers never develop the disease even by old age. Most disease genes behave this way, which is exactly why ‘carrier’ does not mean ‘will be affected’ and why counselling is framed as risk, not certainty. Two boundary facts matter: carriers can have risk below 1 (incomplete penetrance) and non-carriers can have risk above 0 (phenocopies / sporadic cases).
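
The two boundary facts can be made concrete with a little arithmetic. The sketch below uses made-up numbers (the carrier frequency, penetrance and non-carrier risk are all hypothetical, chosen only for illustration) to show that penetrance, Pr(disease | carrier), is a different quantity from the proportion of patients who carry the variant, Pr(carrier | disease).

```python
# Hypothetical inputs, not taken from the course or any study.
carrier_frequency = 0.002   # assumed proportion of the population carrying the variant
penetrance = 0.50           # assumed Pr(disease | carrier): incomplete, below 1
non_carrier_risk = 0.05     # assumed Pr(disease | non-carrier): above 0 (phenocopies)

# Split the population's cases into carrier cases and non-carrier cases.
carrier_cases = carrier_frequency * penetrance
non_carrier_cases = (1 - carrier_frequency) * non_carrier_risk

# Proportion of all patients who carry the variant: Pr(carrier | disease).
proportion_of_patients_carrying = carrier_cases / (carrier_cases + non_carrier_cases)

print(f"Penetrance, Pr(disease | carrier):      {penetrance:.3f}")
print(f"Patients who carry, Pr(carrier | disease): {proportion_of_patients_carrying:.3f}")
```

With these assumed inputs most cases arise in non-carriers even though a carrier's own risk is high, which is why the two quantities should never be swapped in an exam answer.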

Why estimate age-specific penetrance with survival analysis instead of a simple proportion?

Because a simple ‘affected carriers ÷ all carriers’ under-counts: young, currently-unaffected carriers have not yet lived through their risk window, so counting them as non-events deflates the estimate. Kaplan–Meier censors them at their current age so they contribute the person-time they actually lived, producing the cumulative risk-versus-age curve rather than a misleading snapshot. The slope of that curve tells you when risk concentrates — which sets the age to start screening.
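
To see the under-counting numerically, here is a minimal sketch that compares the simple proportion with a hand-rolled Kaplan–Meier estimate. The ten carriers are invented for illustration, and the function names `naive_proportion` and `kaplan_meier_cumulative_risk` are hypothetical helpers, not part of any library or of the course materials.

```python
def naive_proportion(records):
    """Affected carriers divided by all carriers, ignoring age at observation."""
    return sum(1 for _, affected in records if affected) / len(records)


def kaplan_meier_cumulative_risk(records, by_age):
    """Kaplan-Meier cumulative risk (1 - survival) up to and including by_age.

    records: list of (age, affected) pairs. For affected carriers, age is the
    age at diagnosis; for unaffected carriers, it is the current age at which
    they are censored.
    """
    survival = 1.0
    event_ages = sorted({age for age, affected in records if affected and age <= by_age})
    for t in event_ages:
        # Everyone still under observation at age t is at risk, including
        # carriers who will be censored later.
        at_risk = sum(1 for age, _ in records if age >= t)
        events = sum(1 for age, affected in records if affected and age == t)
        survival *= 1 - events / at_risk
    return 1 - survival


# Hypothetical carriers: (age, affected). Four are young and still unaffected.
carriers = [
    (30, False), (35, False), (40, False), (45, False),
    (50, True), (55, False), (60, True), (65, True),
    (70, False), (75, False),
]

naive = naive_proportion(carriers)
km_70 = kaplan_meier_cumulative_risk(carriers, by_age=70)
print(f"Naive proportion:              {naive:.3f}")
print(f"Kaplan-Meier risk by age 70:   {km_70:.3f}")
```

In this toy data set the four carriers aged under 50 drop out of the risk set before the first diagnosis, so they no longer dilute the estimate and the Kaplan–Meier figure comes out higher than the simple proportion.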

Why do clinic-recruited carrier families over-estimate penetrance?

Because carriers are found through genetics clinics precisely because their families have strong family history, young-onset or multiple cases — the sample is selected on the disease itself, not random with respect to the outcome. So the affected proportion is artificially high and the naïve estimate is biased upward. The fix is a probability-weighted synthetic cohort that reweights so the affected proportion per age-stratum matches the population’s, recovering a penetrance you can quote to a screening question.
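
The reweighting step can be sketched for a single age stratum. This is one simplified reading of the recipe above, under stated assumptions: the target affected proportion is taken to be population incidence × carrier RR, and the clinic counts, incidence and RR below are invented. The helper names `stratum_weights` and `weighted_affected_proportion` are hypothetical.

```python
def stratum_weights(n_affected, n_unaffected, population_incidence, carrier_rr):
    """Return (weight_affected, weight_unaffected, target) for one age stratum.

    Assumes the stratum contains both affected and unaffected carriers, so the
    observed proportion is strictly between 0 and 1.
    """
    # Target: the affected proportion expected among randomly drawn carriers.
    target = min(population_incidence * carrier_rr, 1.0)
    observed = n_affected / (n_affected + n_unaffected)
    weight_affected = target / observed
    weight_unaffected = (1 - target) / (1 - observed)
    return weight_affected, weight_unaffected, target


def weighted_affected_proportion(n_affected, n_unaffected, w_aff, w_unaff):
    """Affected proportion in the stratum after applying the weights."""
    weighted_cases = n_affected * w_aff
    return weighted_cases / (weighted_cases + n_unaffected * w_unaff)


# Hypothetical stratum: 30 affected and 20 unaffected clinic carriers,
# assumed population incidence of 0.02 and assumed carrier RR of 8.
w_aff, w_unaff, target = stratum_weights(30, 20, population_incidence=0.02, carrier_rr=8)
reweighted = weighted_affected_proportion(30, 20, w_aff, w_unaff)

print(f"Observed clinic proportion: {30 / 50:.2f}")
print(f"Target proportion:          {target:.2f}")
print(f"Weights (affected, unaffected): {w_aff:.3f}, {w_unaff:.3f}")
print(f"Reweighted proportion:      {reweighted:.2f}")
```

Affected carriers are down-weighted and unaffected carriers up-weighted, which is the direction expected when the clinic sample is enriched for disease. A full analysis would apply such weights across all age strata before fitting the survival curve.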

What is the difference between penetrance, expressivity and a modifier?

Penetrance is whether a carrier is affected at all (a yes/no or by-age probability). Expressivity is how badly — among affected carriers, the severity, onset and range of features vary. A modifier (Module 6) is a factor that acts within carriers of a known variant to raise or lower their risk, and it is distinct from gene–environment interaction (Module 7), which is tested in the whole population to see whether an exposure’s effect differs across genotypes.

Study strategy

Exam move

Lead with the definition stated precisely — penetrance is Pr(disease | genotype, age), a conditional age-specific risk — then be able to read a cumulative-risk curve and quote a by-age figure (MSH6 ≈50% by 70) and explain why it is estimated by survival analysis with censoring. The guaranteed marks are in the penetrance trap: recognise ascertainment bias in clinic-recruited families, state that it biases penetrance upward, and prescribe the probability-weighted synthetic cohort. Keep the M6 pair crisp — penetrance = whether, expressivity = how badly — and do not conflate a modifier (acts within carriers) with a G×E interaction (tested in the whole population). Distinguish the three estimation designs and their catches.
