University of Melbourne · S1 2026 · FACULTY OF SCIENCE

MAST90139 · Statistical Modelling For Data Science

- one subject, every graph, every model, every mark

50% final exam · hurdle · 14 Chapters · 3-page Bible

Our own words - no uploaded lecturer files

Built to mirror S1 2026 · updated this semester

Chapter 4 of 8 · MAST90139

Binomial Models

When binary outcomes are grouped — r successes out of n trials at each covariate setting — the response is binomial and you fit grouped logistic regression. The big payoff over ungrouped data is that the residual deviance becomes a genuine goodness-of-fit statistic: under a correct model D is approximately χ² on n−q degrees of freedom, so you can test whether the model fits. The chapter covers binomial counts and the deviance / Pearson X² goodness-of-fit test, the dose–response setting (and LD50, the dose giving a 50% response), and the crucial complication of overdispersion — data wobbling more than the binomial allows — which you detect from D ≫ df, fix by estimating a dispersion φ and refitting quasi-binomial, and then test with an F-test rather than χ². A full worked dose–response example ties it together end to end.

In this chapter

What this chapter covers

01 Grouped (binomial) logistic regression: r successes out of n
02 Residual deviance and Pearson X² as goodness-of-fit statistics
03 The χ² goodness-of-fit test on n−q df
04 Dose–response models and LD50
05 Overdispersion: detecting D ≫ df
06 Estimating the dispersion φ and refitting quasi-binomial
07 The F-test for quasi-binomial model comparison

Worked example · free

Worked example: goodness-of-fit and overdispersion for a grouped binomial

Q [6 marks]. A grouped logistic model is fitted to 12 dose groups. R reports a residual deviance of 28.0 on 10 degrees of freedom. (a) Test the goodness of fit. (b) Estimate the dispersion φ and state what it suggests. (c) If you refit as quasi-binomial, what changes — the coefficients, the standard errors, or both?

+2 (a) GoF test: compare D = 28.0 to χ²_0.95(10) ≈ 18.3. Since 28.0 > 18.3, reject the hypothesis that the model fits: there is significant lack of fit.
+2 (b) Estimate φ: φ̂ ≈ D/df = 28.0/10 = 2.8 (Pearson X²/df is the preferred estimate, but the deviance ratio already flags the problem). φ̂ well above 1 suggests overdispersion, provided there is no structural fault such as a missing term or a wrong link.
+2 (c) Quasi-binomial refit: the coefficients are unchanged; only the standard errors inflate by √φ̂ ≈ 1.67, widening CIs and shrinking the test statistics.

D = 28.0 on 10 df exceeds χ²_0.95(10) ≈ 18.3, so the model fits poorly; φ̂ ≈ 2.8 points to overdispersion; refitting quasi-binomial leaves the coefficients identical but multiplies the standard errors by √φ̂ ≈ 1.67, giving honest (wider) uncertainty.
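The arithmetic in this worked example can be checked with a short Python sketch (the course itself works in R; Python is used here only for illustration). The helper `chi2_sf` is a hypothetical name, and it relies on a closed form for the χ² upper tail that holds only when the degrees of freedom are even, which is the case for df = 10.

```python
import math

def chi2_sf(x, df):
    """Upper-tail probability P(X > x) for a chi-square variable.

    Assumption: df is a positive even integer, so the closed form
    exp(-x/2) * sum_{i<df/2} (x/2)^i / i! applies.
    """
    if df <= 0 or df % 2 != 0:
        raise ValueError("this sketch only handles positive even df")
    half = x / 2.0
    return math.exp(-half) * sum(
        half**i / math.factorial(i) for i in range(df // 2)
    )

D, df = 28.0, 10                     # residual deviance and df from the question
p_value = chi2_sf(D, df)             # (a) goodness-of-fit p-value
phi_hat = D / df                     # (b) deviance-based dispersion estimate
se_inflation = math.sqrt(phi_hat)    # (c) factor applied to the standard errors

print(f"GoF p-value      : {p_value:.4f}")
print(f"phi-hat (D/df)   : {phi_hat:.2f}")
print(f"SE inflation     : {se_inflation:.3f}")
```

A p-value well below 0.05 is the same conclusion as D exceeding the 18.3 critical value; the sketch only reports it on the probability scale.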

Sia tip — Goodness-of-fit by residual deviance is valid for grouped binomial data (and Poisson counts) but not for ungrouped binary 0/1 data — that is a classic exam trap. Always check D against its df = n−q before trusting the model.

Glossary

Key terms

Grouped logistic regression: Logistic regression where each row is r successes out of n trials at a covariate setting, so the response is binomial(n, π) rather than a single 0/1. Grouping makes the residual deviance a usable goodness-of-fit statistic.
Goodness-of-fit deviance: For grouped data, the residual deviance D measures how far the fitted model is from the saturated model; under a correct model D is approximately χ²(n−q), where n here counts the covariate groups (not the trials within a group) and q is the number of fitted parameters. D much larger than its df signals lack of fit or overdispersion. It is not valid for ungrouped binary data.
LD50: The dose at which the modelled probability of response is 0.5 — the lethal/effective dose for half the population. From logit(π) = β₀ + β₁(dose), LD50 = −β₀/β₁, the dose where the linear predictor is zero.
Overdispersion: Binomial (or Poisson) data showing more variability than the model allows, revealed by residual deviance far above its degrees of freedom. The true standard errors are larger than the ones the ordinary fit reports, so ignoring it makes p-values too small. Fixed by estimating a dispersion φ and refitting a quasi-model.
Quasi-binomial model: A fit that keeps the binomial mean structure but estimates a dispersion parameter φ, so Var = φnπ(1−π). The coefficients are unchanged from the ordinary fit; standard errors scale by √φ̂, and model comparison uses an F-test rather than χ².

FAQ

Binomial Models FAQ

What is the difference between binomial and ordinary logistic regression?

They are the same model with the data organised differently. Ungrouped logistic regression has one 0/1 row per individual; binomial (grouped) logistic regression collapses individuals with identical covariates into r successes out of n trials. The coefficients and odds ratios are the same, but grouping makes the residual deviance a valid goodness-of-fit test.
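The equivalence described above can be illustrated with a small Python sketch (not the R workflow the course uses). It fits logit(π) = β₀ + β₁·dose by Newton-Raphson twice, once on grouped rows and once on the same data expanded to 0/1 rows. The data values and the function names `fit_logistic` and `residual_deviance` are hypothetical, chosen only for illustration.

```python
import math

def inv_logit(eta):
    return 1.0 / (1.0 + math.exp(-eta))

def fit_logistic(x, r, n, iters=25):
    """Newton-Raphson for logit(pi) = b0 + b1*x, rows being r out of n."""
    b0, b1 = 0.0, 0.0
    for _ in range(iters):
        u0 = u1 = i00 = i01 = i11 = 0.0
        for xi, ri, ni in zip(x, r, n):
            p = inv_logit(b0 + b1 * xi)
            w = ni * p * (1.0 - p)          # binomial variance weight
            resid = ri - ni * p             # observed minus expected
            u0 += resid
            u1 += xi * resid
            i00 += w
            i01 += w * xi
            i11 += w * xi * xi
        det = i00 * i11 - i01 * i01
        # beta <- beta + (information matrix)^-1 * score, written out for 2x2
        b0 += (i11 * u0 - i01 * u1) / det
        b1 += (i00 * u1 - i01 * u0) / det
    return b0, b1

def residual_deviance(x, r, n, b0, b1):
    """Binomial deviance, treating 0*log(0) as 0."""
    total = 0.0
    for xi, ri, ni in zip(x, r, n):
        p = inv_logit(b0 + b1 * xi)
        if ri > 0:
            total += ri * math.log(ri / (ni * p))
        if ri < ni:
            total += (ni - ri) * math.log((ni - ri) / (ni * (1.0 - p)))
    return 2.0 * total

# Hypothetical grouped data: r responders out of n at each dose.
dose = [0.0, 1.0, 2.0, 3.0]
n = [10, 10, 10, 10]
r = [1, 3, 6, 9]

# Expand to one 0/1 row per individual (each row is "y out of 1").
x_bin, y_bin = [], []
for xi, ri, ni in zip(dose, r, n):
    x_bin += [xi] * ni
    y_bin += [1] * ri + [0] * (ni - ri)
ones = [1] * len(y_bin)

b_grouped = fit_logistic(dose, r, n)
b_binary = fit_logistic(x_bin, y_bin, ones)
d_grouped = residual_deviance(dose, r, n, *b_grouped)
d_binary = residual_deviance(x_bin, y_bin, ones, *b_binary)

print("grouped coefficients :", b_grouped)
print("binary coefficients  :", b_binary)
print("grouped deviance     :", d_grouped, "on", len(dose) - 2, "df")
print("binary deviance      :", d_binary, "on", len(y_bin) - 2, "df")
```

The two coefficient pairs agree to numerical precision, while the two residual deviances differ. Only the grouped one is meaningfully compared with χ².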

When can I use the residual deviance as a goodness-of-fit test?

For grouped binomial data (and Poisson counts) with reasonable cell counts, where D ~ χ²(n−q) under a correct model. You cannot use it for ungrouped binary 0/1 data — the chi-square approximation fails there, so a small or large residual deviance says nothing about fit. This distinction is a frequent exam point.

How do I spot and handle overdispersion?

Spot it when the residual deviance is much larger than its degrees of freedom (D/df well above 1) with no obvious structural fault like a missing term or wrong link. Handle it by estimating the dispersion φ (Pearson X²/df) and refitting with family = quasibinomial; the coefficients stay the same, the standard errors inflate by √φ̂, and you compare models with an F-test.
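The Pearson-based dispersion estimate mentioned above can be sketched in Python as follows. The counts and fitted probabilities are made-up illustrative values, not output from a real fit, and `pearson_dispersion` is a hypothetical helper name. In R the refit itself would use family = quasibinomial, which this sketch does not reproduce.

```python
import math

def pearson_dispersion(r, n, pi_hat, q):
    """Return (Pearson X^2, residual df, phi-hat = X^2 / df).

    r, n   : successes and trials per group
    pi_hat : fitted probabilities per group (assumed supplied by a fit)
    q      : number of fitted parameters
    """
    x2 = sum(
        (ri - ni * pi) ** 2 / (ni * pi * (1.0 - pi))
        for ri, ni, pi in zip(r, n, pi_hat)
    )
    df = len(r) - q
    return x2, df, x2 / df

# Hypothetical grouped data and fitted probabilities (illustration only).
r = [2, 9, 10, 19, 14]
n = [20, 20, 20, 20, 20]
pi_hat = [0.15, 0.30, 0.50, 0.70, 0.85]

x2, df, phi_hat = pearson_dispersion(r, n, pi_hat, q=2)

# Quasi-binomial standard errors are the binomial ones times sqrt(phi-hat).
binomial_se = 0.20                   # hypothetical SE from the ordinary fit
quasi_se = binomial_se * math.sqrt(phi_hat)

print(f"Pearson X^2 = {x2:.3f} on {df} df, phi-hat = {phi_hat:.3f}")
print(f"SE {binomial_se:.2f} becomes {quasi_se:.3f} under quasi-binomial")
```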

What is LD50 and how do I get it from the fit?

LD50 is the dose at which half the subjects respond — the dose where the modelled probability equals 0.5, i.e. where the linear predictor is zero. From logit(π) = β₀ + β₁·dose, set the right side to 0 and solve: LD50 = −β₀/β₁. It is the standard summary of a dose–response curve.
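As a quick numerical check of the formula above, here is a minimal Python sketch. The coefficient values are hypothetical, and `dose_for_probability` is an illustrative helper that solves logit(p) = β₀ + β₁·dose for any target probability p, with LD50 as the p = 0.5 case.

```python
import math

def inv_logit(eta):
    return 1.0 / (1.0 + math.exp(-eta))

def dose_for_probability(b0, b1, p=0.5):
    """Dose at which the fitted response probability equals p.

    Solves logit(p) = b0 + b1 * dose; with p = 0.5, logit(p) = 0,
    so the result reduces to LD50 = -b0 / b1.
    """
    if b1 == 0:
        raise ValueError("slope is zero: response does not depend on dose")
    return (math.log(p / (1.0 - p)) - b0) / b1

# Hypothetical fitted coefficients (illustration only).
b0, b1 = -3.0, 0.6

ld50 = dose_for_probability(b0, b1)
prob_at_ld50 = inv_logit(b0 + b1 * ld50)

print(f"LD50 = {ld50:.3f}")
print(f"fitted probability at LD50 = {prob_at_ld50:.3f}")
```

If the model is fitted on log dose rather than dose, the same calculation returns log(LD50) and needs exponentiating.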

Study strategy

Exam move

Drill the goodness-of-fit reflex: for grouped data, compare residual deviance D to its df = n−q against χ², and know that this test is invalid for ungrouped 0/1 data. Practise the overdispersion workflow end to end — detect D ≫ df, estimate φ̂ from Pearson X²/df, refit quasi-binomial, and remember that coefficients stay put while standard errors scale by √φ̂ and comparisons switch to the F-test. For dose–response, be able to compute LD50 = −β₀/β₁ and read a fitted curve. The signature exam item is a full dose–response read: fit → goodness-of-fit → spot overdispersion → refit → F-test → odds/effect, so rehearse that whole chain.
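The F-test step at the end of that chain can be sketched in Python. The statistic is the drop in deviance per dropped parameter, divided by φ̂ from the larger model, and it is referred to an F distribution on (df_small − df_big, df_big) degrees of freedom. The reduced-model deviance below is a hypothetical number; the larger-model values reuse D = 28.0 on 10 df with φ̂ = 2.8 from the worked example. The critical value is a tabulated constant, not something the code computes.

```python
def quasi_f_statistic(dev_small, df_small, dev_big, df_big, phi_hat):
    """F statistic for comparing nested quasi-binomial models.

    dev_small, df_small : residual deviance and df of the reduced model
    dev_big, df_big     : residual deviance and df of the larger model
    phi_hat             : dispersion estimated from the larger model
    Returns (F, numerator df, denominator df).
    """
    num_df = df_small - df_big
    if num_df <= 0:
        raise ValueError("the reduced model must have more residual df")
    f_stat = ((dev_small - dev_big) / num_df) / phi_hat
    return f_stat, num_df, df_big

# Reduced model (no dose term): hypothetical deviance for illustration.
dev_small, df_small = 60.0, 11
# Larger model (with dose): values from the worked example.
dev_big, df_big, phi_hat = 28.0, 10, 2.8

f_stat, num_df, den_df = quasi_f_statistic(
    dev_small, df_small, dev_big, df_big, phi_hat
)

F_CRIT_1_10 = 4.96   # tabulated 95% point of F(1, 10), not computed here
print(f"F = {f_stat:.2f} on ({num_df}, {den_df}) df")
print("dose term significant at 5%:", f_stat > F_CRIT_1_10)
```

Without the division by φ̂, the same deviance drop would be compared directly with χ², which overstates the evidence when the data are overdispersed.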
