University of Melbourne · S1 2026 · FACULTY OF SCIENCE

MAST90139 · Statistical Modelling For Data Science

50% final exam · hurdle · 14 Chapters · 5-page Bible
Chapter 2 of 8 · MAST90139

Generalised Linear Models

This is the spine of the whole course. A generalised linear model has three parts: a random component (the response comes from an exponential-family distribution), a systematic component (a linear predictor η = Xβ), and a link function g that joins them, g(μ) = η. Choose a distribution and a link and you have a named model: normal + identity is ordinary regression, binomial + logit is logistic regression, Poisson + log is Poisson regression. The chapter builds the machinery that runs all of them — the exponential family and its mean–variance law Var(Y) = φV(μ), the canonical links, fitting by iteratively re-weighted least squares (IRLS), and the deviance as the GLM's answer to the residual sum of squares. Learn this one engine and every later model is the same template with a new distribution and link.
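
As a sketch of where the mean–variance law comes from, the block below writes out the exponential-family density and the two identities that follow from it. It assumes the common parameterisation a(φ) = φ (no prior weights), which is a simplification; the Poisson line is one worked instance.

```latex
% Exponential-family density, assuming a(\phi) = \phi
f(y;\theta,\phi) = \exp\left\{ \frac{y\theta - b(\theta)}{\phi} + c(y,\phi) \right\}

% Mean and variance come from derivatives of b
\mathrm{E}(Y) = \mu = b'(\theta), \qquad
\mathrm{Var}(Y) = \phi\, b''(\theta) = \phi\, V(\mu)

% Poisson instance: \theta = \log\mu,\; b(\theta) = e^{\theta},\; \phi = 1
b'(\theta) = e^{\theta} = \mu, \qquad
b''(\theta) = e^{\theta} = \mu \;\Rightarrow\; V(\mu) = \mu
```

The canonical link is the g for which g(μ) = θ, which is why the Poisson canonical link is the log.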

In this chapter

What this chapter covers

  • 01 The three components: random, systematic, link
  • 02 The exponential family of distributions
  • 03 Canonical vs non-canonical link functions
  • 04 The mean–variance relationship Var(Y) = φV(μ)
  • 05 Estimation by iteratively re-weighted least squares (IRLS)
  • 06 Deviance and the saturated model
  • 07 Scaled deviance and the dispersion parameter φ
  • 08 Pearson vs deviance residuals
Worked example

Worked example: name the GLM and its canonical link

Q [6 marks]. For each response below, name the natural GLM (distribution + canonical link) and write the link equation. (a) Whether a loan defaults (yes/no). (b) The number of insurance claims on a policy in a year. (c) A continuous, roughly normal measurement (blood pressure). State the mean–variance law in each case.
[Figure: the three GLM components. Random: Y ~ exponential family. Systematic: η = Xβ. Link: g(μ) = η, i.e. g(μ) = β₀ + β₁x₁ + ... + βₚxₚ. logit → logistic | log → Poisson | identity → normal.]
  • +1 (a) Binary default: response is Bernoulli/binomial. Canonical link = logit: log(π/(1−π)) = Xβ. This is logistic regression.
  • +1 (a) Mean–variance: Var(Y) = π(1−π). The variance is fixed by the mean π, with φ = 1.
  • +1 (b) Claim count: response is a count → Poisson. Canonical link = log: log(μ) = Xβ. This is Poisson regression.
  • +1 (b) Mean–variance: Var(Y) = μ. The variance equals the mean, with φ = 1 (the data are overdispersed if the observed variance exceeds μ).
  • +1 (c) Continuous normal: response is normal. Canonical link = identity: μ = Xβ, the ordinary linear model.
  • +1 (c) Mean–variance: Var(Y) = σ², constant. Here V(μ) = 1 and the dispersion φ = σ² is a free parameter, not pinned to 1.
(a) binomial + logit (logistic), Var = π(1−π); (b) Poisson + log, Var = μ; (c) normal + identity (ordinary regression), Var = σ². The same three-part template — random component, linear predictor, link — produces all three.
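
The three answers can be sketched as a small lookup table. This is an illustrative sketch only: `FAMILIES` and `mean_and_variance` are hypothetical names invented here, not part of any library, and the numeric inputs are made up.

```python
import math

# Hypothetical lookup table (illustrative names, not a library API):
# family -> canonical link g, its inverse, and the variance function V(mu).
FAMILIES = {
    "binomial": {
        "link": lambda mu: math.log(mu / (1.0 - mu)),         # logit
        "inverse": lambda eta: 1.0 / (1.0 + math.exp(-eta)),
        "variance": lambda mu: mu * (1.0 - mu),
    },
    "poisson": {
        "link": math.log,
        "inverse": math.exp,
        "variance": lambda mu: mu,
    },
    "normal": {
        "link": lambda mu: mu,                                # identity
        "inverse": lambda eta: eta,
        "variance": lambda mu: 1.0,                           # V(mu) = 1, so Var(Y) = phi
    },
}

def mean_and_variance(family, eta, phi=1.0):
    """Map a linear-predictor value eta to (mu, Var(Y)) for the given family."""
    spec = FAMILIES[family]
    mu = spec["inverse"](eta)
    return mu, phi * spec["variance"](mu)

print(mean_and_variance("binomial", 0.0))            # eta = 0 corresponds to pi = 0.5
print(mean_and_variance("poisson", math.log(3.0)))   # mean and variance both near 3
print(mean_and_variance("normal", 120.0, phi=25.0))  # phi plays the role of sigma^2
```

Only the normal call passes a `phi` other than 1, which mirrors the point in part (c): the binomial and Poisson variances are fully determined by the mean.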
Sia tip: Your first reflex in the exam should always be to name the model: response type → distribution → canonical link. Get that automatic and the rest of any GLM question (deviance test, coefficient interpretation) follows a fixed script.
Glossary

Key terms

Random component
The first part of a GLM: the assumption that the response Y comes from an exponential-family distribution (normal, binomial, Poisson, gamma...). It determines the mean–variance relationship and so the weights used in fitting.
Link function
The function g that connects the mean to the linear predictor, g(μ) = η = Xβ. The canonical link sets the natural parameter θ equal to the linear predictor η, which makes XᵀY the sufficient statistic for β (logit for binomial, log for Poisson, identity for normal); non-canonical links (e.g. probit) are allowed too.
Exponential family
The class of distributions whose density can be written exp{(yθ − b(θ))/a(φ) + c(y, φ)}. Its members share a common mean–variance structure, which is exactly what lets one fitting algorithm (IRLS) handle them all.
Deviance
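D = 2(ℓ_sat − ℓ_model) when φ = 1, twice the log-likelihood gap between the saturated model (one parameter per observation, so a perfect fit) and the fitted model; in general this quantity is the scaled deviance D/φ. The deviance is the GLM's analogue of the residual sum of squares: smaller is better, and the difference between two nested models' scaled deviances is the likelihood-ratio test statistic.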
IRLS
Iteratively re-weighted least squares — the algorithm R uses to maximise a GLM likelihood. It repeatedly solves a weighted least-squares problem with weights that depend on the current fit, converging to the maximum-likelihood β̂. It reduces to ordinary least squares in the normal-identity case.
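
The IRLS entry above can be made concrete with a minimal sketch for one case: a Poisson response, log link, an intercept and a single covariate. The function name `irls_poisson` and the data are made up for illustration, and this is not R's actual implementation, only the same idea written out with the standard library.

```python
import math

def irls_poisson(x, y, n_iter=25, tol=1e-10):
    """Fit log(mu_i) = b0 + b1 * x_i by IRLS (Poisson family, log link)."""
    b0, b1 = math.log(sum(y) / len(y)), 0.0   # start from the intercept-only fit
    for _ in range(n_iter):
        eta = [b0 + b1 * xi for xi in x]
        mu = [math.exp(e) for e in eta]
        # Working response z = eta + (y - mu) * g'(mu), with g'(mu) = 1/mu for the log link.
        z = [e + (yi - m) / m for e, yi, m in zip(eta, y, mu)]
        # Working weights w = 1 / (V(mu) * g'(mu)^2), which equals mu for Poisson + log.
        w = mu
        # Solve the 2x2 weighted normal equations (X'WX) beta = X'Wz directly.
        s_w = sum(w)
        s_wx = sum(wi * xi for wi, xi in zip(w, x))
        s_wxx = sum(wi * xi * xi for wi, xi in zip(w, x))
        s_wz = sum(wi * zi for wi, zi in zip(w, z))
        s_wxz = sum(wi * xi * zi for wi, xi, zi in zip(w, x, z))
        det = s_w * s_wxx - s_wx ** 2
        new_b0 = (s_wxx * s_wz - s_wx * s_wxz) / det
        new_b1 = (s_w * s_wxz - s_wx * s_wz) / det
        converged = abs(new_b0 - b0) + abs(new_b1 - b1) < tol
        b0, b1 = new_b0, new_b1
        if converged:
            break
    return b0, b1

# Made-up data: a binary covariate, so the fitted means should match the group averages.
x = [0, 0, 0, 1, 1, 1]
y = [1, 2, 3, 4, 6, 8]
b0, b1 = irls_poisson(x, y)
print(math.exp(b0), math.exp(b0 + b1))   # fitted means for x = 0 and x = 1
```

With the normal family and identity link, the working response is just y and the weights are constant, so the first iteration is already the ordinary least-squares solution.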
FAQ

Generalised Linear Models FAQ

What exactly makes something a GLM?

Three ingredients: a response from an exponential-family distribution (the random component), a linear predictor η = Xβ (the systematic component), and a link function g with g(μ) = η. Fix the distribution and the link and you have named a specific model. Almost every model in MAST90139 is one choice of those two ingredients.

What is the canonical link and do I have to use it?

The canonical link is the one that makes the model's natural parameter equal to the linear predictor — logit for the binomial, log for the Poisson, identity for the normal. It has nice mathematical properties and is the default, but you are not forced to use it: the probit and complementary-log-log are valid non-canonical links for binary data, for instance.
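
For binary data, the three links named above can be written out directly. This is a small sketch using only the Python standard library; the function names are chosen here for illustration.

```python
import math
from statistics import NormalDist

_STD_NORMAL = NormalDist()   # standard normal, used by the probit link

def logit(p):
    """Canonical link for the binomial: log-odds."""
    return math.log(p / (1.0 - p))

def probit(p):
    """Non-canonical link: inverse of the standard normal CDF."""
    return _STD_NORMAL.inv_cdf(p)

def cloglog(p):
    """Non-canonical link: complementary log-log."""
    return math.log(-math.log(1.0 - p))

def inv_logit(eta):
    return 1.0 / (1.0 + math.exp(-eta))

def inv_probit(eta):
    return _STD_NORMAL.cdf(eta)

def inv_cloglog(eta):
    return 1.0 - math.exp(-math.exp(eta))

# Each link maps a probability in (0, 1) onto the whole real line.
for p in (0.1, 0.5, 0.9):
    print(p, round(logit(p), 3), round(probit(p), 3), round(cloglog(p), 3))
```

The logit and probit are symmetric about p = 0.5 (both give η = 0 there), while the complementary log-log is not, which is one practical reason to prefer it for some data.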

What is the deviance and why not just use the residual sum of squares?

The deviance generalises the residual sum of squares to any GLM. Because GLMs are fitted by likelihood, the natural measure of fit is the log-likelihood gap to the saturated (perfect-fit) model, scaled by 2. For the normal model the deviance literally is the residual sum of squares; for other families it is the right likelihood-based analogue, and its differences give chi-square tests.
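
A short sketch of both claims, using made-up counts and two hypothetical fits (an intercept-only model and a two-group model). The Poisson formula used is the standard unit-dispersion deviance, D = 2 Σ [y log(y/μ) − (y − μ)].

```python
import math

def normal_deviance(y, mu):
    """Unscaled deviance for the normal family: the residual sum of squares."""
    return sum((yi - mi) ** 2 for yi, mi in zip(y, mu))

def poisson_deviance(y, mu):
    """D = 2 * sum[ y*log(y/mu) - (y - mu) ], taking y*log(y/mu) = 0 when y = 0."""
    total = 0.0
    for yi, mi in zip(y, mu):
        term = yi * math.log(yi / mi) if yi > 0 else 0.0
        total += term - (yi - mi)
    return 2.0 * total

y = [1, 2, 3, 4, 6, 8]             # made-up counts for illustration
mu_null = [4.0] * 6                # intercept-only fit: every mean is the overall average
mu_group = [2.0] * 3 + [6.0] * 3   # two-group fit: each mean is its group average

# Drop in deviance between the nested fits: the likelihood-ratio statistic on 1 df.
drop = poisson_deviance(y, mu_null) - poisson_deviance(y, mu_group)
print(drop)
print(poisson_deviance(y, y))      # the saturated model reproduces the data exactly
```

The drop would be compared with a chi-square distribution on 1 degree of freedom, since the two-group model has one more parameter than the intercept-only model.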

What does the dispersion parameter φ do?

φ scales the variance: Var(Y) = φV(μ). For the binomial and Poisson it is fixed at 1, so the variance is completely determined by the mean. For the normal and gamma it is a free parameter (σ² for the normal). When binomial or Poisson data show more spread than φ = 1 allows, that is overdispersion, handled by estimating φ in a quasi-likelihood fit.
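
One common way to estimate φ is the Pearson chi-square statistic divided by the residual degrees of freedom. The sketch below assumes that estimator; the function name, the counts and the fitted means are all made up for illustration.

```python
def pearson_dispersion(y, mu, variance, n_params):
    """Estimate phi as the Pearson chi-square statistic divided by (n - p)."""
    chi2 = sum((yi - mi) ** 2 / variance(mi) for yi, mi in zip(y, mu))
    return chi2 / (len(y) - n_params)

# Made-up counts with hypothetical fitted means from a two-group Poisson fit
# (each fitted mean is its group's average, so p = 2 parameters).
y = [0, 3, 9, 1, 5, 12]
mu = [4.0, 4.0, 4.0, 6.0, 6.0, 6.0]

phi_hat = pearson_dispersion(y, mu, variance=lambda m: m, n_params=2)
print(phi_hat)   # a value well above 1 suggests overdispersion
```

In a quasi-Poisson fit the coefficient estimates are unchanged and the standard errors are multiplied by the square root of this estimate.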

Study strategy

Exam move

This is the chapter to over-learn — every later model is this template re-run. Memorise the three components (random, systematic, link) and be able to instantly map a response type to its distribution and canonical link: binary → binomial + logit, count → Poisson + log, continuous → normal + identity. Know the mean–variance law Var(Y) = φV(μ) for each family and which families fix φ = 1. Understand the deviance as the likelihood-based residual sum of squares, and that Δdeviance between nested models is a chi-square test. You do not need to derive IRLS, but know that it is how R fits and that it reduces to OLS in the normal case. Get this engine cold and the families chapters become pattern-matching.
