MAST90139 · Statistical Modelling For Data Science
Generalised Linear Models
This is the spine of the whole course. A generalised linear model has three parts: a random component (the response comes from an exponential-family distribution), a systematic component (a linear predictor η = Xβ), and a link function g that joins them, g(μ) = η. Choose a distribution and a link and you have a named model: normal + identity is ordinary regression, binomial + logit is logistic regression, Poisson + log is Poisson regression. The chapter builds the machinery that runs all of them — the exponential family and its mean–variance law Var(Y) = φV(μ), the canonical links, fitting by iteratively re-weighted least squares (IRLS), and the deviance as the GLM's answer to the residual sum of squares. Learn this one engine and every later model is the same template with a new distribution and link.
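To make the template concrete, here is a minimal Python sketch. It uses the standard library only, and the names `INVERSE_LINKS` and `fitted_mean` are hypothetical, not part of any course package. The same linear predictor η = Xβ becomes a different named model depending only on which inverse link maps it back to the mean μ:

```python
import math

# Inverse links g^{-1}: map the linear predictor eta back to the mean mu.
INVERSE_LINKS = {
    "identity": lambda eta: eta,                        # normal: mu = eta
    "logit": lambda eta: 1.0 / (1.0 + math.exp(-eta)),  # binomial: pi = expit(eta)
    "log": lambda eta: math.exp(eta),                   # Poisson: mu = exp(eta)
}


def linear_predictor(x_row, beta):
    """Systematic component for one observation: eta = x^T beta."""
    return sum(x * b for x, b in zip(x_row, beta))


def fitted_mean(x_row, beta, link):
    """Apply the inverse link to eta to get the mean of the random component."""
    return INVERSE_LINKS[link](linear_predictor(x_row, beta))


# One covariate row (intercept 1, x = 2) and one coefficient vector: eta = -1 + 0.5 * 2 = 0.
x_row, beta = [1.0, 2.0], [-1.0, 0.5]
for link in ("identity", "logit", "log"):
    print(link, fitted_mean(x_row, beta, link))
# identity 0.0
# logit 0.5
# log 1.0
```

Only the inverse link changes between the three lines; the random component (normal, binomial, Poisson) enters at the fitting stage, through the likelihood.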
What this chapter covers
- 01 The three components: random, systematic, link
- 02 The exponential family of distributions
- 03 Canonical vs non-canonical link functions
- 04 The mean–variance relationship Var(Y) = φV(μ)
- 05 Estimation by iteratively re-weighted least squares (IRLS)
- 06 Deviance and the saturated model
- 07 Scaled deviance and the dispersion parameter φ
- 08 Pearson vs deviance residuals
Worked example: name the GLM and its canonical link
- (a) [+1] Binary default: response is Bernoulli (binomial with a single trial). Canonical link = logit: log(π/(1−π)) = Xβ. This is logistic regression.
- (a) [+1] Mean–variance: Var(Y) = π(1−π), so the variance is fixed by the mean π, with φ = 1.
- (b) [+1] Claim count: response is a count → Poisson. Canonical link = log: log(μ) = Xβ. This is Poisson regression.
- (b) [+1] Mean–variance: Var(Y) = μ, so the mean equals the variance and φ = 1 (overdispersion if the observed variance exceeds μ).
- (c) [+1] Continuous normal: response is normal. Canonical link = identity: μ = Xβ, the ordinary linear model.
- (c) [+1] Mean–variance: Var(Y) = σ², constant; here the dispersion φ = σ² is a free parameter, not pinned to 1.
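The mean–variance rows of the worked example fit in a small lookup table. This Python sketch (hypothetical names `FAMILIES` and `variance`; gamma is included only because, like the normal, its φ is free) evaluates Var(Y) = φV(μ) and ignores any φ passed for the families that pin it at 1:

```python
# family: (canonical link, variance function V(mu), is phi fixed at 1?)
FAMILIES = {
    "binomial": ("logit", lambda mu: mu * (1.0 - mu), True),  # Bernoulli response
    "poisson": ("log", lambda mu: mu, True),
    "normal": ("identity", lambda mu: 1.0, False),            # phi = sigma^2
    "gamma": ("reciprocal", lambda mu: mu ** 2, False),       # canonical link 1/mu (up to sign)
}


def variance(family, mu, phi=1.0):
    """Var(Y) = phi * V(mu); a phi passed in is ignored when the family pins phi = 1."""
    _, v, phi_fixed = FAMILIES[family]
    return (1.0 if phi_fixed else phi) * v(mu)


print(variance("binomial", 0.3))          # pi(1 - pi), about 0.21
print(variance("poisson", 4.0))           # 4.0: the variance equals the mean
print(variance("normal", 10.0, phi=2.5))  # 2.5: sigma^2, whatever the mean
```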
Key terms
- Random component: the first part of a GLM, the assumption that the response Y comes from an exponential-family distribution (normal, binomial, Poisson, gamma, ...). It determines the mean–variance relationship and so the weights used in fitting.
- Link function: the function g that connects the mean to the linear predictor, g(μ) = η = Xβ. The canonical link sets the natural parameter θ equal to η, which makes Xᵀy a sufficient statistic for β (logit for binomial, log for Poisson, identity for normal); non-canonical links (e.g. probit) are allowed too.
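- Exponential family: the class of distributions whose density can be written exp{(yθ − b(θ))/a(φ) + c(y, φ)}. Every member has E(Y) = b′(θ) and Var(Y) = a(φ)b″(θ), so with a(φ) = φ the variance is φV(μ), where V(μ) is b″(θ) written as a function of μ. This shared mean–variance structure is exactly what lets one fitting algorithm (IRLS) handle them all.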
- Deviance: twice the log-likelihood gap between the saturated model (a perfect fit) and the fitted model. For binomial and Poisson (φ = 1) this is D = 2(ℓsat − ℓmodel); in general 2(ℓsat − ℓmodel) = D/φ, the scaled deviance. It is the GLM's residual sum of squares: smaller is better, and the difference between nested models' deviances is the likelihood-ratio test statistic.
- IRLS: iteratively re-weighted least squares, the algorithm R uses to maximise a GLM likelihood. It repeatedly solves a weighted least-squares problem whose weights and working response depend on the current fit, converging to the maximum-likelihood β̂. In the normal-identity case the weights are constant, so it reduces to ordinary least squares in a single step.
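A bare-bones IRLS loop makes the last key term concrete. The sketch below is plain Python, handles only the Poisson family with its canonical log link, and makes its own assumed choices (starting values μ = y + 0.5, a tolerance on the change in β, a hand-rolled linear solver); it illustrates the idea and is not R's `glm.fit`. For the log link the working weight is (dμ/dη)²/V(μ) = μ and the working response is z = η + (y − μ)/μ. The helper names `solve` and `irls_poisson_log` are hypothetical.

```python
import math


def solve(a, b):
    """Solve the small linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]  # augmented matrix [a | b]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def irls_poisson_log(X, y, tol=1e-10, max_iter=50):
    """Poisson GLM with canonical log link, fitted by IRLS; returns beta-hat."""
    n, p = len(y), len(X[0])
    mu = [yi + 0.5 for yi in y]  # assumed starting values; avoids log(0)
    eta = [math.log(m) for m in mu]
    beta = [0.0] * p
    for _ in range(max_iter):
        w = mu  # working weights: (dmu/deta)^2 / V(mu) = mu^2 / mu = mu
        z = [e + (yi - m) / m for e, yi, m in zip(eta, y, mu)]  # working response
        # Weighted least-squares step: solve (X^T W X) beta = X^T W z.
        xtwx = [[sum(w[i] * X[i][j] * X[i][k] for i in range(n)) for k in range(p)]
                for j in range(p)]
        xtwz = [sum(w[i] * X[i][j] * z[i] for i in range(n)) for j in range(p)]
        new_beta = solve(xtwx, xtwz)
        # Refresh the linear predictor and the mean from the new coefficients.
        eta = [sum(xij * bj for xij, bj in zip(row, new_beta)) for row in X]
        mu = [math.exp(e) for e in eta]
        if max(abs(nb - b) for nb, b in zip(new_beta, beta)) < tol:
            return new_beta
        beta = new_beta
    return beta


# Made-up counts: intercept plus a 0/1 group dummy; the group means are 2 and 6.
X = [[1.0, 0.0], [1.0, 0.0], [1.0, 1.0], [1.0, 1.0]]
y = [1, 3, 4, 8]
beta_hat = irls_poisson_log(X, y)
print(beta_hat)  # close to [log 2, log 3], about [0.693, 1.099]
```

At convergence the fitted means reproduce the two group means, as the canonical-link score equations Xᵀ(y − μ̂) = 0 require. With the normal family and identity link the weights are all 1 and z = y, so the first weighted fit is already the OLS solution.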
Generalised Linear Models FAQ
What exactly makes something a GLM?
Three ingredients: a response from an exponential-family distribution (the random component), a linear predictor η = Xβ (the systematic component), and a link function g with g(μ) = η. Fix the distribution and the link and you have named a specific model. Almost every model in MAST90139 is one choice of those two ingredients.
What is the canonical link and do I have to use it?
The canonical link is the one that makes the model's natural parameter θ equal to the linear predictor η: logit for the binomial, log for the Poisson, identity for the normal. It is the default and has convenient properties (the score equations simplify to Xᵀ(y − μ) = 0, and Xᵀy is a sufficient statistic for β), but you are not forced to use it: the probit and complementary log-log are valid non-canonical links for binary data, for instance.
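The canonical link can be checked numerically from the cumulant function b(θ): the mean is μ = b′(θ), and the canonical link is the inverse of b′. In this Python sketch (hypothetical names `B`, `CANONICAL` and `mean_from_theta`; b′ is approximated by a central finite difference), applying the logit or log link to μ gives θ, and b′(θ) recovers μ. The probit value, computed with the standard library's `statistics.NormalDist`, is a legitimate link value but is not θ, which is what "non-canonical" means:

```python
import math
from statistics import NormalDist

# Cumulant functions b(theta); for both families a(phi) = 1.
B = {
    "binomial": lambda theta: math.log1p(math.exp(theta)),  # Bernoulli: b = log(1 + e^theta)
    "poisson": lambda theta: math.exp(theta),               # Poisson:   b = e^theta
}

# Canonical links g(mu): each should invert mu = b'(theta).
CANONICAL = {
    "binomial": lambda mu: math.log(mu / (1.0 - mu)),  # logit
    "poisson": lambda mu: math.log(mu),                # log
}


def mean_from_theta(family, theta, h=1e-6):
    """mu = b'(theta), approximated by a central finite difference."""
    b = B[family]
    return (b(theta + h) - b(theta - h)) / (2.0 * h)


for family, mu in (("binomial", 0.2), ("poisson", 3.0)):
    theta = CANONICAL[family](mu)                      # canonical link: eta = theta
    print(family, mu, mean_from_theta(family, theta))  # b'(theta) approximately gives back mu

# Probit is a valid link for binary data, but its eta is not the natural parameter.
print(CANONICAL["binomial"](0.2), NormalDist().inv_cdf(0.2))  # about -1.386 vs -0.842
```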
What is the deviance and why not just use the residual sum of squares?
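The deviance generalises the residual sum of squares to any GLM. Because GLMs are fitted by maximum likelihood, the natural measure of fit is the log-likelihood gap to the saturated (perfect-fit) model, scaled by 2 (and by φ, which is 1 for the binomial and Poisson). For the normal model the deviance is exactly the residual sum of squares; for other families it is the likelihood-based analogue. The difference in deviance between nested models is a likelihood-ratio statistic, approximately chi-square when φ is known (binomial, Poisson); when φ has to be estimated, an F-test is used instead.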
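Writing out the family-specific unit deviances makes the normal case easy to check. This Python sketch (a hypothetical `deviance` helper; the binomial branch is written for 0/1 responses, with the convention 0·log 0 = 0 that the saturated model needs) returns the unscaled deviance. Dividing by φ would give the scaled deviance; for binomial and Poisson the two coincide.

```python
import math


def _xlogy_ratio(y, mu):
    """y * log(y / mu), taking 0 * log(0) = 0 as in the saturated-model limit."""
    return 0.0 if y == 0 else y * math.log(y / mu)


def deviance(family, y, mu):
    """Unscaled deviance: the sum of unit deviances d(y_i, mu_i)."""
    if family == "normal":
        return sum((yi - mi) ** 2 for yi, mi in zip(y, mu))
    if family == "poisson":
        return 2.0 * sum(_xlogy_ratio(yi, mi) - (yi - mi) for yi, mi in zip(y, mu))
    if family == "binomial":  # Bernoulli responses y in {0, 1}
        return 2.0 * sum(_xlogy_ratio(yi, mi) + _xlogy_ratio(1 - yi, 1 - mi)
                         for yi, mi in zip(y, mu))
    raise ValueError(f"unsupported family: {family}")


print(deviance("normal", [1.0, 2.0, 4.0], [1.5, 2.0, 3.0]))  # 1.25 = 0.25 + 0 + 1, the RSS
print(deviance("poisson", [0, 2, 5], [0, 2, 5]))             # 0.0: saturated model, mu = y
print(deviance("binomial", [0, 1], [0.5, 0.5]))              # 4 log 2, about 2.77
```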
What does the dispersion parameter φ do?
φ scales the variance: Var(Y) = φV(μ). For the binomial and Poisson it is fixed at 1, so the variance is completely determined by the mean. For the normal and gamma it is a free parameter (σ² for the normal). When binomial or Poisson data show more spread than φ = 1 allows, that is overdispersion, handled by estimating φ in a quasi-likelihood fit.
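Overdispersion is usually diagnosed through the Pearson statistic X² = Σ(y − μ̂)²/V(μ̂). The sketch below (Python; hypothetical helper names; the counts and fitted means are made up purely to exercise the arithmetic, with p = 2 parameters assumed) computes Pearson residuals and the moment estimate φ̂ = X²/(n − p), a common choice in quasi-likelihood fits, where it rescales the standard errors:

```python
import math


def pearson_residuals(y, mu, V):
    """Pearson residuals r_i = (y_i - mu_i) / sqrt(V(mu_i))."""
    return [(yi - mi) / math.sqrt(V(mi)) for yi, mi in zip(y, mu)]


def pearson_dispersion(y, mu, V, n_params):
    """Moment estimate phi-hat = X^2 / (n - p), with X^2 the sum of squared Pearson residuals."""
    x2 = sum(r ** 2 for r in pearson_residuals(y, mu, V))
    return x2 / (len(y) - n_params)


def poisson_variance(m):
    """Poisson variance function V(mu) = mu."""
    return m


# Made-up counts and fitted means, only to illustrate the computation.
y = [0, 3, 9, 1, 12]
mu = [2.0, 4.0, 5.0, 3.0, 6.0]
print(pearson_residuals(y, mu, poisson_variance))
phi_hat = pearson_dispersion(y, mu, poisson_variance, n_params=2)
print(phi_hat)  # about 4.26: far above 1, a sign of overdispersion
```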
Exam move
This is the chapter to over-learn: every later model is this template re-run. Memorise the three components (random, systematic, link) and be able to map a response type to its distribution and canonical link instantly: binary → binomial + logit, count → Poisson + log, continuous → normal + identity. Know the mean–variance law Var(Y) = φV(μ) for each family and which families fix φ = 1. Understand the deviance as the likelihood-based residual sum of squares, and that the drop in deviance between nested models is a likelihood-ratio statistic, compared with a chi-square distribution when φ = 1. You do not need to derive IRLS, but know that it is how R fits GLMs and that it reduces to OLS in the normal case. Get this engine cold and the families chapters become pattern-matching.
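The Δdeviance test for nested models can be sketched without any statistics library. This Python version uses hypothetical helper names and a closed-form chi-square tail that is valid for integer degrees of freedom only; the deviances in the usage line are invented. It assumes φ = 1, as for binomial and Poisson; when φ is estimated, an F-test is the usual replacement.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability P(chi^2_df > x), closed form for integer df >= 1."""
    if x <= 0:
        return 1.0
    half = x / 2.0
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum_{j=0}^{df/2 - 1} (x/2)^j / j!
        term, total = 1.0, 1.0
        for j in range(1, df // 2):
            term *= half / j
            total += term
        return math.exp(-half) * total
    # Odd df: erfc(sqrt(x/2)) + exp(-x/2) * sum_{j=0}^{(df-3)/2} (x/2)^(j+1/2) / Gamma(j+3/2)
    total = math.erfc(math.sqrt(half))
    for j in range((df - 1) // 2):
        total += math.exp(-half) * half ** (j + 0.5) / math.gamma(j + 1.5)
    return total


def deviance_lrt(dev_small, dev_big, df_small, df_big):
    """Nested GLMs with phi = 1: refer the deviance drop to chi^2 on the residual-df difference.

    dev_small and df_small belong to the smaller (restricted) model, so both are the larger numbers.
    """
    delta = dev_small - dev_big
    ddf = df_small - df_big
    return delta, ddf, chi2_sf(delta, ddf)


# Invented deviances: smaller model 25.3 on 18 residual df, larger model 17.1 on 16.
delta, ddf, p_value = deviance_lrt(25.3, 17.1, 18, 16)
print(delta, ddf, p_value)  # delta about 8.2 on 2 df; p = exp(-delta/2), about 0.017
```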