MAST90139 · Statistical Modelling For Data Science
Model Checking
The final chapter answers two questions about any fitted GLM: did it fit? and is it the right model? Fit is judged by the deviance and the likelihood-ratio test: for nested models the drop in deviance ΔD = D₀ − D₁ is, under the smaller model, approximately χ² on the difference in the number of parameters, and for grouped data the residual deviance itself serves as a goodness-of-fit test. To rank non-nested rivals, where the likelihood-ratio test does not apply, you penalise fit by complexity with AIC and BIC. Then come the GLM diagnostics: deviance and Pearson residuals (what a good fit looks like, and the signatures of trouble), and leverage and influence via hat values, half-normal plots and Cook's distance. The chapter closes with an extension to random and mixed effects (GLMMs) for clustered or repeated data. This is where the whole GLM arc becomes exam marks: name the test, do the deviance arithmetic, read the residuals.
What this chapter covers
- 01 Deviance and the likelihood-ratio test for nested models
- 02 Residual deviance as a goodness-of-fit test (grouped data)
- 03 AIC and BIC for ranking non-nested rivals
- 04 Deviance vs Pearson residuals: reading a good fit
- 05 Leverage and influence: hat values, half-normal plots, Cook's distance
- 06 Extension: random and mixed effects (GLMMs)
Worked example: nested test by ΔD, and AIC for non-nested rivals
- +1 (a) ΔD: D₀ − D₁ = 60.0 − 48.0 = 12.0, on 5 − 3 = 2 df.
- +2 (a) Compare to χ²: χ²₀.₉₅(2) = 5.99. Since 12.0 > 5.99, reject H₀ at the 5% level: the two extra terms significantly improve the fit, so keep M₁.
- +2 (b) Non-nested: the likelihood-ratio test requires nested models (one a special case of the other). M₁ and M₂ are not nested, so ΔD has no χ² reference distribution and the test does not apply.
- +1 (c) Use AIC/BIC: rank non-nested rivals by an information criterion, AIC = −2ℓ + 2p or BIC = −2ℓ + p·log n (ℓ the maximised log-likelihood, p the number of parameters), and prefer the model with the lower value.
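The arithmetic in step (a) can be checked with a few lines of Python (a sketch; the variable names are ours). It relies on one fact specific to 2 df: the χ² upper tail is exactly exp(−x/2), so both the 5% critical value and the p-value have closed forms and no statistics library is needed.

```python
import math

# Worked-example inputs: deviances and parameter counts of the nested pair
D0, p0 = 60.0, 3   # smaller model M0
D1, p1 = 48.0, 5   # larger model M1

delta_D = D0 - D1   # drop in deviance
df = p1 - p0        # number of extra parameters
assert df == 2, "the closed form below only holds for 2 df"

# For 2 df the chi-squared upper tail is P(X > x) = exp(-x / 2), so the
# 95% critical value solves exp(-c / 2) = 0.05, i.e. c = -2 ln(0.05).
crit = -2 * math.log(0.05)
p_value = math.exp(-delta_D / 2)

print(f"ΔD = {delta_D:.1f} on {df} df")
print(f"critical value = {crit:.2f}, p-value = {p_value:.4f}")
print("reject H0: keep M1" if delta_D > crit else "retain H0: keep M0")
```

This prints `ΔD = 12.0 on 2 df`, `critical value = 5.99, p-value = 0.0025` and `reject H0: keep M1`, matching the hand calculation.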
Key terms
- Likelihood-ratio test
- A test comparing two nested models by ΔD = D_small − D_large, which under the smaller model is approximately χ² on the difference in the number of parameters. The standard way to test a term or group of terms in a GLM; only valid when one model is a special case of the other.
- AIC
- Akaike information criterion, AIC = −2ℓ + 2p, where ℓ is the maximised log-likelihood: goodness of fit penalised by the number of parameters p. Used to rank models (including non-nested ones); the lower AIC is preferred. Undefined for quasi-likelihood families, which have no true likelihood.
- BIC
- Bayesian information criterion, BIC = −2ℓ + p·log(n): like AIC but with a sample-size-dependent penalty that is heavier whenever log n > 2 (that is, n ≥ 8), so it favours smaller models more strongly. Also compared as 'lower is better'.
- Deviance residual
- The signed square root of each observation's contribution to the deviance: r_D,i = sign(yᵢ − μ̂ᵢ)·√dᵢ, where D = Σ dᵢ. Plotted against fitted values or covariates to check a GLM fit; a good fit shows residuals scattered around zero with no pattern. Large deviance residuals flag poorly fitted points.
- Cook's distance
- A measure of how much the fitted model changes when an observation is deleted — combining leverage and residual size to flag influential points. Large values single out observations that are bending the fit, examined alongside hat values and half-normal plots.
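A minimal Python sketch of the two information criteria. The log-likelihoods and parameter counts below are hypothetical, chosen only to show that AIC and BIC can disagree: with n = 100 the BIC penalty per parameter is log 100 ≈ 4.6 rather than 2, which tips the choice towards the smaller model.

```python
import math

def aic(loglik, p):
    """AIC = -2*loglik + 2p."""
    return -2.0 * loglik + 2 * p

def bic(loglik, p, n):
    """BIC = -2*loglik + p*log(n), using the natural log."""
    return -2.0 * loglik + p * math.log(n)

n = 100
# Hypothetical (maximised log-likelihood, number of parameters) for two
# non-nested rivals; illustrative numbers only, not from a real fit.
models = {"M1": (-120.0, 5), "M2": (-123.0, 3)}

for name, (ll, p) in models.items():
    print(f"{name}: AIC = {aic(ll, p):.2f}, BIC = {bic(ll, p, n):.2f}")

# Lower is better for both criteria
best_aic = min(models, key=lambda m: aic(*models[m]))
best_bic = min(models, key=lambda m: bic(*models[m], n))
print("AIC prefers", best_aic, "| BIC prefers", best_bic)
```

Here AIC picks the larger M1 (250 vs 252) while BIC picks the smaller M2 (about 259.8 vs 263.0), the stronger preference for parsimony described above.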
Model Checking FAQ
How do I test whether adding terms improves a GLM?
If the smaller model is nested in the larger, use the likelihood-ratio test: compute ΔD = D_small − D_large and compare it to χ² on the difference in the number of parameters. A significant drop means the extra terms matter. In R this is anova(m0, m1, test = "Chisq") (test = "LRT" is equivalent).
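Outside R, the same χ² reference can be computed by hand. The sketch below (Python, standard library only; the helper names are hypothetical) evaluates the χ² upper tail for integer degrees of freedom using the closed forms of the regularised incomplete gamma function, then applies it both to ΔD and to the grouped-data residual-deviance goodness-of-fit test. It assumes the dispersion is fixed at 1, as for binomial and Poisson models; with an estimated dispersion the deviance would need scaling, or an F test used instead.

```python
import math

def chi2_sf(x, k):
    """P(X > x) for X ~ chi-squared on an integer k >= 1 degrees of freedom."""
    if k < 1:
        raise ValueError("k must be a positive integer")
    if x <= 0:
        return 1.0
    y = x / 2.0
    if k % 2 == 0:
        # Even k: Q(k/2, y) = exp(-y) * sum_{j=0}^{k/2 - 1} y^j / j!
        term = total = 1.0
        for j in range(1, k // 2):
            term *= y / j
            total += term
        return math.exp(-y) * total
    # Odd k: start from Q(1/2, y) = erfc(sqrt(y)) and step up one unit at a time
    # with Q(a + 1, y) = Q(a, y) + y^a * exp(-y) / Gamma(a + 1)
    total = math.erfc(math.sqrt(y))
    for j in range(1, (k - 1) // 2 + 1):
        total += y ** (j - 0.5) * math.exp(-y) / math.gamma(j + 0.5)
    return total

def lr_test(D_small, p_small, D_large, p_large):
    """Likelihood-ratio test for nested GLMs from their deviances."""
    delta = D_small - D_large
    df = p_large - p_small
    return delta, df, chi2_sf(delta, df)

def deviance_gof(D_resid, n_groups, q):
    """Grouped-data goodness of fit: residual deviance against chi-squared(n - q)."""
    df = n_groups - q
    return df, chi2_sf(D_resid, df)

delta, df, p = lr_test(60.0, 3, 48.0, 5)      # the worked example above
print(f"ΔD = {delta:.1f} on {df} df, p = {p:.4f}")

# Hypothetical grouped fit: residual deviance 25.3 from 20 groups, 4 parameters
df_g, p_g = deviance_gof(25.3, 20, 4)
print(f"residual deviance 25.3 on {df_g} df, p = {p_g:.3f}")
```

The first line reproduces the worked example's p-value of about 0.0025. For the hypothetical grouped fit the p-value lands between 0.05 and 0.10, so the residual deviance gives no evidence of lack of fit at the 5% level.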
When can't I use the likelihood-ratio test, and what replaces it?
The likelihood-ratio test only works for nested models, where one is a special case of the other. For non-nested rivals (different link functions, or predictor sets where neither contains the other) the deviance difference has no χ² reference distribution, so you rank them by an information criterion instead: AIC or BIC, lower is better. Note that AIC is undefined for quasi-likelihood fits, which have no true likelihood.
What's the difference between deviance and Pearson residuals?
Both measure how far each observation is from its fitted value, scaled for the GLM's variance. Deviance residuals are the signed square roots of each observation's contribution to the total deviance; Pearson residuals are (y − μ̂)/√V(μ̂), with V the variance function. They usually tell the same story; deviance residuals are often preferred for diagnostic plots because their distribution is closer to normal under a good fit.
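To see the two residual types side by side, here is a Python sketch for a Poisson GLM, where the unit deviance is dᵢ = 2[yᵢ log(yᵢ/μ̂ᵢ) − (yᵢ − μ̂ᵢ)] and V(μ) = μ. The counts are hypothetical, and the fitted values come from the intercept-only model, whose maximum-likelihood fit is μ̂ᵢ = ȳ for every observation.

```python
import math

def poisson_residuals(y, mu):
    """Deviance and Pearson residuals for a Poisson GLM (dispersion 1)."""
    dev_res, pear_res = [], []
    for yi, mi in zip(y, mu):
        # unit deviance d_i = 2[y log(y/mu) - (y - mu)], taking y log y = 0 at y = 0
        ylog = yi * math.log(yi / mi) if yi > 0 else 0.0
        d_i = 2.0 * (ylog - (yi - mi))
        # deviance residual: sqrt(d_i) carrying the sign of y - mu
        dev_res.append(math.copysign(math.sqrt(max(d_i, 0.0)), yi - mi))
        pear_res.append((yi - mi) / math.sqrt(mi))   # V(mu) = mu for Poisson
    return dev_res, pear_res

y = [0, 2, 3, 1, 4, 12]                # hypothetical counts
mu = [sum(y) / len(y)] * len(y)        # intercept-only fit: mu_hat = y-bar

r_dev, r_pear = poisson_residuals(y, mu)
deviance = sum(r ** 2 for r in r_dev)       # squared deviance residuals sum to D
pearson_X2 = sum(r ** 2 for r in r_pear)    # Pearson's X^2 statistic
worst = max(range(len(y)), key=lambda i: abs(r_dev[i]))
print(f"D = {deviance:.3f}, X2 = {pearson_X2:.3f}, largest |r_D| at index {worst}")
```

Both residual types single out the count of 12 (index 5) as the worst-fitted point, and the two always agree in sign.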
How do I find influential observations in a GLM?
Combine leverage and residuals. Hat values measure leverage (how extreme a point's covariates are, weighted by the GLM's working weights); half-normal plots of residuals or hat values make unusually large values stand out; and Cook's distance combines the two to measure how much each point moves the fit. Points with large Cook's distance or high leverage are worth investigating: they may be errors, or genuinely informative extremes.
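The pieces can be wired together in a self-contained Python sketch, assuming a Poisson log-linear model with one covariate and hypothetical count data. It fits the model by iteratively reweighted least squares, takes hat values as the diagonal of W^(1/2) X (XᵀWX)⁻¹ XᵀW^(1/2), and computes Cook's distance as Dᵢ = r²_P,i · hᵢ / (φ p (1 − hᵢ)²), which is, to our understanding, the form R's cooks.distance() applies to glm fits. The leverage cut-off 2p/n and the half-normal plotting positions Φ⁻¹((n + i)/(2n + 1)) are common conventions, not the only ones.

```python
import math
from statistics import NormalDist

def fit_poisson(x, y, iters=50):
    """IRLS for a Poisson GLM with log link: log(mu) = b0 + b1 * x."""
    mu = [yi + 0.1 for yi in y]                 # starting fitted values
    eta = [math.log(m) for m in mu]
    for _ in range(iters):
        w = mu                                   # working weights (Poisson, log link)
        z = [e + (yi - m) / m for e, yi, m in zip(eta, y, mu)]  # working response
        s0 = sum(w)
        s1 = sum(wi * xi for wi, xi in zip(w, x))
        s2 = sum(wi * xi * xi for wi, xi in zip(w, x))
        t0 = sum(wi * zi for wi, zi in zip(w, z))
        t1 = sum(wi * xi * zi for wi, xi, zi in zip(w, x, z))
        det = s0 * s2 - s1 * s1
        b0 = (s2 * t0 - s1 * t1) / det          # solve (X'WX) b = X'Wz
        b1 = (s0 * t1 - s1 * t0) / det
        eta = [b0 + b1 * xi for xi in x]
        mu = [math.exp(e) for e in eta]
    return b0, b1, mu

def hat_values(x, w):
    """Diagonal of W^1/2 X (X'WX)^-1 X' W^1/2 for the design X = [1, x]."""
    s0 = sum(w)
    s1 = sum(wi * xi for wi, xi in zip(w, x))
    s2 = sum(wi * xi * xi for wi, xi in zip(w, x))
    det = s0 * s2 - s1 * s1
    return [wi * (s2 - 2 * s1 * xi + s0 * xi * xi) / det for wi, xi in zip(w, x)]

def cooks_distance(pearson, h, p, phi=1.0):
    """Cook's distance from Pearson residuals and hat values (dispersion phi)."""
    return [(r / (1 - hi)) ** 2 * hi / (phi * p) for r, hi in zip(pearson, h)]

def half_normal_positions(n):
    """Plotting positions for n sorted absolute values on a half-normal plot."""
    return [NormalDist().inv_cdf((n + i) / (2 * n + 1)) for i in range(1, n + 1)]

x = list(range(1, 11))
y = [2, 3, 6, 7, 8, 9, 10, 12, 15, 40]         # hypothetical counts
b0, b1, mu = fit_poisson(x, y)
h = hat_values(x, mu)                            # final IRLS weights are mu
pearson = [(yi - m) / math.sqrt(m) for yi, m in zip(y, mu)]
p, n = 2, len(y)
cook = cooks_distance(pearson, h, p)

high_leverage = [i for i, hi in enumerate(h) if hi > 2 * p / n]
most_influential = max(range(n), key=lambda i: cook[i])
print("high leverage:", high_leverage, "| most influential:", most_influential)

# Half-normal plot coordinates: positions vs sorted |Pearson residuals|
q = half_normal_positions(n)
points = list(zip(q, sorted(abs(r) for r in pearson)))
```

With these data the last observation (x = 10, y = 40, index 9) stands out on both leverage and Cook's distance. A useful sanity check is that the hat values sum to p = 2, the number of fitted coefficients.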
Exam move
Make the nested-vs-non-nested decision your first move in any model-comparison question. Nested → likelihood-ratio test: ΔD = D₀ − D₁ against χ² on the df difference (and, for grouped data, the residual deviance itself against χ²(n − q), with n groups and q parameters, as a goodness-of-fit test). Non-nested → AIC or BIC, lower is better; never run a χ² test on non-nested models. Then practise reading diagnostics: deviance and Pearson residual plots (flat scatter around zero = good fit), and leverage and influence via hat values, half-normal plots and Cook's distance to spot points bending the fit. Know that GLMMs extend the framework to clustered or repeated data. The exam hands you R output and asks for the right test, the deviance arithmetic, and the residual verdict: rehearse all three at speed.