MAST90139 · Statistical Modelling For Data Science
Logistic Regression
Logistic regression is the flagship GLM of MAST90139 and the single most examined model. When the response is binary (default / no default, disease / healthy, pass / fail) you model a probability π ∈ (0,1), and a plain straight line is wrong: fitted values escape [0,1] and the Bernoulli variance π(1−π) is not constant. The fix is the logit link: model the log-odds as linear, logit(π) = log(π/(1−π)) = Xβ. The skill that pays the rent is turning a coefficient into a sentence: e^β is an odds ratio. The chapter also covers inference (Wald and likelihood-ratio tests, confidence intervals) and turning fitted probabilities into decisions: the threshold, the confusion matrix, sensitivity and specificity, and the ROC curve with its AUC.
What this chapter covers
- 01 The binary response and why a line fails
- 02 The logit link and the logistic (S-shaped) curve
- 03 Coefficients as log-odds; e^β as an odds ratio (★ the #1 skill)
- 04 Wald tests and likelihood-ratio tests
- 05 Confidence intervals for β and for the odds ratio
- 06 Classification: threshold, confusion matrix, sensitivity / specificity
- 07 The ROC curve and the area under it (AUC)
Worked example: odds ratio and a fitted probability from a logistic fit
- +2 (a) Odds ratio: the coefficient is on the log-odds scale, so e^0.50 ≈ 1.65. Each extra point of x multiplies the odds of admission by about 1.65 (a 65% rise in the odds).
- +1 (b) Linear predictor at x = 8: η = −3.0 + 0.50×8 = 1.0.
- +2 (b) Inverse-logit: π = 1/(1 + e^(−1.0)) = 1/(1 + 0.368) = 0.73.
- +1 (c) The classmate is wrong: the 0.50 is constant on the log-odds scale, not the probability scale. The effect on probability is largest near π = 0.5 and shrinks toward 0 and 1, because the S-curve flattens at the ends.
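The marked steps above can be checked numerically. The following is a minimal Python sketch, not part of the original solution: it uses the intercept −3.0 and slope 0.50 from step (b), and the helper names `inv_logit` and `step` are made up for illustration.

```python
import math

# Coefficients read off the worked example: eta = -3.0 + 0.50 * x
beta0, beta1 = -3.0, 0.50

def inv_logit(eta):
    """Map a linear predictor (log-odds) to a probability."""
    return 1.0 / (1.0 + math.exp(-eta))

# (a) Odds ratio for a one-unit increase in x.
odds_ratio = math.exp(beta1)

# (b) Linear predictor and fitted probability at x = 8.
eta_8 = beta0 + beta1 * 8
pi_8 = inv_logit(eta_8)

# (c) Change in probability for the same one-unit step, starting at x.
def step(x):
    return inv_logit(beta0 + beta1 * (x + 1)) - inv_logit(beta0 + beta1 * x)

print(round(odds_ratio, 2), eta_8, round(pi_8, 2))  # 1.65 1.0 0.73
for x in (0, 6, 12):
    # The step is largest near eta = 0 (x = 6) and smaller in both tails.
    print(x, round(step(x), 3))
```

The loop illustrates part (c): the same 0.50 on the log-odds scale produces a different change in probability depending on where on the S-curve the step starts.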
Key terms
- Logit link
- g(π) = log(π/(1−π)), the log-odds. Modelling the logit as linear in X keeps fitted probabilities inside (0,1) and makes each coefficient interpretable as a change in the log-odds. This is the defining choice of logistic regression.
- Odds ratio
- e raised to a logistic coefficient, e^β. It is the factor by which the odds of the event multiply for a one-unit increase in the predictor. Above 1 the predictor raises the odds; below 1 it lowers them. It is the headline number of any logistic fit.
- Wald test
- A test of H₀: β = 0 using z = β̂/se(β̂), compared to the standard normal. Quick and printed by R, but less reliable than the likelihood-ratio test for small samples or large effects (the Hauck–Donner effect).
- Likelihood-ratio test
- A test comparing nested models by the drop in deviance ΔD. Under H₀, ΔD is approximately χ² with degrees of freedom equal to the difference in the number of parameters. More reliable than the Wald test and the preferred way to test a term or a group of terms in a logistic model.
- ROC curve / AUC
- The ROC curve plots sensitivity against 1−specificity as the classification threshold varies; the area under it (AUC) summarises how well the fitted probabilities separate the two classes. AUC = 0.5 is chance, 1.0 is perfect.
Logistic Regression FAQ
Why can't I just fit a straight line to a 0/1 response?
Two reasons. Fitted values from a line run below 0 and above 1, which are impossible probabilities; and the variance of a 0/1 (Bernoulli) response is π(1−π), which changes with the mean rather than staying constant. Logistic regression fixes both by modelling the log-odds linearly, so fitted probabilities stay in (0,1) and the right variance is built in.
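Both reasons can be seen in a few lines of Python. This is a minimal sketch under stated assumptions: the coefficients `a`, `b`, `b0` and `b1` are hypothetical values chosen only to make the contrast visible, not estimates from any data set.

```python
import math

def logit(p):
    """Log-odds of a probability p in (0, 1)."""
    return math.log(p / (1.0 - p))

def inv_logit(eta):
    """Probability corresponding to log-odds eta."""
    return 1.0 / (1.0 + math.exp(-eta))

# Hypothetical coefficients, chosen only for illustration.
a, b = 0.10, 0.10      # straight line fitted on the probability scale
b0, b1 = -3.0, 0.50    # straight line fitted on the log-odds scale

xs = [-5, 0, 5, 10, 15]
line_fits = [a + b * x for x in xs]
logistic_fits = [inv_logit(b0 + b1 * x) for x in xs]

for x, lin, lgt in zip(xs, line_fits, logistic_fits):
    # Last column: Bernoulli variance pi * (1 - pi), which moves with the mean.
    print(x, round(lin, 2), round(lgt, 3), round(lgt * (1 - lgt), 3))
```

The straight-line column leaves [0,1] at the extremes of x, while the logistic column never does, and the variance column changes from row to row instead of staying constant.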
How do I interpret a logistic coefficient?
Exponentiate it: e^β is an odds ratio, the multiplicative change in the odds of the event per one-unit rise in the predictor. Say it as a sentence: 'the odds of [event] multiply by e^β.' Do not call it a change in probability; the probability effect is not constant because the logistic curve is S-shaped.
Wald or likelihood-ratio — which test should I trust?
Prefer the likelihood-ratio (deviance) test. The Wald test (z = β̂/se) is convenient and printed by R, but it can mislead when the effect is large or the sample small. To test several terms at once, use the likelihood-ratio test (comparing nested model deviances against χ²); the z-value printed for each individual coefficient cannot answer that question.
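The arithmetic behind both tests, and the matching Wald confidence interval, can be sketched in Python with the standard library alone. Every number below (`beta_hat`, `se`, and the two deviances) is hypothetical and labelled as such; the χ² tail formula used here is valid only when the two models differ by a single parameter.

```python
import math

# Hypothetical fit summary; none of these numbers come from the text.
beta_hat = 0.50       # estimated coefficient (log-odds scale)
se = 0.20             # its standard error
dev_reduced = 112.4   # residual deviance without the term
dev_full = 105.3      # residual deviance with the term (one extra parameter)

# Wald test: z = beta_hat / se against the standard normal (two-sided).
z = beta_hat / se
p_wald = math.erfc(abs(z) / math.sqrt(2.0))

# Likelihood-ratio test: drop in deviance against chi-square on 1 df.
# For 1 df only, P(chi2 > d) = erfc(sqrt(d / 2)).
delta_d = dev_reduced - dev_full
p_lrt = math.erfc(math.sqrt(delta_d / 2.0))

# 95% Wald interval for beta, exponentiated to an interval for the odds ratio.
z_crit = 1.96
ci_beta = (beta_hat - z_crit * se, beta_hat + z_crit * se)
ci_or = (math.exp(ci_beta[0]), math.exp(ci_beta[1]))

print(round(z, 2), round(p_wald, 4))
print(round(delta_d, 2), round(p_lrt, 4))
print([round(v, 3) for v in ci_beta], [round(v, 3) for v in ci_or])
```

Note that the interval for the odds ratio is built on the β scale and then exponentiated, so it is not symmetric around e^β.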
What does the ROC curve tell me that the coefficients don't?
Coefficients describe the model's structure; the ROC curve describes its classification performance. By sweeping the threshold you trade sensitivity against specificity, and the area under the curve (AUC) gives a single, threshold-free measure of how well the fitted probabilities separate admitted from not-admitted. A model can have significant coefficients yet a mediocre AUC.
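To make the threshold trade-off and the AUC concrete, here is a minimal Python sketch on a tiny made-up data set. The outcomes `y` and fitted probabilities `p` are hypothetical, and the AUC is computed through its rank interpretation: the proportion of (event, non-event) pairs in which the event has the higher fitted probability, with ties counted as one half.

```python
# Hypothetical outcomes (1 = event) and fitted probabilities.
y = [1, 1, 1, 1, 0, 0, 0, 0]
p = [0.9, 0.8, 0.6, 0.4, 0.7, 0.3, 0.2, 0.1]

def confusion(y, p, threshold):
    """Return (TP, FP, FN, TN) when predicting 1 if p >= threshold."""
    tp = sum(1 for yi, pi in zip(y, p) if yi == 1 and pi >= threshold)
    fp = sum(1 for yi, pi in zip(y, p) if yi == 0 and pi >= threshold)
    fn = sum(1 for yi, pi in zip(y, p) if yi == 1 and pi < threshold)
    tn = sum(1 for yi, pi in zip(y, p) if yi == 0 and pi < threshold)
    return tp, fp, fn, tn

tp, fp, fn, tn = confusion(y, p, 0.5)
sensitivity = tp / (tp + fn)   # true positive rate
specificity = tn / (tn + fp)   # true negative rate

# AUC via pairwise comparison of events against non-events.
pos = [pi for yi, pi in zip(y, p) if yi == 1]
neg = [pi for yi, pi in zip(y, p) if yi == 0]
wins = sum(1.0 if a > b else 0.5 if a == b else 0.0 for a in pos for b in neg)
auc = wins / (len(pos) * len(neg))

print((tp, fp, fn, tn), sensitivity, specificity, auc)  # (3, 1, 1, 3) 0.75 0.75 0.875
```

Changing the threshold passed to `confusion` moves sensitivity and specificity in opposite directions, whereas `auc` does not depend on any threshold.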
Exam move
Make the odds-ratio sentence automatic: it is the most examined single skill in the subject. For any logistic coefficient: exponentiate, state the multiplier, the direction, and that it acts on the odds. Practise the inverse-logit π = 1/(1 + e^(−η)) to get a fitted probability from a linear predictor, and remember the curve crosses 0.5 where η = 0. Know both tests, Wald (z) for a quick single-coefficient check and likelihood-ratio (ΔD vs χ²) for terms and groups, and prefer the latter. Finally, be able to read a confusion matrix into sensitivity and specificity and explain what the ROC curve and AUC add. Avoid the two traps: never read a logistic coefficient as a probability change, and never trust the Wald test blindly for large effects.