MAST90139 · Statistical Modelling For Data Science
Log-Linear Models
Log-linear models analyse a contingency table by treating each cell count as a Poisson response with a log link, so this is Poisson regression where the predictors are the table's classifying factors. The whole subject becomes one question: which factor interactions do you need? Independence of two factors is exactly the absence of their interaction term; that single identity is the chapter's spine. For a 2×2 table the strength of association is the cross-product (odds) ratio, which the interaction coefficient β delivers directly as e^β. The chapter covers cells-as-Poisson-counts, independence vs the saturated model, the hierarchy of terms, the cross-product ratio, and the neat equivalence that a log-linear model with a binary response factor is the same as the corresponding logistic regression.
What this chapter covers
- 01 Contingency-table cells as Poisson counts
- 02 Independence ⇔ no interaction term (the central identity)
- 03 The saturated model and the hierarchy of terms
- 04 The cross-product (odds) ratio for a 2×2 table (★)
- 05 Interaction = association; uniform association
- 06 Log-linear ≡ logistic for a two-way table
Worked example: independence and the cross-product ratio in a 2×2 table
- (a) (+2) Independence term: dropping the interaction term A:B gives the additive model A + B, which is exactly the model of independence of A and B.
- (b) (+2) Cross-product ratio: the interaction coefficient is the log cross-product ratio, so OR = e^0.69 ≈ 2.0. The odds of B = 1 are about twice as high when A = 1 as when A = 2 (a positive association).
- (c) (+2) Test independence: the independence model has deviance 9.5 on 1 df, and the 0.95 quantile of χ²(1) is 3.84. Since 9.5 > 3.84, the independence model fits poorly: A and B are associated, consistent with the non-zero interaction.
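The test in part (c) can be sketched in a few lines of Python. This is a minimal illustration, not the course's own code: the counts are hypothetical (they are not the data behind the worked example), the function name `independence_deviance` is made up for this sketch, and the critical value 3.84 is simply the one quoted above.

```python
import math

# Hypothetical 2x2 counts (NOT the data behind the worked example):
# rows are the levels of A, columns are the levels of B.
counts = [[80, 40],
          [60, 60]]


def independence_deviance(table):
    """Residual deviance of the additive model A + B for a two-way table.

    The fitted values under independence are row total * column total / N.
    The Poisson deviance is 2 * sum(n * log(n / fitted)); the extra
    (n - fitted) terms cancel because the fitted margins match the data.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    deviance = 0.0
    for i, row in enumerate(table):
        for j, n in enumerate(row):
            fitted = row_totals[i] * col_totals[j] / total
            if n > 0:  # a zero cell contributes nothing to the sum
                deviance += 2.0 * n * math.log(n / fitted)
    return deviance


CHI2_95_1DF = 3.84  # 0.95 quantile of chi-squared on 1 df, as quoted above

deviance = independence_deviance(counts)
reject_independence = deviance > CHI2_95_1DF

print(f"deviance = {deviance:.2f} on 1 df")
print("independence rejected" if reject_independence else "independence not rejected")
```

For these made-up counts the deviance is roughly 6.9, which exceeds 3.84, so the additive model would be rejected in the same way as in part (c).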
Key terms
- Contingency table
- A cross-tabulation of counts classified by two or more factors. In a log-linear model each cell count is treated as a Poisson response, so analysing the table becomes Poisson regression on the classifying factors.
- Independence (log-linear)
- Two factors are independent exactly when their joint probabilities factorise — equivalently, when the log-linear model needs no interaction term between them. The additive model A + B is the model of independence.
- Saturated model
- The log-linear model containing every main effect and every interaction up to the highest order; it fits the observed counts perfectly (zero deviance). Simpler models are tested against it via deviance.
- Cross-product ratio
- For a 2×2 table, (n₁₁n₂₂)/(n₁₂n₂₁): the odds ratio measuring association. In the log-linear model with corner-point (treatment) coding it equals e raised to the interaction coefficient, so the interaction term directly gives the association.
- Hierarchy principle
- The convention that a log-linear (or any) model containing an interaction also contains all its lower-order terms. It keeps models interpretable and is why software builds models up from main effects to interactions.
Log-Linear Models FAQ
How is a log-linear model just Poisson regression?
Because each cell count in the contingency table is modelled as a Poisson response with a log link, and the predictors are the table's classifying factors and their interactions. Everything from Poisson regression — the log link, the deviance, goodness-of-fit — carries straight over; the only new idea is reading factor interactions as associations.
Why does 'independence' mean 'no interaction'?
Two factors are statistically independent when their joint probability is the product of the marginals. Taking logs turns that product into a sum of main effects with no cross term — which is precisely a log-linear model without the interaction. So testing independence is testing whether the interaction term can be dropped.
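Written out, the argument takes three lines. The notation here is an assumption, not taken from the chapter: π for cell probabilities, a `+` subscript for a margin, N for the table total, and λ for the log-linear parameters.

```latex
\begin{align*}
\pi_{ij} &= \pi_{i+}\,\pi_{+j}
  && \text{(independence)} \\
\mu_{ij} &= N\,\pi_{ij} = N\,\pi_{i+}\,\pi_{+j}
  && \text{(expected cell count)} \\
\log \mu_{ij} &= \log N + \log \pi_{i+} + \log \pi_{+j}
  = \lambda + \lambda^{A}_{i} + \lambda^{B}_{j}
  && \text{(no cross term)}
\end{align*}
```

The saturated model adds a term \(\lambda^{AB}_{ij}\) to the last line, so independence corresponds to \(\lambda^{AB}_{ij} = 0\) for all i and j.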
How do I get an odds ratio out of a log-linear model?
For a 2×2 table the association is the cross-product ratio (n₁₁n₂₂)/(n₁₂n₂₁), and the log-linear interaction coefficient is its logarithm. So e raised to the interaction coefficient is the odds ratio — the same number you would get from the corresponding logistic regression.
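A small numerical check makes this concrete. The sketch below assumes corner-point (treatment) coding with A = 2 and B = 2 as reference levels, and uses hypothetical counts; with a different coding (for example sum-to-zero) the interaction coefficient is scaled differently. Because the saturated model reproduces the counts exactly, its coefficients can be written down in closed form without fitting anything.

```python
import math

# Hypothetical 2x2 counts, for illustration only.
n11, n12 = 80, 40   # A = 1: counts for B = 1, B = 2
n21, n22 = 60, 60   # A = 2: counts for B = 1, B = 2

# Saturated log-linear model under corner-point (treatment) coding,
# with A = 2 and B = 2 as the assumed reference levels:
#   log mu_ij = intercept + a*[A=1] + b*[B=1] + ab*[A=1 and B=1]
intercept = math.log(n22)
a = math.log(n12) - math.log(n22)    # main effect of A = 1
b = math.log(n21) - math.log(n22)    # main effect of B = 1
ab = math.log(n11) - math.log(n12) - math.log(n21) + math.log(n22)  # interaction

# The saturated model reproduces every cell, e.g. the (1, 1) cell:
fitted_n11 = math.exp(intercept + a + b + ab)

odds_ratio_from_model = math.exp(ab)
cross_product_ratio = (n11 * n22) / (n12 * n21)

print(f"interaction coefficient = {ab:.3f}")
print(f"exp(interaction)        = {odds_ratio_from_model:.3f}")
print(f"cross-product ratio     = {cross_product_ratio:.3f}")
```

The last two printed values agree, which is the identity stated above.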
When is a log-linear model the same as a logistic model?
When one factor is a binary response and the others are covariates, the Poisson log-linear model and the binomial logistic model give identical estimates and tests for the effects on that response, provided the log-linear model includes every interaction among the covariates (which a logistic regression treats as fixed). They are two views of the same association; you pick whichever framing suits the question (joint table structure vs a single response).
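For a two-way table the equivalence can be seen by taking the log odds of the response within each level of the other factor. The λ notation and the labels α and β_i below are assumptions introduced for this sketch; B is taken to be the binary response.

```latex
\begin{align*}
\log \mu_{ij} &= \lambda + \lambda^{A}_{i} + \lambda^{B}_{j} + \lambda^{AB}_{ij} \\
\log \frac{\mu_{i1}}{\mu_{i2}}
  &= \bigl(\lambda^{B}_{1} - \lambda^{B}_{2}\bigr)
   + \bigl(\lambda^{AB}_{i1} - \lambda^{AB}_{i2}\bigr)
   = \alpha + \beta_{i}
\end{align*}
```

The terms \(\lambda\) and \(\lambda^{A}_{i}\) cancel in the ratio, so the logit of B = 1 given A = i is an intercept plus an effect of A. The B main effect becomes the logistic intercept, and the A:B interaction becomes the logistic effect of A.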
Exam move
Anchor everything on the identity independence ⇔ no interaction term. To test whether two factors are independent, fit the additive (main-effects-only) model and read its residual deviance against χ² on the appropriate df, which is (I − 1)(J − 1) for an I×J table and so 1 df for a 2×2 table. A large deviance means the interaction is needed, i.e. the factors are associated. To quantify the association in a 2×2 table, compute the cross-product ratio or, equivalently, exponentiate the interaction coefficient. Remember the hierarchy principle (keep lower-order terms) and the equivalence with logistic regression when one factor is a binary response. Because cells are Poisson counts, every Poisson tool (the log link, deviance goodness-of-fit, ΔD tests) applies unchanged, so this chapter is mostly Poisson regression with interactions read as associations.