STAT7038 · Regression Modelling
Multiple Regression Inference
In multiple regression the regression sum of squares can be sliced up in more than one way, and knowing which slice answers which question is the most-tested idea in this half of the course. R's anova() slices sequentially (Type I): one term at a time, in the order you typed them, each conditioning on the terms above it. The summary() t-tests and drop1() slice partially: each term last, given all the others. The two agree only for the final term. The bridge between a reduced and a full model is the extra-sum-of-squares principle, which powers the partial (nested) F-test of 'is this subset of predictors a worthwhile addition to a model already containing the rest?'.
The chapter then turns to qualitative covariates: a categorical predictor enters through indicator (0/1 dummy) variables, a k-level factor using k−1 of them with one level as the reference. The additive (parallel-lines) model shares a slope and shifts the intercept; adding an interaction term lets the groups have different slopes too, giving two separate lines.
What this chapter covers
- 01 The MLR ANOVA decomposition and the overall F
- 02 Sequential (Type I) sums of squares: order matters
- 03 Partial sums of squares and the extra-sum-of-squares principle
- 04 The partial (nested) F-test; q = 1 collapses to the t-test
- 05 Qualitative covariates: indicator coding and the reference level
- 06 The additive (parallel-lines) model: β₂ as the group shift
- 07 Interaction (separate-lines) models and the marginality/hierarchy rule
Worked example: a partial (nested) F-test
- +1 Hypotheses. H0: the Acetic coefficient is 0 (the reduced model suffices) vs Ha: it is non-zero.
- +1 Extra sum of squares. SSR(extra) = SSE(R) − SSE(F) = 2825.6 − 2668.41 = 157.19, on q = 1 df.
- +1 Form F. F = [SSR(extra)/q] / MSE(F) = (157.19/1) / 102.63 = 1.53 on (1, 26) df.
- +1 Critical value & decision. F(1, 26; 0.95) ≈ 4.23; since 1.53 < 4.23 (p ≈ 0.23), do not reject H0.
- +1 Conclude in context. Once H2S and Lactic are in the model, Acetic adds nothing significant, so it can be dropped.
- +1 Note the trap. This disagrees with a sequential table that enters Acetic first (where it looks highly significant), because there it is credited with variation H2S and Lactic also explain.
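The arithmetic in the steps above can be checked with a few lines of code. This is a minimal sketch in plain Python rather than R output; the function name partial_f is made up for illustration, and the critical value is simply the 4.23 quoted in the example, not computed here.

```python
# Partial (nested) F-test arithmetic, using the figures quoted in the worked example.
sse_reduced = 2825.6   # SSE(R): model with H2S and Lactic only
sse_full = 2668.41     # SSE(F): model that also contains Acetic
q = 1                  # number of coefficients being tested
df_full = 26           # residual df of the full model
f_crit = 4.23          # F(1, 26; 0.95), taken from the text


def partial_f(sse_reduced, sse_full, q, df_full):
    """Return (extra sum of squares, MSE of the full model, F statistic)."""
    ss_extra = sse_reduced - sse_full
    mse_full = sse_full / df_full
    return ss_extra, mse_full, (ss_extra / q) / mse_full


ss_extra, mse_full, f_stat = partial_f(sse_reduced, sse_full, q, df_full)
reject_h0 = f_stat > f_crit

print(f"SSR(extra) = {ss_extra:.2f}, MSE(F) = {mse_full:.2f}, F = {f_stat:.2f}")
print("Reject H0" if reject_h0 else "Do not reject H0")
```

The same function works for q > 1: pass the number of coefficients dropped and the residual df of the full model.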
Key terms
- Sequential (Type I) sums of squares
- What R's anova(lm) reports: SSR is peeled off one term at a time, in the order entered, each line conditioning on the terms above it. The slices depend on order, and only the last line matches the summary() t-test of that coefficient.
- Partial sum of squares
- What a term adds given all the other terms — treated as if it were last. This is what the summary() t-tests and drop1() report. It is order-invariant and answers the 'marginal-given-the-rest' question.
- Extra-sum-of-squares principle
- SSR(extra) = SSE(Reduced) − SSE(Full) = SSR(Full) − SSR(Reduced), the variation a set of terms explains that the others cannot. It is always ≥ 0 (dropping a predictor never lowers SSE) and is the numerator of the partial F-test.
- Indicator (dummy) variable
- A 0/1 variable coding a category. A factor with k levels uses k−1 indicators; the omitted level is the reference (baseline), absorbed into the intercept. R does this automatically when a factor appears in a model formula, using treatment contrasts; the reference is the first level, which is the first alphabetically unless you reorder the levels (for example with relevel()).
- Interaction term
- A product x·D added to the model so groups may have different slopes as well as different intercepts, giving two separate lines. Its coefficient is the difference in slopes; if the interaction is not significant it can be dropped, which takes you back to the parallel-lines (additive) model.
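To make the indicator-coding definition concrete, here is a small sketch of treatment coding in plain Python. It only imitates the idea (k levels, k−1 columns, first sorted level as reference); the function name treatment_indicators and the factor values are hypothetical, and this is not how R implements its contrasts internally.

```python
def treatment_indicators(values, reference=None):
    """Code a categorical variable as k-1 columns of 0/1 indicators.

    The reference level defaults to the first level in sorted order and
    gets no column of its own: it is the all-zeros row.
    """
    levels = sorted(set(values))
    ref = levels[0] if reference is None else reference
    others = [lv for lv in levels if lv != ref]
    rows = [[1 if v == lv else 0 for lv in others] for v in values]
    return ref, others, rows


# Made-up three-level factor, for illustration only.
groups = ["B", "A", "C", "A", "B"]
ref, columns, rows = treatment_indicators(groups)

print("reference level:", ref)
print("indicator columns:", columns)
for g, r in zip(groups, rows):
    print(g, r)
```

Passing reference="C" changes which level is absorbed into the intercept, which changes what each coefficient is compared against but not the fitted values.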
Multiple Regression Inference FAQ
Why do the same predictors in a different order give different anova() significance?
Because anova() is sequential: each line conditions only on the terms above it, so SSR(X2) and SSR(X2 | X1) are different quantities and which one you see depends on the order. Re-ordering changes the per-line sums of squares and verdicts. The summary() t-tests, by contrast, are partial (each term last, given all the others) and do not change at all when the terms are reordered. Two anova() tables that disagree under reordering are themselves a symptom of correlated predictors (multicollinearity).
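The order dependence can be seen numerically. The sketch below is plain Python with a small hand-rolled least-squares routine (normal equations solved by Gauss-Jordan elimination), not R's anova(); the data are made up, with x1 and x2 deliberately correlated, and the helper name sse is an assumption for illustration.

```python
def sse(y, *predictors):
    """Residual sum of squares from least squares of y on an intercept plus predictors."""
    n = len(y)
    X = [[1.0] + [p[i] for p in predictors] for i in range(n)]
    k = len(X[0])
    # Normal equations (X'X) b = X'y, stored as an augmented matrix.
    A = [[sum(X[i][r] * X[i][c] for i in range(n)) for c in range(k)]
         + [sum(X[i][r] * y[i] for i in range(n))] for r in range(k)]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(k):
            if r != col:
                factor = A[r][col] / A[col][col]
                A[r] = [a - factor * b for a, b in zip(A[r], A[col])]
    beta = [A[r][k] / A[r][r] for r in range(k)]
    return sum((y[i] - sum(b * x for b, x in zip(beta, X[i]))) ** 2 for i in range(n))


# Made-up data: x1 and x2 are strongly correlated.
x1 = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
x2 = [2.0, 1.0, 4.0, 3.0, 6.0, 5.0, 8.0, 7.0]
y = [3.1, 3.9, 7.2, 6.8, 11.1, 10.2, 14.8, 14.1]

sse_null = sse(y)            # intercept only, equals the total sum of squares
sse_x1 = sse(y, x1)
sse_x2 = sse(y, x2)
sse_both = sse(y, x1, x2)

# Sequential slices for the two possible orders.
order_a = {"x1": sse_null - sse_x1, "x2 | x1": sse_x1 - sse_both}
order_b = {"x2": sse_null - sse_x2, "x1 | x2": sse_x2 - sse_both}

print("order x1, x2:", order_a)
print("order x2, x1:", order_b)
print("total SSR either way:", sse_null - sse_both)
```

Both orders add up to the same total SSR, but x1 entered first is credited with far more than x1 entered last. The last line of each order is the partial sum of squares for that term, which is why it matches the partial test.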
How do I test a non-final variable with anova()?
Re-order so the variable comes last (then its sequential line is the partial test), or use a partial F-test / drop1(). Sequential sums of squares only add up to a clean extra-sum-of-squares test when the terms you want are last and consecutive. The extra-sum-of-squares principle — SSE(Reduced) − SSE(Full) over q df, divided by MSE(Full) — gives the F directly via anova(reduced, full).
What does the coefficient of a dummy variable mean?
For a 0/1 indicator it is the expected difference in y between that group and the reference level, holding the other predictors fixed: the vertical shift in the parallel-lines model. Always write out the fitted equation for each group by substituting D = 0 and D = 1; markers award the interpretation, not the printout. If the response was log-transformed, exp of the coefficient is a multiplicative factor on the original scale.
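Writing out the line for each group is mechanical, as this short Python sketch shows. The coefficient values are invented purely for illustration, the model is assumed to be y = b0 + b1·x + b2·D + b3·x·D, and group_lines is a hypothetical helper name.

```python
import math


def group_lines(b0, b1, b2, b3=0.0):
    """Return (intercept, slope) per group for y = b0 + b1*x + b2*D + b3*x*D."""
    return {
        "D=0": (b0, b1),            # reference group: substitute D = 0
        "D=1": (b0 + b2, b1 + b3),  # other group: substitute D = 1
    }


# Hypothetical fitted coefficients, not taken from any data set.
additive = group_lines(10.0, 2.0, 3.0)           # parallel lines: same slope
interaction = group_lines(10.0, 2.0, 3.0, 0.5)   # separate lines: slopes differ by b3

for name, model in [("additive", additive), ("interaction", interaction)]:
    for group, (intercept, slope) in model.items():
        print(f"{name:11s} {group}: y = {intercept} + {slope} x")

# If the response were log(y), a dummy coefficient of 0.2 would correspond
# to multiplying y by exp(0.2) for the D = 1 group.
factor = math.exp(0.2)
print(f"multiplicative factor: {factor:.3f}")
```

In the additive model b2 is the vertical gap between the lines at every x; once an interaction is present, b2 is only the gap at x = 0.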
Can I drop a non-significant main effect if it is in an interaction?
No. The marginality / hierarchy rule says keep a lower-order term whenever its higher-order interaction is retained. Dropping the indicator's main effect while keeping the interaction forces both group lines through a shared intercept, and dropping x instead forces the reference group's slope to zero; neither is usually what the data say. The same rule governs keeping x when x² is in a polynomial model.
Exam move
Anchor everything on one slogan: sequential ≠ partial, and order matters. When a question says 'given the others' you want a partial test (the term last, or a nested F); 'on its own' wants the first / marginal slice. Practise the partial F-test as a ritual: state H0, get SSE(R) and SSE(F), form F = [SSR(extra)/q]/MSE(F) on (q, n − p_F) df, where p_F is the number of coefficients in the full model including the intercept, and compare to the table. Remember that for q = 1 the F statistic equals t². For qualitative covariates, always write out the fitted line for each group (substitute D = 0 and D = 1), read β₂ as the intercept shift and β₃ as the slope difference, and apply the hierarchy rule to keep main effects under a retained interaction. If two anova() tables disagree, look at the predictor order first: it is the multicollinearity tell.
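The q = 1 identity F = t² can be verified numerically. The sketch below uses the simplest case, a single predictor tested against the intercept-only model, with made-up data in plain Python; it illustrates the identity and is not a substitute for the general multiple-regression result.

```python
import math

# Made-up data for illustration only.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1, 11.7]
n = len(x)

x_bar, y_bar = sum(x) / n, sum(y) / n
sxx = sum((xi - x_bar) ** 2 for xi in x)
sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
syy = sum((yi - y_bar) ** 2 for yi in y)

slope = sxy / sxx
sse_reduced = syy                    # intercept-only model
sse_full = syy - sxy ** 2 / sxx      # model with x added
mse_full = sse_full / (n - 2)        # full model has 2 coefficients

# t-test of the slope, and the partial F-test with q = 1.
t_stat = slope / math.sqrt(mse_full / sxx)
f_stat = ((sse_reduced - sse_full) / 1) / mse_full

print(f"t = {t_stat:.4f}, t^2 = {t_stat ** 2:.4f}, F = {f_stat:.4f}")
```

Because the two statistics are algebraically the same quantity here, the t-test and the one-coefficient partial F-test always give the same p-value.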