Australian National University · S1 2026 · FACULTY OF SCIENCE

STAT7038 · Regression Modelling

50% final exam · hurdle · 14 Chapters · 6-page Bible
Built to mirror S1 2026 · updated this semester
Chapter 5 of 7 · STAT7038

Multiple Regression Inference

In multiple regression the regression sum of squares can be sliced up in more than one way, and knowing which slice answers which question is the most-tested idea in this half of the course. R's anova() slices sequentially (Type I) — one term at a time, in the order you typed them, each conditioning on the terms above it — while the summary() t-tests and drop1() slice partially, each term last, given all the others. The two agree only for the final term. The bridge between a reduced and a full model is the extra-sum-of-squares principle, which powers the partial (nested) F-test of 'is this subset of predictors a worthwhile addition to a model already containing the rest?'. The chapter then turns to qualitative covariates: a categorical predictor enters through indicator (0/1 dummy) variables, a k-level factor using k−1 of them with one level as the reference. The additive (parallel-lines) model shares a slope and shifts the intercept; adding an interaction term lets the groups have different slopes too — two separate lines.

In this chapter

What this chapter covers

  • 01 The MLR ANOVA decomposition and the overall F
  • 02 Sequential (Type I) sums of squares: order matters
  • 03 Partial sums of squares and the extra-sum-of-squares principle
  • 04 The partial (nested) F-test; q = 1 collapses to the t-test
  • 05 Qualitative covariates: indicator coding and the reference level
  • 06 The additive (parallel-lines) model: β₂ as the group shift
  • 07 Interaction (separate-lines) models and the marginality/hierarchy rule

Worked example

Worked example: a partial (nested) F-test

Q [6 marks]. For a model with three predictors, the reduced fit (H2S + Lactic) gives SSE(R) = 2825.6 and the full fit (Acetic + H2S + Lactic) gives SSE(F) = 2668.41 on 26 df with MSE(F) = 102.63. Test at the 5% level whether Acetic is a significant addition given H2S and Lactic are already in the model. Compare with F₁,₂₆(0.95) ≈ 4.23.
  • +1 Hypotheses. H0: the Acetic coefficient is 0 (the reduced model suffices) vs Ha: it is non-zero.
  • +1 Extra sum of squares. SSR(extra) = SSE(R) − SSE(F) = 2825.6 − 2668.41 = 157.19, on q = 1 df.
  • +1 Form F. F = [SSR(extra)/q] / MSE(F) = (157.19/1) / 102.63 = 1.53 on (1, 26) df.
  • +1 Critical value & decision. F₁,₂₆(0.95) ≈ 4.23; since 1.53 < 4.23 (p ≈ 0.23), do not reject H0.
  • +1 Conclude in context. Once H2S and Lactic are in the model, Acetic does not explain a significant amount of additional variation, so it can be dropped.
  • +1 Note the trap. This disagrees with a sequential table that enters Acetic first (where it looks highly significant), because there it is credited with variation H2S and Lactic also explain.
Answer: Extra SS = 2825.6 − 2668.41 = 157.19; F = 157.19/102.63 = 1.53 on (1, 26) df, below F₁,₂₆(0.95) ≈ 4.23, so do not reject H0: given H2S and Lactic, Acetic adds nothing significant. (A sequential table entering Acetic first would mislead.)
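
The arithmetic of the worked example can be checked in a few lines. This is only a sketch in Python (the course itself uses R) built from the summary numbers quoted in the question; the variable names are ours, and the critical value is the one given in the question rather than computed.

```python
# Partial (nested) F-test from the summary numbers quoted in the worked example.
sse_reduced = 2825.6    # SSE(R): H2S + Lactic
sse_full = 2668.41      # SSE(F): Acetic + H2S + Lactic
df_full = 26            # error df of the full model
q = 1                   # number of coefficients being tested (Acetic only)
f_crit = 4.23           # critical value quoted in the question, not computed here

extra_ss = sse_reduced - sse_full          # extra sum of squares
mse_full = sse_full / df_full              # MSE(F)
f_stat = (extra_ss / q) / mse_full         # partial F statistic
reject_h0 = f_stat > f_crit

print(f"extra SS = {extra_ss:.2f}")
print(f"MSE(F)   = {mse_full:.2f}")
print(f"F        = {f_stat:.2f} on ({q}, {df_full}) df")
print("reject H0" if reject_h0 else "do not reject H0")
```

Note that MSE(F) is recomputed from SSE(F)/26 rather than typed in, which doubles as a check that the quoted 102.63 is consistent with the quoted SSE and df.
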
Glossary

Key terms

Sequential (Type I) sums of squares
What R's anova(lm) reports: SSR is peeled off one term at a time, in the order entered, each line conditioning on the terms above it. The slices depend on order, and only the last line matches the summary() t-test of that coefficient.
Partial sum of squares
What a term adds given all the other terms — treated as if it were last. This is what the summary() t-tests and drop1() report. It is order-invariant and answers the 'marginal-given-the-rest' question.
Extra-sum-of-squares principle
SSR(extra) = SSE(Reduced) − SSE(Full) = SSR(Full) − SSR(Reduced), the variation a set of terms explains that the others cannot. It is always ≥ 0 (dropping a predictor never lowers SSE) and is the numerator of the partial F-test.
Indicator (dummy) variable
A 0/1 variable coding a category. A factor with k levels uses k−1 indicators; the omitted level is the reference (baseline), absorbed into the intercept. R builds these automatically when a factor() appears in lm(), using treatment contrasts by default (reference = first level, which is the first alphabetically unless the levels are reordered).
Interaction term
A product x·D added to the model so groups may have different slopes as well as different intercepts — two separate lines. Its coefficient is the difference in slopes; a non-significant interaction collapses back to the parallel-lines (additive) model.
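
To make the indicator and interaction entries concrete, here is a minimal sketch of what treatment coding produces. It is written in Python for illustration only; the function name treatment_code, the factor levels and the x values are all hypothetical, and R's own model-matrix machinery is not being reproduced here.

```python
def treatment_code(values):
    """Return (reference, indicator_names, rows) using k-1 0/1 indicators.

    Levels are sorted and the first one becomes the reference level,
    mirroring the default described in the glossary entry above.
    """
    levels = sorted(set(values))
    reference, others = levels[0], levels[1:]
    rows = [[1 if v == lev else 0 for lev in others] for v in values]
    return reference, others, rows

# Hypothetical three-level factor (k = 3), so k - 1 = 2 indicator columns.
group = ["control", "drugB", "drugA", "control", "drugB"]
reference, columns, rows = treatment_code(group)

# Interaction columns are simply x times each indicator (hypothetical x values).
x = [1.0, 2.0, 3.0, 4.0, 5.0]
interaction = [[xi * d for d in r] for xi, r in zip(x, rows)]

print("reference level:", reference)
print("indicator columns:", columns)
for g, r, inter in zip(group, rows, interaction):
    print(f"{g:8s} D = {r}  x*D = {inter}")
```

The reference level gets a row of zeros, which is why its effect is absorbed into the intercept and why only k−1 columns are needed.
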
FAQ

Multiple Regression Inference FAQ

Why do the same predictors in a different order give different anova() significance?

Because anova() is sequential: each line conditions only on the terms above it, so a term's sum of squares, e.g. SSR(X2 | X1), depends on what was entered before it. Re-ordering changes the per-line sums of squares and can change the verdicts. The summary() t-tests, by contrast, are partial (each term last, given all the others) and do not change with the order. Two anova() tables that disagree under reordering are themselves a symptom of multicollinearity.
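
The order effect is easy to see on a toy data set. The sketch below is plain Python rather than R, with a small hand-rolled least-squares helper (sse, our own name) and made-up data in which x1 and x2 are deliberately correlated; it computes the sequential slices in both orders.

```python
def sse(y, predictors):
    """Residual sum of squares from least squares of y on an intercept plus predictors."""
    n = len(y)
    X = [[1.0] + [col[i] for col in predictors] for i in range(n)]
    p = len(X[0])
    # Normal equations (X'X) b = X'y, stored as an augmented matrix.
    A = [[sum(X[i][r] * X[i][c] for i in range(n)) for c in range(p)]
         + [sum(X[i][r] * y[i] for i in range(n))] for r in range(p)]
    # Gauss-Jordan elimination with partial pivoting.
    for k in range(p):
        pivot = max(range(k, p), key=lambda r: abs(A[r][k]))
        A[k], A[pivot] = A[pivot], A[k]
        for r in range(p):
            if r != k:
                f = A[r][k] / A[k][k]
                A[r] = [a - f * b for a, b in zip(A[r], A[k])]
    beta = [A[r][p] / A[r][r] for r in range(p)]
    fitted = [sum(b * x for b, x in zip(beta, row)) for row in X]
    return sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))

# Hypothetical data: x1 and x2 are strongly correlated on purpose.
x1 = [1, 2, 3, 4, 5, 6, 7, 8]
x2 = [2, 1, 4, 3, 6, 5, 8, 7]
y = [3, 5, 6, 7, 9, 10, 13, 14]

sst = sse(y, [])                 # intercept-only model, so SSE equals SST
sse_x1, sse_x2 = sse(y, [x1]), sse(y, [x2])
sse_both = sse(y, [x1, x2])

# Order x1 then x2
ss_x1_first = sst - sse_x1
ss_x2_last = sse_x1 - sse_both
# Order x2 then x1
ss_x2_first = sst - sse_x2
ss_x1_last = sse_x2 - sse_both

print(f"order x1, x2: SS(x1) = {ss_x1_first:.2f}, SS(x2 | x1) = {ss_x2_last:.2f}")
print(f"order x2, x1: SS(x2) = {ss_x2_first:.2f}, SS(x1 | x2) = {ss_x1_last:.2f}")
print(f"both orders sum to SSR = {sst - sse_both:.2f}")
```

In both orders the slices add up to the same total SSR; what changes is how that total is credited. The slice for whichever term is entered last is its partial sum of squares, which is the quantity the t-test in summary() is based on.
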

How do I test a non-final variable with anova()?

Re-order so the variable comes last (then its sequential line is the partial test), or use a partial F-test / drop1(). Sequential sums of squares only add up to a clean extra-sum-of-squares test when the terms you want are last and consecutive. The extra-sum-of-squares principle — SSE(Reduced) − SSE(Full) over q df, divided by MSE(Full) — gives the F directly via anova(reduced, full).
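
Written out, the statistic described in words above looks like this. The notation is our assumption: n is the sample size, p_F is the number of coefficients in the full model (intercept included), and q is the number of coefficients set to zero under H0.

```latex
F \;=\; \frac{\bigl[\mathrm{SSE}(R) - \mathrm{SSE}(F)\bigr]/q}{\mathrm{SSE}(F)/(n - p_F)}
  \;=\; \frac{\mathrm{SSR}(\text{extra})/q}{\mathrm{MSE}(F)}
  \;\sim\; F_{q,\;n-p_F} \quad \text{under } H_0,
\qquad \text{and for } q = 1:\quad F = t^2 .
```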

What does the coefficient of a dummy variable mean?

For a 0/1 indicator it is the expected difference in y between that group and the reference level, holding the other predictors fixed: the vertical shift in the parallel-lines model. Always write out the fitted equation for each group by substituting D = 0 and D = 1; markers award the interpretation, not the printout. If the response was log-transformed, exp(coefficient) is the multiplicative factor between the two groups on the original scale.
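
The substitution step can be sketched as follows. The coefficients are made up purely for illustration (they do not come from any course data set), and the code is Python rather than R.

```python
import math

# Hypothetical fitted coefficients for y-hat = b0 + b1*x + b2*D (illustration only).
b0, b1, b2 = 10.0, 2.0, 3.5

def fitted_line(d):
    """Return (intercept, slope) of the fitted line for a group with indicator D = d."""
    return b0 + b2 * d, b1

for d, label in [(0, "reference group (D = 0)"), (1, "indicator group (D = 1)")]:
    intercept, slope = fitted_line(d)
    print(f"{label}: y-hat = {intercept:.1f} + {slope:.1f} x")

# If the response had been log(y), a hypothetical indicator coefficient of 0.2
# would correspond to a multiplicative factor exp(0.2) on the original scale.
log_scale_coef = 0.2
factor = math.exp(log_scale_coef)
print(f"multiplicative factor = {factor:.3f}")
```

Both lines share the slope b1 and differ only in the intercept, by exactly b2, which is the "vertical shift" reading of the dummy coefficient.
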

Can I drop a non-significant main effect if it is in an interaction?

No. The marginality (hierarchy) rule says keep a lower-order term whenever its higher-order interaction is retained. Dropping the indicator's main effect while keeping the x·D interaction forces both group lines through a shared intercept, which is rarely what the data say; dropping x instead would force the reference group's slope to zero. The same rule governs keeping x when x² is in a polynomial model.
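
Substituting D = 0 and D = 1 shows why. The sketch below assumes the usual labelling for a single covariate x and a single 0/1 indicator D, with β₂ on D and β₃ on the product x·D.

```latex
\begin{aligned}
\text{Full interaction model:}\quad & E[y] = \beta_0 + \beta_1 x + \beta_2 D + \beta_3\, xD \\
D = 0:\quad & E[y] = \beta_0 + \beta_1 x \\
D = 1:\quad & E[y] = (\beta_0 + \beta_2) + (\beta_1 + \beta_3)\, x \\[4pt]
\text{After dropping } \beta_2 D:\quad & E[y] = \beta_0 + \beta_1 x + \beta_3\, xD \\
D = 0:\quad & E[y] = \beta_0 + \beta_1 x \\
D = 1:\quad & E[y] = \beta_0 + (\beta_1 + \beta_3)\, x
\end{aligned}
```

With β₂ removed, both groups are forced to the same intercept β₀ even though their slopes still differ.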

Study strategy

Exam move

Anchor everything on one slogan: sequential ≠ partial, and order matters. When a question says 'given the others' you want a partial test (the term last, or a nested F); 'on its own' wants the first (marginal) slice. Practise the partial F-test as a ritual: state H0, get SSE(R) and SSE(F), form F = [SSR(extra)/q]/MSE(F) on (q, n − p_F) df, where p_F counts the coefficients in the full model including the intercept, and compare to the table. Remember that with q = 1 the F statistic equals t². For qualitative covariates, always write out the fitted line for each group (substitute D = 0 and D = 1), read β₂ as the intercept shift and β₃ as the slope difference, and apply the hierarchy rule to keep main effects under a retained interaction. If two anova() tables disagree, look at the predictor order first: order-dependence is the multicollinearity tell.
