Australian National University · S1 2026 · FACULTY OF SCIENCE

STAT7038 · Regression Modelling

50% final exam · hurdle · 14 Chapters · 6-page Bible

Chapter 5 of 7 · STAT7038

Multiple Regression Inference

In multiple regression the regression sum of squares can be sliced up in more than one way, and knowing which slice answers which question is the most-tested idea in this half of the course. R's anova() slices sequentially (Type I) — one term at a time, in the order you typed them, each conditioning on the terms above it — while the summary() t-tests and drop1() slice partially, each term last, given all the others. The two agree only for the final term. The bridge between a reduced and a full model is the extra-sum-of-squares principle, which powers the partial (nested) F-test of 'is this subset of predictors a worthwhile addition to a model already containing the rest?'. The chapter then turns to qualitative covariates: a categorical predictor enters through indicator (0/1 dummy) variables, a k-level factor using k−1 of them with one level as the reference. The additive (parallel-lines) model shares a slope and shifts the intercept; adding an interaction term lets the groups have different slopes too — two separate lines.

In this chapter

What this chapter covers

01 The MLR ANOVA decomposition and the overall F
02 Sequential (Type I) sums of squares: order matters
03 Partial sums of squares and the extra-sum-of-squares principle
04 The partial (nested) F-test; q = 1 collapses to the t-test
05 Qualitative covariates: indicator coding and the reference level
06 The additive (parallel-lines) model: β₂ as the group shift
07 Interaction (separate-lines) models and the marginality/hierarchy rule

Worked example · free

Worked example: a partial (nested) F-test

Q [6 marks]. For a model with three predictors, the reduced fit (H2S + Lactic) gives SSE(R) = 2825.6 and the full fit (Acetic + H2S + Lactic) gives SSE(F) = 2668.41 on 26 df with MSE(F) = 102.63. Test whether Acetic is a significant addition given H2S and Lactic are already in the model. Compare with F_1,26(0.95) ≈ 4.23.

+1 Hypotheses. H₀: the Acetic coefficient is 0 (the reduced model suffices) vs Hₐ: it is non-zero.
+1 Extra sum of squares. SSR(extra) = SSE(R) − SSE(F) = 2825.6 − 2668.41 = 157.19, on q = 1 df.
+1 Form F. F = [SSR(extra)/q] / MSE(F) = (157.19/1) / 102.63 = 1.53 on (1, 26) df.
+1 Critical value & decision. F_1,26(0.95) ≈ 4.23; since 1.53 < 4.23 (p ≈ 0.23), do not reject H₀.
+1 Conclude in context. Once H2S and Lactic are in the model, Acetic adds nothing significant, so it can be dropped.
+1 Note the trap. This disagrees with a sequential table that enters Acetic first (where it looks highly significant), because there it is credited with variation H2S and Lactic also explain.

Extra SS = 2825.6 − 2668.41 = 157.19; F = 157.19/102.63 = 1.53 on (1, 26) df, below F_1,26(0.95) ≈ 4.23, so do not reject H₀: given H2S and Lactic, Acetic adds nothing significant. (A sequential table entering Acetic first would mislead.)
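
The arithmetic of the worked example can be checked with a short script. This is a minimal sketch in plain Python that uses only the figures quoted in the question; the variable names are illustrative, and the critical value is taken from the question rather than computed. In R the same test would normally come from anova(reduced, full).

```python
# Figures quoted in the worked example.
sse_reduced = 2825.6   # SSE(R): H2S + Lactic
sse_full = 2668.41     # SSE(F): Acetic + H2S + Lactic
df_full = 26           # residual df of the full model
q = 1                  # number of coefficients set to zero under H0
f_crit = 4.23          # F_1,26(0.95), as quoted in the question

extra_ss = sse_reduced - sse_full      # variation only Acetic explains
mse_full = sse_full / df_full          # MSE(F)
f_stat = (extra_ss / q) / mse_full     # partial F on (q, df_full) df
reject_h0 = f_stat > f_crit

print(f"extra SS = {extra_ss:.2f}, MSE(F) = {mse_full:.2f}, F = {f_stat:.2f}")
print("reject H0" if reject_h0 else "do not reject H0")
```

The script reproduces the extra sum of squares, MSE(F) and F used in the marking steps, and reaches the same decision.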

Glossary

Key terms

Sequential (Type I) sums of squares: What R's anova(lm) reports: SSR is peeled off one term at a time, in the order entered, each line conditioning on the terms above it. The slices depend on order, and only the last line matches the summary() t-test of that coefficient.
Partial sum of squares: What a term adds given all the other terms — treated as if it were last. This is what the summary() t-tests and drop1() report. It is order-invariant and answers the 'marginal-given-the-rest' question.
Extra-sum-of-squares principle: SSR(extra) = SSE(Reduced) − SSE(Full) = SSR(Full) − SSR(Reduced), the variation a set of terms explains that the others cannot. It is always ≥ 0 (dropping a predictor never lowers SSE) and is the numerator of the partial F-test.
Indicator (dummy) variable: A 0/1 variable coding a category. A factor with k levels uses k−1 indicators; the omitted level is the reference (baseline), absorbed into the intercept. R does this automatically with factor() using treatment contrasts (reference = first level alphabetically).
Interaction term: A product x·D added to the model so groups may have different slopes as well as different intercepts, giving two separate lines. Its coefficient is the difference in slopes; if the interaction is not significant, the model can be simplified to the parallel-lines (additive) model.
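
The indicator coding in the glossary can be sketched in a few lines. This is an illustrative Python helper, not R's own implementation: the function name treatment_code and the factor brand are hypothetical, and sorting the levels only mimics R's default of taking the first level as the reference.

```python
def treatment_code(values):
    """Return (reference level, {level: 0/1 column}) using k-1 indicators."""
    levels = sorted(set(values))               # first sorted level is the reference
    reference, others = levels[0], levels[1:]
    columns = {lvl: [1 if v == lvl else 0 for v in values] for lvl in others}
    return reference, columns

# Hypothetical three-level factor (k = 3, so two indicator columns).
brand = ["B", "A", "C", "A", "B", "C"]
reference, dummies = treatment_code(brand)

print("reference level:", reference)
for level, column in dummies.items():
    print(f"D_{level} =", column)
```

Rows belonging to the reference level are zero in every indicator column, which is why that level is absorbed into the intercept.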

FAQ

Multiple Regression Inference FAQ

Why do the same predictors in a different order give different anova() significance?

Because anova() is sequential: each line conditions only on the terms above it, so SSR(X₂ | X₁) depends on the order. Re-ordering changes the per-line sums of squares and can change their verdicts. The summary() t-tests, by contrast, are partial (each term last, given all the others) and do not change at all under re-ordering. Two anova() tables that disagree under re-ordering are themselves a symptom of correlated predictors (multicollinearity); with uncorrelated predictors the sequential slices would not depend on order.
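
The order-dependence can be seen on a small example. The sketch below is plain Python with a hand-rolled least-squares fit, standing in for what anova() does in R. The data are hypothetical toy values chosen so that x1 and x2 are strongly correlated, and the helper name sse is an assumption made for illustration.

```python
def sse(columns, y):
    """Residual sum of squares from OLS of y on an intercept plus the given columns."""
    n = len(y)
    X = [[1.0] + [col[i] for col in columns] for i in range(n)]
    p = len(X[0])
    # Normal equations (X'X) b = X'y, stored as an augmented matrix.
    A = [[sum(X[i][r] * X[i][c] for i in range(n)) for c in range(p)]
         + [sum(X[i][r] * y[i] for i in range(n))] for r in range(p)]
    # Gauss-Jordan elimination with partial pivoting.
    for k in range(p):
        pivot = max(range(k, p), key=lambda r: abs(A[r][k]))
        A[k], A[pivot] = A[pivot], A[k]
        for r in range(p):
            if r != k:
                f = A[r][k] / A[k][k]
                A[r] = [a - f * b for a, b in zip(A[r], A[k])]
    beta = [A[r][p] / A[r][r] for r in range(p)]
    fitted = [sum(b * x for b, x in zip(beta, row)) for row in X]
    return sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))

# Hypothetical toy data: x1 and x2 are strongly correlated.
x1 = [1, 2, 3, 4, 5, 6, 7, 8]
x2 = [2, 1, 4, 3, 6, 5, 8, 7]
y = [3, 4, 8, 7, 12, 11, 15, 17]

sst = sse([], y)               # intercept-only fit gives the total sum of squares
sse_x1 = sse([x1], y)
sse_x2 = sse([x2], y)
sse_full = sse([x1, x2], y)

# Sequential slices under the two possible orders.
order1 = {"x1": sst - sse_x1, "x2 | x1": sse_x1 - sse_full}
order2 = {"x2": sst - sse_x2, "x1 | x2": sse_x2 - sse_full}

for name, table in (("x1 then x2", order1), ("x2 then x1", order2)):
    print(name, {term: round(ss, 3) for term, ss in table.items()})
```

Both orders add up to the same total SSR, but the slice credited to x1 is large when it enters first and small when it enters last. The last line of each table is the partial sum of squares for that term.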

How do I test a non-final variable with anova()?

Re-order so the variable comes last (then its sequential line is the partial test), or use a partial F-test / drop1(). Sequential sums of squares only add up to a clean extra-sum-of-squares test when the terms you want are last and consecutive. The extra-sum-of-squares principle — SSE(Reduced) − SSE(Full) over q df, divided by MSE(Full) — gives the F directly via anova(reduced, full).
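
Written out, the statistic behind anova(reduced, full) takes the form below. Here n is the sample size and p_F is taken to be the number of coefficients in the full model, intercept included; that reading is an assumption based on standard notation.

```latex
F \;=\; \frac{\bigl[\mathrm{SSE}(R) - \mathrm{SSE}(F)\bigr]/q}{\mathrm{SSE}(F)/(n - p_F)}
  \;=\; \frac{\mathrm{SSR}(\text{extra})/q}{\mathrm{MSE}(F)}
  \;\sim\; F_{q,\,n - p_F} \quad \text{under } H_0 .
```

With q = 1 this F equals the square of the summary() t-statistic for that coefficient, which is why the two tests give the same p-value.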

What does the coefficient of a dummy variable mean?

For a 0/1 indicator it is the expected difference in y between that group and the reference level, holding the other predictors fixed: the vertical shift in the parallel-lines model. Always write out the fitted equation for each group by substituting D = 0 and D = 1; marks are awarded for the interpretation, not the printout. If the response is log-transformed, exp of the coefficient is a multiplicative factor on the original scale.
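
Substituting D = 0 and D = 1 can be written out as follows. The sketch assumes a single continuous predictor x, with β₀ the reference-group intercept, β₁ the common slope and β₂ the indicator's coefficient; these symbols are standard notation assumed for illustration.

```latex
\begin{aligned}
E[y \mid x, D] &= \beta_0 + \beta_1 x + \beta_2 D \\
D = 0:\quad E[y \mid x] &= \beta_0 + \beta_1 x \\
D = 1:\quad E[y \mid x] &= (\beta_0 + \beta_2) + \beta_1 x
\end{aligned}
```

The two lines share the slope β₁ and sit β₂ apart vertically at every x.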

Can I drop a non-significant main effect if it is in an interaction?

No. The marginality / hierarchy rule says keep a lower-order term whenever its higher-order interaction is retained. Dropping the indicator's main effect while keeping a significant x·D interaction forces both group lines through a shared intercept, which is rarely what the data say; dropping x instead forces the reference group's slope to zero. The same rule governs keeping x when x² is in a polynomial model.
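
The effect of breaking the hierarchy rule is easiest to see by writing the group lines out. The sketch assumes one continuous predictor x and one 0/1 indicator D, with β₂ the intercept shift and β₃ the slope difference.

```latex
\begin{aligned}
\text{Full:}\quad E[y \mid x, D] &= \beta_0 + \beta_1 x + \beta_2 D + \beta_3\, xD \\
D = 0:\quad E[y \mid x] &= \beta_0 + \beta_1 x \\
D = 1:\quad E[y \mid x] &= (\beta_0 + \beta_2) + (\beta_1 + \beta_3)\, x \\[4pt]
\text{Without } \beta_2 D:\quad D = 1:\quad E[y \mid x] &= \beta_0 + (\beta_1 + \beta_3)\, x
\end{aligned}
```

Once the β₂D term is removed, both groups are forced to share the intercept β₀ even though their slopes still differ.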

Study strategy

Exam move

Anchor everything on one slogan: sequential ≠ partial, and order matters. When a question says 'given the others' you want a partial test (the term last, or a nested F); 'on its own' wants the first / marginal slice. Practise the partial F-test as a ritual: state H₀, get SSE(R) and SSE(F), form F = [SSR(extra)/q]/MSE(F) on (q, n−p_F) df, and compare to the table. Remember that with q = 1 the F statistic equals t². For qualitative covariates, always write out the fitted line for each group (substitute D = 0 and D = 1) in the model y = β₀ + β₁x + β₂D + β₃x·D, read β₂ as the intercept shift and β₃ as the slope difference, and apply the hierarchy rule to keep main effects under a retained interaction. If two anova() tables disagree, look at the predictor order first: order-dependence is the multicollinearity tell.
