STAT7038 · Regression Modelling
Regression Inference
Once the line is fitted, inference cross-examines it. The chapter opens with the ANOVA decomposition: total variation in y splits exactly into the part the line explains and the part it leaves, SST = SSR + SSE, with degrees of freedom (n−1) = 1 + (n−2). From that identity come the two headline summaries — the overall F-test F = MSR/MSE of 'is the line worth anything?' and R² = SSR/SST, the share of variation explained. You then test individual coefficients with a t-test, t = bj/se(bj) on n−2 df, and build confidence intervals the same way. The chapter's signature distinction is the confidence interval for the mean response vs the prediction interval for a new observation: both centre on ŷh, but the PI carries an extra '+1' under the root and is always wider. It closes by teaching you to read the R output — mapping each cell of summary() and anova() to a formula, because the exam hands you the printout rather than R itself.
What this chapter covers
- 01 The ANOVA identity SST = SSR + SSE and the df bookkeeping
- 02 The overall F-test F = MSR/MSE
- 03 R² = SSR/SST and why R² = r_xy² in SLR (F = t²)
- 04 t-tests for a coefficient and the five-step ritual
- 05 Standard errors and confidence intervals for β₀, β₁
- 06 CI for the mean response vs PI for a new observation: the '+1'
- 07 Reading summary(lm) and anova(lm): recovering MSE and n
Worked example: CI for the mean vs PI for a new observation
- +1 Point estimate (both). ŷh = 44.417 + 3.833(7) = 71.25, the same centre for the CI and the PI.
- +1 (a) CI standard error. se = σ̂√(1/n + (xh−x̄)²/Sxx) = 0.897√(1/8 + (1.5)²/42) = 0.379.
- +1 (a) CI. 71.25 ± 2.447(0.379) = (70.32, 72.18), the 95% interval for the mean response, where 2.447 is the t critical value on n−2 = 6 df.
- +1 (b) PI standard error. se = σ̂√(1 + 1/n + (xh−x̄)²/Sxx) = 0.897√(1 + 1/8 + (1.5)²/42) = 0.974.
- +1 (b) PI. 71.25 ± 2.447(0.974) = (68.87, 73.63), the 95% interval for one new observation.
- +1 (c) Why wider. The PI adds the new point's own random error εnew (the +1 under the root), so it is wider everywhere; both pinch at xh = x̄ and flare out as you move away.
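The two intervals above can be reproduced with a short Python sketch. It uses only the summary values quoted in the worked example; x̄ = 5.5 is an inference from (xh − x̄) = 1.5 at xh = 7, and the function name `intervals` is illustrative, not part of any library.

```python
import math

# Summary values from the worked example.
# x_bar = 5.5 is inferred from (xh - x_bar) = 1.5 at xh = 7.
n, x_bar, sxx = 8, 5.5, 42.0
b0, b1 = 44.417, 3.833
sigma_hat = 0.897          # residual standard error, as rounded in the example
t_crit = 2.447             # t critical value on n - 2 = 6 df (95% two-sided)

def intervals(xh):
    """Return (y_hat, CI for the mean response, PI for a new observation) at xh."""
    y_hat = b0 + b1 * xh
    core = 1 / n + (xh - x_bar) ** 2 / sxx       # shared term under the root
    se_mean = sigma_hat * math.sqrt(core)        # CI: no '+1'
    se_pred = sigma_hat * math.sqrt(1 + core)    # PI: the extra '+1'
    ci = (y_hat - t_crit * se_mean, y_hat + t_crit * se_mean)
    pi = (y_hat - t_crit * se_pred, y_hat + t_crit * se_pred)
    return y_hat, ci, pi

y_hat, ci, pi = intervals(7)
print(round(y_hat, 2), [round(v, 2) for v in ci], [round(v, 2) for v in pi])
```

Calling `intervals(5.5)` gives the narrowest pair of intervals, since the (xh − x̄)² term vanishes at xh = x̄.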
Key terms
- ANOVA identity
- The exact decomposition SST = SSR + SSE: total variation in y around its mean equals the variation the line explains (SSR) plus the residual variation (SSE). The degrees of freedom add the same way, (n−1) = 1 + (n−2). Kutner writes SST as SSTO; R's anova() table prints only the regression and residual rows, so SST is the sum of its Sum Sq column.
- F-test (overall)
- Tests H0: β1 = 0 with F = MSR/MSE, which follows an F distribution with 1 and n−2 degrees of freedom under H0. A large F (MSR far exceeding MSE) is evidence of a real linear relationship. In simple regression F equals the square of the slope's t-statistic.
- Coefficient of determination (R²)
- R² = SSR/SST = 1 − SSE/SST, the share of variation in y the line explains, between 0 and 1. In simple regression it equals the squared correlation r_xy². A high R² is not a goodness-of-fit certificate; pair it with the diagnostic plots.
- Prediction interval
- An interval for one new observation ynew at xh, centred at ŷh but with an extra σ² (the '+1' under the root) for the new point's own random error. It is always wider than the confidence interval for the mean response at the same xh.
- Standard error of a coefficient
- se(b1) = √(MSE/Sxx) and se(b0) = √(MSE(1/n + x̄²/Sxx)). The standardised statistic (bj − βj)/se(bj) follows a t distribution with n−2 df, which gives both the t-test and the confidence interval.
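As a rough illustration of the standard-error formulas, the Python sketch below plugs in the figures quoted elsewhere in this chapter (residual standard error 0.8975, n = 8, Sxx = 42, b0 = 44.417, b1 = 3.833). The value x̄ = 5.5 is an assumption inferred from the worked example, and `t_and_ci` is a hypothetical helper name.

```python
import math

# Figures quoted in this chapter; x_bar = 5.5 is inferred, not stated.
n, x_bar, sxx = 8, 5.5, 42.0
b0, b1 = 44.417, 3.833
mse = 0.8975 ** 2          # MSE = (residual standard error)^2
t_crit = 2.447             # t critical value on n - 2 = 6 df (95% two-sided)

se_b1 = math.sqrt(mse / sxx)
se_b0 = math.sqrt(mse * (1 / n + x_bar ** 2 / sxx))

def t_and_ci(estimate, se, null_value=0.0):
    """t statistic for H0: beta = null_value, plus the matching 95% CI."""
    t_stat = (estimate - null_value) / se
    return t_stat, (estimate - t_crit * se, estimate + t_crit * se)

t1, ci1 = t_and_ci(b1, se_b1)
t0, ci0 = t_and_ci(b0, se_b0)
print(f"slope:     se={se_b1:.4f}  t={t1:.2f}  CI=({ci1[0]:.3f}, {ci1[1]:.3f})")
print(f"intercept: se={se_b0:.4f}  t={t0:.2f}  CI=({ci0[0]:.3f}, {ci0[1]:.3f})")
```

The slope's t statistic comes out at about 27.68, the value used in the FAQ on F = t².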
Regression Inference FAQ
When do I use a confidence interval and when a prediction interval?
Match the wording. 'The average score for students who study 7 hours' is a CI for the mean. 'Predict the score of one student who studies 7 hours' is a PI. Both are centred at ŷh, but the PI adds the new observation's own error (the +1 under the root) and is always wider. Picking the wrong interval, or dropping the +1, is the single most-marked simple-regression error.
Why does F = t² in simple regression?
The overall F-test of β1 = 0 and the two-sided t-test of β1 = 0 are the same test, so their statistics satisfy F = t² exactly (here t = 27.68 gives t² ≈ 766.2, matching the printed F = 766.1 up to rounding in t). R² also equals the squared correlation r_xy². These equivalences hold only in simple regression, with a single slope.
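A quick numerical check of these links, as a Python sketch using only the two statistics quoted above. It also uses the identity R² = F/(F + n − 2), which follows from R² = SSR/(SSR + SSE) and F = SSR/(SSE/(n − 2)); the variable names are illustrative.

```python
t_stat = 27.68     # slope t value quoted above (rounded to 2 decimals)
f_stat = 766.1     # overall F quoted above
df_error = 6       # n - 2 with n = 8

t_squared = t_stat ** 2
rel_gap = abs(t_squared - f_stat) / f_stat   # nonzero only because t is rounded

# R^2 recovered from F alone, and again from t^2 alone.
r2_from_f = f_stat / (f_stat + df_error)
r2_from_t = t_squared / (t_squared + df_error)

print(f"t^2 = {t_squared:.2f}, F = {f_stat}, relative gap = {rel_gap:.1e}")
print(f"R^2 from F = {r2_from_f:.4f}, from t^2 = {r2_from_t:.4f}")
```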
Should I use the t-distribution or the normal for the critical value?
Use the t distribution with n−2 df, not the normal. With small n the t has heavier tails and a larger critical value, giving wider, honest intervals; as n grows t approaches the normal. The df is n−2 because two parameters were estimated. In the exam you read the critical value off the supplied t table for the right df.
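To see how much the choice matters, the Python sketch below compares a few two-sided 95% t critical values with the normal one. The t values are copied from a standard t table (an assumption of this sketch, since Python's standard library has no t distribution); only the df = 6 value appears in this chapter.

```python
from statistics import NormalDist

# Upper 2.5% points of t, copied from a standard t table.
T_TABLE_975 = {6: 2.447, 10: 2.228, 30: 2.042, 120: 1.980}

z_crit = NormalDist().inv_cdf(0.975)   # normal critical value, about 1.960

def width_inflation(df):
    """How much wider a t-based 95% interval is than a normal-based one."""
    return T_TABLE_975[df] / z_crit

for df in sorted(T_TABLE_975):
    print(f"df={df:>3}: t={T_TABLE_975[df]:.3f}, ratio to normal={width_inflation(df):.3f}")
```

With 6 df the t-based interval is roughly a quarter wider than the normal-based one; by 120 df the difference is about one percent.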
How do I recover MSE and n from an R printout?
The 'Residual standard error: 0.8975 on 6 degrees of freedom' line gives both: MSE = (residual SE)² = 0.806, and dfE = n−2 = 6 so n = 8. The F line 'on 1 and 6 DF' confirms it. From n you can rebuild any standard error the printout hides, which is exactly the skill the calculation questions test.
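The recovery steps can be sketched in Python. The first printout line is the one quoted above; the F line is assembled from the F = 766.1 and 'on 1 and 6 DF' figures in this chapter, with the p-value part that R prints after it left out. The rest of the ANOVA table follows from SSE = dfE × MSE and SSR = F × MSE.

```python
import re

# Two lines in the style of summary(lm) output; the F line is assembled
# from figures quoted in this chapter, without the trailing p-value.
printout = """Residual standard error: 0.8975 on 6 degrees of freedom
F-statistic: 766.1 on 1 and 6 DF"""

rse = re.search(r"Residual standard error: ([\d.]+) on (\d+) degrees of freedom", printout)
resid_se, df_error = float(rse.group(1)), int(rse.group(2))

fline = re.search(r"F-statistic: ([\d.]+) on (\d+) and (\d+) DF", printout)
f_stat, df_model, df_check = float(fline.group(1)), int(fline.group(2)), int(fline.group(3))

mse = resid_se ** 2              # MSE = (residual SE)^2
n = df_error + df_model + 1      # simple regression: n = dfE + 2
sse = df_error * mse             # SSE = dfE * MSE
ssr = f_stat * mse * df_model    # MSR = F * MSE, and SSR = MSR * df_model
sst = ssr + sse                  # ANOVA identity
r_squared = ssr / sst

print(f"MSE={mse:.3f}, n={n}, SSE={sse:.2f}, SSR={ssr:.2f}, SST={sst:.2f}, R^2={r_squared:.4f}")
```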
Exam move
Run every test as a five-step ritual (hypotheses, statistic, critical value with df, decision, conclusion in context), because the written parts award method marks for the lines, not just the answer. Keep three chains on your sheet: SST = SSR + SSE → F and R²; se(bj) → t → decision → CI; and xh → CI (mean) or PI (new obs), writing the '+1' that distinguishes them. Then practise reading the R printout cold: map Estimate → bj, Std. Error → se, t value → bj/se, Pr(>|t|) → two-sided p, and recover MSE = (residual SE)² and n from the degrees of freedom. For a one-sided alternative, use the one-sided critical value and halve R's two-sided p-value, provided the estimate falls on the side H1 predicts; if it falls on the other side, the one-sided p-value is 1 minus that half.
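The one-sided conversion is easy to get backwards, so here is a minimal Python sketch of it. It assumes the null value is 0, as in R's coefficient table; the function name `one_sided_p` and the example numbers (p = 0.04, t = 2.6) are hypothetical, chosen only for illustration.

```python
def one_sided_p(p_two_sided, t_value, alternative):
    """Convert R's two-sided Pr(>|t|) to a one-sided p-value.

    alternative: 'greater' for H1: beta > 0, 'less' for H1: beta < 0.
    """
    if alternative not in ("greater", "less"):
        raise ValueError("alternative must be 'greater' or 'less'")
    # Halve only when the observed t lies on the side H1 predicts.
    in_direction = t_value > 0 if alternative == "greater" else t_value < 0
    return p_two_sided / 2 if in_direction else 1 - p_two_sided / 2

# Hypothetical printout values: Pr(>|t|) = 0.04 with t value = 2.6.
print(one_sided_p(0.04, 2.6, "greater"))
print(one_sided_p(0.04, 2.6, "less"))
```

A positive t supports H1: β > 0, so the first call halves the p-value; the second call tests the opposite direction and returns a large p-value instead.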