ECON 1012 · Data Analytics
Regression II: Inference & Fit
Regression II: Inference & Fit (Module 10, Week 10) finishes what Week 9 started: instead of just writing down the simple linear regression model, you now estimate it, test it and judge it. First comes the estimation recipe — from summary sums (Σx, Σy, Σx², Σxy) to s_x² and s_xy, then β̂₁ = s_xy/s_x² and β̂₀ = ȳ − β̂₁x̄. Next you measure how badly the line misses: SSE, the sum of squares for error, and the standard error of estimate s_ε = √(SSE/(n−2)). Inference arrives with the slope test — H₀: β₁ = 0 against H_A: β₁ ≠ 0, a t statistic on n − 2 degrees of freedom — and fit is graded by the coefficient of determination R², the proportion of the variation in Y explained by X. ECON 1012 uses one X variable only; multiple regression is out of scope.
What this chapter covers
- 01 Estimation recipe: summary sums → s_x², s_xy → β̂₁ = s_xy/s_x², β̂₀ = ȳ − β̂₁x̄
- 02 SSE = Σ(yᵢ − ŷᵢ)² = (n−1)(s_y² − s_xy²/s_x²): the unexplained variation
- 03 Standard error of estimate s_ε = √(SSE/(n−2)): the typical residual size
- 04 Slope test: H₀: β₁ = 0 (no linear relationship) vs H_A: β₁ ≠ 0, t = β̂₁/s_{β̂₁}, df = n − 2
- 05 Standard error of the slope s_{β̂₁} = s_ε/√((n−1)s_x²)
- 06 R² = SSR/SST = 1 − SSE/SST = r²: proportion of Y's variation explained by X
- 07 Partition SST = SSR + SSE; naming trap: this course's SSE is elsewhere called RSS
- 08 One X variable only; multiple regression is out of scope
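The estimation recipe in item 01 can be sketched in a few lines of Python. This is a minimal illustration rather than course-supplied code: the function name `fit_line` and the five-point data set are hypothetical, chosen only so the arithmetic is easy to check by hand.

```python
def fit_line(x, y):
    """Least-squares line for one X variable, built from summary sums."""
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    sum_x2 = sum(xi * xi for xi in x)
    sum_xy = sum(xi * yi for xi, yi in zip(x, y))

    # Sample variance of x and sample covariance, via the shortcut formulas
    s_x2 = (sum_x2 - sum_x ** 2 / n) / (n - 1)
    s_xy = (sum_xy - sum_x * sum_y / n) / (n - 1)

    b1 = s_xy / s_x2                  # slope estimate
    b0 = sum_y / n - b1 * sum_x / n   # intercept estimate: y-bar minus b1 * x-bar
    return b0, b1


# Hypothetical data, for illustration only
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
b0, b1 = fit_line(x, y)
print(f"y-hat = {b0:.3f} + {b1:.3f}x")  # prints: y-hat = 2.200 + 0.600x
```

Working from the sums rather than from deviations mirrors the hand calculation expected with a calculator: here Σx = 15, Σy = 20, Σx² = 55 and Σxy = 66 give s_x² = 2.5 and s_xy = 1.5.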
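Full regression from summary statistics: line, fit and slope test
Summary data used in the working below: n = 26, Σx = 130, Σy = 260, Σx² = 1400, Σy² = 2725, Σxy = 1450, where x is the number of ads aired and y is weekly sales in $1000s.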
- (a) [2 marks] Building blocks first: x̄ = 130/26 = 5 and ȳ = 260/26 = 10; s_x² = [Σx² − (Σx)²/n]/(n − 1) = (1400 − 650)/25 = 30; s_xy = [Σxy − (Σx)(Σy)/n]/(n − 1) = (1450 − 1300)/25 = 6.
- (a) [2 marks] β̂₁ = s_xy/s_x² = 6/30 = 0.200 and β̂₀ = ȳ − β̂₁x̄ = 10 − 0.200 × 5 = 9.000, so ŷ = 9.000 + 0.200x. Interpretation: each additional ad aired is associated, on average, with 0.200 × $1000 = $200 more in weekly sales.
- (b) [1 mark] s_y² = [Σy² − (Σy)²/n]/(n − 1) = (2725 − 2600)/25 = 5, so SSE = (n − 1)(s_y² − s_xy²/s_x²) = 25 × (5 − 36/30) = 25 × 3.8 = 95.
- (b) [1 mark] Standard error of estimate: s_ε = √(SSE/(n − 2)) = √(95/24) = √3.9583 ≈ 1.990 (in $1000s).
- (c) [2 marks] H₀: β₁ = 0 (no linear relationship) vs H_A: β₁ ≠ 0, two-tail at α = 0.05. Standard error of the slope: s_{β̂₁} = s_ε/√((n − 1)s_x²) = 1.990/√(25 × 30) = 1.990/√750 ≈ 1.990/27.386 ≈ 0.0727. Test statistic t = (0.200 − 0)/0.0727 ≈ 2.75 with df = n − 2 = 24.
- (c) [2 marks] Critical values ±t₀.₀₂₅,₂₄ = ±2.064. Since |t| = 2.75 > 2.064, reject H₀: at the 5% level of significance there is sufficient evidence of a linear relationship between ads aired and weekly sales.
- (d) [2 marks] R² = s_xy²/(s_x²·s_y²) = 36/(30 × 5) = 36/150 = 0.24. Cross-check via the partition: SST = (n − 1)s_y² = 125 and 1 − SSE/SST = 1 − 95/125 = 0.24. About 24% of the variation in weekly sales is explained by variation in ads aired; the remaining 76% sits in SSE.
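As a check on the hand working, the whole chain can be re-run in Python from the summary sums that appear in the solution (n = 26, Σx = 130, Σy = 260, Σx² = 1400, Σy² = 2725, Σxy = 1450). This is a sketch for self-checking only; the variable names are illustrative, and the critical value is simply the table figure quoted in part (c), not something the script computes.

```python
import math

# Summary sums read off the worked solution
n = 26
sum_x, sum_y = 130, 260
sum_x2, sum_y2, sum_xy = 1400, 2725, 1450

# Building blocks: sample variances and covariance
s_x2 = (sum_x2 - sum_x ** 2 / n) / (n - 1)
s_y2 = (sum_y2 - sum_y ** 2 / n) / (n - 1)
s_xy = (sum_xy - sum_x * sum_y / n) / (n - 1)

# (a) fitted line
b1 = s_xy / s_x2
b0 = sum_y / n - b1 * sum_x / n

# (b) unexplained variation and standard error of estimate
sse = (n - 1) * (s_y2 - s_xy ** 2 / s_x2)
s_eps = math.sqrt(sse / (n - 2))

# (c) slope test on n - 2 degrees of freedom
se_b1 = s_eps / math.sqrt((n - 1) * s_x2)
t_stat = b1 / se_b1
t_crit = 2.064  # t_{0.025, 24}, taken from the t table as in part (c)
reject_h0 = abs(t_stat) > t_crit

# (d) coefficient of determination, two ways
r2 = s_xy ** 2 / (s_x2 * s_y2)
sst = (n - 1) * s_y2
r2_partition = 1 - sse / sst

print(f"y-hat = {b0:.3f} + {b1:.3f}x")
print(f"SSE = {sse:.2f}, s_eps = {s_eps:.3f}, t = {t_stat:.2f}, reject H0: {reject_h0}")
print(f"R^2 = {r2:.2f} (partition check: {r2_partition:.2f})")
```

Carrying full precision gives a slope standard error of about 0.07265 rather than the hand-rounded 0.0727; the t statistic still rounds to 2.75, so the conclusion is unchanged.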
Key terms
- Sum of squares for error (SSE): the sum of squared vertical gaps between the observations and the fitted line, SSE = Σ(yᵢ − ŷᵢ)², with shortcut SSE = (n−1)(s_y² − s_xy²/s_x²). It measures the variation in Y that the line leaves unexplained; the smaller it is, the better the fit.
- Standard error of estimate: s_ε = √(SSE/(n−2)), roughly the typical size of a residual, in the units of Y. It feeds directly into the standard error of the slope and hence the slope test.
- Slope significance test: the t-test of H₀: β₁ = 0 (no linear relationship) against H_A: β₁ ≠ 0 (a linear relationship exists), using t = (β̂₁ − β₁)/s_{β̂₁} with n − 2 degrees of freedom. Under H₀ the hypothesised β₁ is 0, so the statistic reduces to t = β̂₁/s_{β̂₁}. A two-tail test is the usual choice, though one-tail versions (H_A: β₁ > 0 or H_A: β₁ < 0) exist.
- Standard error of the slope: s_{β̂₁} = s_ε/√((n−1)s_x²), the estimated sampling variability of the slope estimate β̂₁ and the denominator of the slope t statistic. More spread in X (larger s_x²) makes the slope estimate more precise.
- Coefficient of determination (R²): the proportion of the variation in Y explained by the variation in X, R² = SSR/SST = 1 − SSE/SST, which also equals the square of the correlation, r². It lies between 0 (no linear relationship) and 1 (perfect fit) and has no critical value for hypothesis testing.
- SST = SSR + SSE partition: the total variation of Y around its mean, SST = Σ(yᵢ − ȳ)² = (n−1)s_y², splits into SSR, the variation explained by X, plus SSE, the unexplained remainder. In this course SSR always means the regression (explained) sum of squares.
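The definitions above can be verified numerically from raw data. The sketch below computes SST, SSR and SSE directly from their defining sums, confirms the partition SST = SSR + SSE, and checks that the shortcut formula for SSE agrees with Σ(yᵢ − ŷᵢ)². The five-point data set is hypothetical and used for illustration only.

```python
# Hypothetical data, for illustration only
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
n = len(x)
x_bar, y_bar = sum(x) / n, sum(y) / n

# Sample variances and covariance from deviations about the means
s_x2 = sum((xi - x_bar) ** 2 for xi in x) / (n - 1)
s_y2 = sum((yi - y_bar) ** 2 for yi in y) / (n - 1)
s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / (n - 1)

# Fitted line and fitted values
b1 = s_xy / s_x2
b0 = y_bar - b1 * x_bar
y_hat = [b0 + b1 * xi for xi in x]

# The three sums of squares, straight from their definitions
sst = sum((yi - y_bar) ** 2 for yi in y)               # total variation
ssr = sum((fi - y_bar) ** 2 for fi in y_hat)           # explained (regression)
sse = sum((yi - fi) ** 2 for yi, fi in zip(y, y_hat))  # unexplained (error)

# Shortcut for SSE and the two routes to R^2
sse_shortcut = (n - 1) * (s_y2 - s_xy ** 2 / s_x2)
r2_from_ssr = ssr / sst
r2_from_sse = 1 - sse / sst

print(f"SST = {sst:.1f}, SSR = {ssr:.1f}, SSE = {sse:.1f}")
print(f"SSE via shortcut = {sse_shortcut:.1f}")
print(f"R^2 = {r2_from_ssr:.2f} = {r2_from_sse:.2f}")
```

For this data set the fitted line is ŷ = 2.2 + 0.6x, giving SST = 6, SSR = 3.6 and SSE = 2.4, so R² = 0.6 by either route.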
Regression II: Inference & Fit FAQ
Why does the slope test use n − 2 degrees of freedom instead of n − 1?
Because two coefficients, β₀ and β₁, must be estimated from the data before any residual can be computed, two degrees of freedom are used up. Both the standard error of estimate s_ε = √(SSE/(n−2)) and the slope t statistic therefore run on df = n − 2. Reading the t table at n − 1, the habit carried over from one-mean problems, picks the wrong critical value and is one of the easiest marks to lose in Week 10.
What does R² actually measure, and can I hypothesis-test it?
R² measures the proportion of the variation in Y that is explained by the variation in X: R² = SSR/SST = 1 − SSE/SST, and it equals the square of the correlation coefficient, r². It sits between 0 and 1 — R² = 1 means the points lie exactly on the line, R² = 0 means no linear relationship. The course is explicit that R² has no critical value for testing hypotheses: if you need a formal yes/no on whether a linear relationship exists, run the t-test on the slope, not anything on R².
Is SSE in ECON 1012 the same thing as SSR?
No — and the slides flag this trap directly. In ECON 1012, SSE is the sum of squares for ERROR (the residual, unexplained variation) and SSR is the REGRESSION (explained) sum of squares, with SST = SSR + SSE. Some other textbooks and websites use SSR or RSS for the residual quantity this course calls SSE. On the exam, stick to the course convention and, if in doubt, write the defining formula next to the symbol so your meaning is unambiguous.
How does regression show up on the ECON 1012 final exam?
The final exam is 25 MCQs plus 3 case-study questions covering Weeks 1–10 (180 minutes, invigilated, one double-sided A4 note sheet, non-wireless calculators, Z and t tables provided), and Week 10 material is squarely in scope. In the practice materials the regression case study follows a set shape: interpret the fitted coefficients in context, explain and compute R², run a full slope significance test from reported standard errors showing every step, then bracket the p-value from the t table and say whether the conclusion changes at a different α. Confirm current details on myLearning.
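Bracketing a p-value from a t table, the last step in that case-study shape, can be sketched as follows, using t ≈ 2.75 on df = 24 from the worked example in this chapter. The row of critical values is the df = 24 row of a standard t table; it is an assumption that the exam table prints these same upper-tail areas. The function names are hypothetical.

```python
# df = 24 row of a standard t table: (upper-tail area, critical value)
T_ROW_DF24 = [(0.100, 1.318), (0.050, 1.711), (0.025, 2.064),
              (0.010, 2.492), (0.005, 2.797)]


def bracket_two_tail_p(t_stat, row):
    """Return (lower, upper) bounds on the two-tail p-value from one table row."""
    t_abs = abs(t_stat)
    lower, upper = 0.0, 1.0
    for tail_area, crit in row:
        if t_abs >= crit:
            upper = min(upper, 2 * tail_area)  # t lies beyond this critical value
        else:
            lower = max(lower, 2 * tail_area)  # t falls short of this critical value
    return lower, upper


def decision(p_bounds, alpha):
    """Decide at level alpha using only the bracket, as on a table-based exam."""
    lower, upper = p_bounds
    if upper <= alpha:
        return "reject H0"
    if lower >= alpha:
        return "do not reject H0"
    return "table too coarse to decide"


bounds = bracket_two_tail_p(2.75, T_ROW_DF24)
print(bounds)                  # (0.01, 0.02)
print(decision(bounds, 0.05))  # reject H0
print(decision(bounds, 0.01))  # do not reject H0
```

Since 2.492 < 2.75 < 2.797, the two-tail p-value lies between 0.01 and 0.02, so the conclusion holds at α = 0.05 but would change at α = 0.01.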
Exam move
Drill the chain that workshop questions, the Module 10 quiz and the practice case study all reuse (sums → s_x², s_xy → β̂₁, β̂₀ → SSE → s_ε → s_{β̂₁} → t) and put it on your A4 note sheet in that order. Three habits protect marks: use df = n − 2 (never n − 1) for every t lookup; keep the naming straight, since here SSE is the unexplained (residual) sum of squares and SSR the explained one, whereas some other books use SSR or RSS for the residual quantity; and remember that R² has no critical value, so significance always comes from the slope t-test. Show every step of that test and close with a plain-language conclusion at the stated α: 'do not reject', never 'accept'. Re-run the Module 10 quiz on myLearning until the chain is automatic.