STAT7038 · Regression Modelling
Regression Modelling
Regression Modelling teaches the classical linear model end to end — from fitting one straight line through a cloud of points to building, testing and choosing a multiple-regression model with several predictors. You will estimate coefficients by least squares, split the variation with ANOVA, test slopes with t and the model with F, separate a confidence interval for the mean from a prediction interval for a new observation, check the assumptions with residual diagnostics, read everything off an R printout, and finish with model selection. The final exam is 70% of your grade; it is open to one A4 double-sided typed sheet and supplies the calculator, the R outputs and the statistical tables — so it tests whether you can execute and interpret the method on fresh numbers, not whether you can recall it.
What STAT7038 covers
Seven topics, from one straight line to many predictors, gathered into one exam-ready map.
How STAT7038 is assessed
| Component | Weight | Format |
|---|---|---|
| Final examination | 70% | MCQ + short calculation + short written · covers Weeks 1–12 · open to one A4 double-sided typed/printed sheet; calculator, R outputs & statistical tables supplied |
| Assignment | 15% | R-based, non-redeemable — submitted across the semester |
| In-tutorial quiz | 10% | Redeemable (the exam mark replaces it if higher) — on simple linear regression |
| Online quiz | 5% | Redeemable, on Canvas — confirm the exact split & dates in your subject guide |
The slope t-test & 95% CI — the signature SLR question, mark by mark
- +1 Standard error of the slope. se(b1) = √(MSE / Sxx) = √(0.806 / 42) = 0.1385 ≈ 0.139.
- +1 Hypotheses. H0: β1 = 0 (no linear relationship) vs Ha: β1 ≠ 0.
- +1 Test statistic. t = b1 / se(b1) = 3.833 / 0.1385 ≈ 27.7, compared with the t distribution on n − 2 = 6 degrees of freedom.
- +1 Critical value & decision. The 0.975 quantile of t(6) is 2.447 (supplied table); |27.7| > 2.447 ⇒ reject H0.
- +1 95% CI for β1. 3.833 ± 2.447 × 0.1385 = 3.833 ± 0.339 = (3.49, 4.17).
- +1 Conclude in context. The slope is significantly non-zero; we are 95% confident that each extra study hour is associated with an average increase of between 3.49 and 4.17 marks.
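The six marks above can be replayed as a short calculation. This is a minimal Python sketch, not course material: the exam itself supplies R output and a calculator, and the variable names are illustrative. It uses only the figures quoted in the worked example (MSE, Sxx, b1 and the tabulated critical value).

```python
import math

# Figures quoted in the worked example above
mse = 0.806      # mean squared error
sxx = 42.0       # sum of squared deviations of x about its mean
b1 = 3.833       # fitted slope
t_crit = 2.447   # 0.975 quantile of t on 6 df, read from the supplied table

se_b1 = math.sqrt(mse / sxx)        # standard error of the slope
t_stat = b1 / se_b1                 # test statistic for H0: beta1 = 0
half_width = t_crit * se_b1         # margin of the 95% confidence interval
ci = (b1 - half_width, b1 + half_width)
reject_h0 = abs(t_stat) > t_crit    # two-sided decision at the 5% level

print(f"se(b1) = {se_b1:.4f}")
print(f"t = {t_stat:.2f}, reject H0: {reject_h0}")
print(f"95% CI for beta1: ({ci[0]:.2f}, {ci[1]:.2f})")
```

Carrying the unrounded standard error through each step, rather than the three-decimal value, is what keeps the margin at 0.339 instead of 0.340.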
Key terms
- Least-squares estimator
- The coefficients that minimise the sum of squared vertical distances from the points to the line. In simple regression b1 = Sxy/Sxx and b0 = ȳ − b1x̄; under the LINE assumptions they are unbiased and, by Gauss–Markov, the best linear unbiased estimators (BLUE).
- ANOVA decomposition
- The exact split of total variation in y into the part the line explains and the part it leaves: SST = SSR + SSE, with degrees of freedom (n−1) = (p−1) + (n−p), where p counts the estimated coefficients including the intercept. It drives the F-test and R².
- Confidence interval vs prediction interval
- Both are centred at the fitted value ŷh, but a CI brackets the mean response at xh while a PI brackets one new observation. The PI carries an extra ‘+1’ under the root (the new point’s own error), so it is always wider.
- Leverage (hat value)
- h_ii measures how far an observation’s x-values sit from the centre of the predictor space; it is the i-th diagonal element of the hat matrix H = X(X'X)⁻¹X'. High leverage (h_ii > 2p/n) is only dangerous when paired with a large residual.
- Multicollinearity
- Near-linear dependence among the predictors, which inflates the variances of the coefficients. Its tell-tale sign is a highly significant overall F with no individual t significant. It is diagnosed by the variance inflation factor, VIF_j = 1/(1 − R²_j), where R²_j is the R² from regressing predictor j on the other predictors; values above 5 are flagged and values above 10 are serious.
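Several of these definitions can be checked by hand on a tiny data set. The sketch below is an illustration in Python under stated assumptions: the five (x, y) pairs are hypothetical toy data, not taken from the course, and the leverage formula used is the simple-regression special case h_ii = 1/n + (x_i − x̄)²/Sxx rather than the general hat matrix.

```python
# Hypothetical toy data (not from the course)
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 5.0, 4.0, 5.0]
n, p = len(x), 2  # p counts the intercept and the slope

x_bar = sum(x) / n
y_bar = sum(y) / n
sxx = sum((xi - x_bar) ** 2 for xi in x)
sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))

# Least-squares estimates
b1 = sxy / sxx
b0 = y_bar - b1 * x_bar
fitted = [b0 + b1 * xi for xi in x]

# ANOVA decomposition: SST = SSR + SSE
sst = sum((yi - y_bar) ** 2 for yi in y)
ssr = sum((fi - y_bar) ** 2 for fi in fitted)
sse = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
mse = sse / (n - p)
f_stat = (ssr / (p - 1)) / mse
r2 = ssr / sst

# Leverage in simple regression, checked against the 2p/n rule of thumb
leverage = [1 / n + (xi - x_bar) ** 2 / sxx for xi in x]
high_leverage = [h > 2 * p / n for h in leverage]

def vif(r2_j):
    """Variance inflation factor for a predictor whose R-squared on the others is r2_j."""
    return 1 / (1 - r2_j)

print(f"b0 = {b0:.2f}, b1 = {b1:.2f}")
print(f"SST = {sst:.2f}, SSR = {ssr:.2f}, SSE = {sse:.2f}")
print(f"F = {f_stat:.2f}, R^2 = {r2:.2f}")
print("leverage:", [round(h, 2) for h in leverage])
print(f"VIF at R2_j = 0.8: {vif(0.8):.1f}, at 0.9: {vif(0.9):.1f}")
```

Two checks are worth making on any such calculation: SSR and SSE add up to SST, and the leverages sum to p. The VIF line also shows where the usual thresholds come from, since a VIF of 5 corresponds to R²_j = 0.8 and a VIF of 10 to R²_j = 0.9.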
STAT7038 FAQ
Is STAT7038 hard?
It is method-dense rather than memory-heavy. Because the exam supplies the R outputs and the statistical tables, the difficulty is in driving the procedure under time — picking the right interval, the right critical value and the right test, then reading and interpreting the output correctly. Master simple linear regression and the multiple-regression half is largely the same results in matrix clothing.
How is STAT7038 assessed?
The final exam is 70% of your grade; the rest is an R-based assignment (about 15%, non-redeemable), an in-tutorial quiz (about 10%, redeemable) and an online quiz (about 5%, redeemable). Confirm this year’s exact split and dates in your subject guide, as details shift between cohorts.
What is allowed in the STAT7038 exam?
The final is open to one A4 double-sided, typed or printed notes sheet, and the paper itself supplies the calculator, the R outputs and the statistical tables (t, F, normal). So you do not waste sheet space on table values or R syntax — spend it on the boxed formulas, the five-step test ritual, the CI-vs-PI rule and the R-output map. Confirm the permitted-materials rule for your sitting in the subject guide.
Do I need to run R in the STAT7038 exam?
No — you read supplied summary() and anova() printouts rather than running R yourself. The skill being tested is mapping each cell (Estimate, Std. Error, t value, Pr(>|t|), the F-statistic) to a formula and recovering hidden quantities such as MSE and n.
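Recovering the hidden quantities is a matter of reading the footer of the printout. The Python sketch below assumes a hypothetical summary() footer whose numbers were chosen to be consistent with the worked slope example; the residual standard error of 0.8978 is an illustrative value, not one quoted in the course.

```python
# Hypothetical summary() footer (illustrative values only):
#   Residual standard error: 0.8978 on 6 degrees of freedom
#   F-statistic: ... on 1 and 6 DF
rse = 0.8978      # residual standard error, as printed
df_resid = 6      # residual degrees of freedom, n - p
df_model = 1      # numerator df of the F-statistic, p - 1

mse = rse ** 2            # MSE is the square of the residual standard error
p = df_model + 1          # number of coefficients, including the intercept
n = df_resid + p          # sample size, from df_resid = n - p
sse = mse * df_resid      # SSE, from MSE = SSE / (n - p)

print(f"MSE = {mse:.3f}, n = {n}, SSE = {sse:.2f}")
```

The same logic works in reverse: given SSE and n, the printed residual standard error can be checked as √(SSE / (n − p)).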
Is using AskSia for STAT7038 cheating?
No. AskSia is a study reference written in our own words — we host none of your lecturer’s files, and Sia teaches you the method to earn the marks; it does not complete or sit your assessments.
How to study for the exam
Build your one A4 sheet around the recurring chains, because every exam item is a procedure on supplied numbers. Drill four of them until they are automatic: Sxy/Sxx → b1, b0; SSE → MSE → se(bj) → t → decision; SST = SSR + SSE → F, R²; and xh → CI (mean) or PI (new obs). Show every line on the written parts — method marks are real. Keep the high-yield distinctions sharp: CI vs PI (the ‘+1’), outlier ≠ leverage ≠ influence, and sequential vs partial sums of squares. Practise reading an R printout cold, and remember the multicollinearity giveaway: a significant overall F with no significant individual t.
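The last chain, xh → CI or PI, differs only by the '+1' under the root. The following Python sketch shows the two half-widths side by side for simple regression. MSE, Sxx and the critical value come from the worked slope example and n = 8 is implied by its 6 degrees of freedom, but x̄ = 4.5 and xh = 6 are hypothetical values chosen for illustration.

```python
import math

mse, sxx, t_crit = 0.806, 42.0, 2.447   # from the worked slope example
n = 8                                    # implied by df = n - 2 = 6
x_bar, x_h = 4.5, 6.0                    # hypothetical mean of x and target point

# Shared term: uncertainty in the fitted line at x_h
core = 1 / n + (x_h - x_bar) ** 2 / sxx

ci_half = t_crit * math.sqrt(mse * core)         # mean response at x_h
pi_half = t_crit * math.sqrt(mse * (1 + core))   # one new observation at x_h

print(f"CI half-width: {ci_half:.3f}")
print(f"PI half-width: {pi_half:.3f}")
```

Because the prediction interval adds the new observation's own error variance, its half-width exceeds the confidence interval's at every xh, and it does not shrink to zero as n grows.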