Australian National University · S1 2026 · FACULTY OF SCIENCE

STAT7038 · Regression Modelling

- one subject, every graph, every model, every mark
50% final exam · hurdle · 14 chapters · 4-page Bible
Our own words - no uploaded lecturer files
Built to mirror S1 2026 · updated this semester
Chapter 1 of 7 · STAT7038

Simple Linear Regression

Simple linear regression models a response y as a straight-line function of one predictor x plus a random, normally distributed error with constant variance: in symbols, yᵢ = β₀ + β₁xᵢ + εᵢ. It rests on the four LINE assumptions (Linearity, Independence, Normality, Equal variance), and its coefficients are read aloud as a slope (the expected change in y per one-unit rise in x) and an intercept (the expected y at x = 0, often an extrapolation). This chapter is the engine room of the course: you fit the line by least squares (b₁ = Sxy/Sxx and b₀ = ȳ − b₁x̄), compute fitted values and residuals, estimate the error variance as σ̂² = MSE = SSE/(n−2), and learn why the estimators are unbiased and have minimum variance among linear unbiased estimators (Gauss–Markov BLUE). Master it and multiple regression becomes the same results in matrix clothing.

In this chapter

What this chapter covers

  • 01 The SLR model and the four LINE assumptions
  • 02 What the slope and intercept mean: read it aloud
  • 03 Least squares: the normal equations and b₁ = Sxy/Sxx, b₀ = ȳ − b₁x̄
  • 04 Fitted values, residuals, and the two residual identities
  • 05 Estimating the error variance: σ̂² = MSE = SSE/(n−2)
  • 06 Properties of the LS estimators: unbiasedness, variances, Gauss–Markov BLUE
Worked example

Worked example: fit the line by hand (study hours vs exam score)

Q [5 marks]. For n = 8 students with x = study hours and y = exam score, you are given x̄ = 5.5, ȳ = 65.5, Sxx = 42 and Sxy = 161. (a) Find the least-squares slope and intercept and write the fitted line. (b) Interpret the slope. (c) The error sum of squares is SSE = 4.833; estimate the error variance and the residual standard error.
  • +1 (a) Slope. b₁ = Sxy/Sxx = 161/42 = 3.833.
  • +1 (a) Intercept. b₀ = ȳ − b₁x̄ = 65.5 − (161/42)(5.5) = 44.417, using the unrounded slope; fitted line ŷ = 44.417 + 3.833x.
  • +1 (b) Interpret. Each extra hour of study is associated with about 3.83 more marks, on average; the line passes through the centroid (x̄, ȳ) = (5.5, 65.5).
  • +1 (c) Error variance. σ̂² = MSE = SSE/(n−2) = 4.833/6 = 0.806. Divide by n−2, not n, because two df are spent estimating β₀ and β₁.
  • +1 (c) Residual standard error. σ̂ = √(4.833/6) = 0.897 marks on 6 degrees of freedom; this is the value R's summary() output reports as 'Residual standard error'.
ŷ = 44.417 + 3.833x; each extra study hour is associated with about 3.83 more marks on average; σ̂² = MSE = 4.833/6 = 0.806, so the residual standard error is σ̂ = 0.897 on 6 df.
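
The arithmetic in this worked example can be checked with a short Python sketch. It uses only the summary statistics given in the question; the variable names are our own choice, not anything prescribed by the course.

```python
import math

# Summary statistics given in the question
n = 8
x_bar, y_bar = 5.5, 65.5
Sxx, Sxy = 42.0, 161.0
SSE = 4.833

b1 = Sxy / Sxx              # least-squares slope
b0 = y_bar - b1 * x_bar     # intercept: forces the line through (x_bar, y_bar)
mse = SSE / (n - 2)         # error variance estimate on n - 2 degrees of freedom
rse = math.sqrt(mse)        # residual standard error

print(f"slope b1     = {b1:.4f}")
print(f"intercept b0 = {b0:.4f}")
print(f"MSE          = {mse:.4f} on {n - 2} df")
print(f"residual SE  = {rse:.4f}")
```

Carrying the unrounded slope through to the intercept, as the code does, is what gives 44.417 rather than the slightly different value you get from the rounded 3.833.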
Glossary

Key terms

LINE assumptions
The four conditions behind simple linear regression: Linearity of E(y|x) in x, Independence of the errors, Normality of the errors, and Equal variance Var(εᵢ) = σ². Only L, I and E are needed for the least-squares estimators to be BLUE (unbiased, with minimum variance among linear unbiased estimators); normality is what buys the exact t and F distributions used for inference.
Least squares
The criterion that picks the line minimising the sum of squared vertical distances from the points to the line. Solving the normal equations gives b₁ = Sxy/Sxx and b₀ = ȳ − b₁x̄.
Residual
eᵢ = yᵢ − ŷᵢ, the vertical gap between an observation and the fitted line: what the model could not explain. For any least-squares fit that includes an intercept, Σeᵢ = 0 and Σxᵢeᵢ = 0.
Mean squared error (MSE)
The estimate of the error variance, σ̂² = SSE/(n−2). Its square root is the residual standard error. Dividing by n instead of n−2 is the single most common variance slip, and it propagates into every standard error downstream.
Gauss–Markov theorem
Under L+I+E, the least-squares estimators are the Best Linear Unbiased Estimators (BLUE): among all linear unbiased rules they have minimum variance. It does not require normality — that is the extra ingredient that turns the variances into exact t and F sampling distributions.
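
The two residual identities in the glossary (Σeᵢ = 0 and Σxᵢeᵢ = 0) are easy to confirm numerically. The sketch below uses a small data set invented purely for illustration; the helper name fit_slr is our own.

```python
def fit_slr(x, y):
    """Least-squares intercept and slope via b1 = Sxy/Sxx, b0 = y_bar - b1*x_bar."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    Sxx = sum((xi - x_bar) ** 2 for xi in x)
    Sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = Sxy / Sxx
    b0 = y_bar - b1 * x_bar
    return b0, b1

# Hypothetical data, invented for illustration only
x = [1, 2, 3, 4, 5]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

b0, b1 = fit_slr(x, y)
residuals = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]

sum_e = sum(residuals)                                  # first identity
sum_xe = sum(xi * ei for xi, ei in zip(x, residuals))   # second identity

# Both sums are zero up to floating-point rounding
print(f"sum of residuals     = {sum_e:.2e}")
print(f"sum of x * residuals = {sum_xe:.2e}")
```

The identities are just the two normal equations restated in terms of residuals, so they hold for any data set once the line has been fitted by least squares with an intercept.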
FAQ

Simple Linear Regression FAQ

Why divide SSE by n − 2 instead of n?

Two degrees of freedom are spent estimating the two parameters β0 and β1 before the residuals are formed, so only n−2 independent pieces of information about the error remain. Dividing by n−2 makes MSE an unbiased estimator of σ²; dividing by n (or n−1) underestimates it and biases every standard error, t-statistic and interval that uses MSE.
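
A small simulation makes the n−2 argument concrete. This is a sketch under assumed values (true β₀ = 10, β₁ = 2, σ = 2, ten fixed x-values, normal errors), all chosen for illustration: it repeatedly generates data from the model, fits the line, and averages SSE/(n−2) and SSE/n across the repetitions.

```python
import random

def sse_of_fit(x, y):
    """Fit the least-squares line and return its error sum of squares."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    Sxx = sum((xi - x_bar) ** 2 for xi in x)
    Sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = Sxy / Sxx
    b0 = y_bar - b1 * x_bar
    return sum((yi - b0 - b1 * xi) ** 2 for xi, yi in zip(x, y))

random.seed(1)                          # fixed seed so the run is repeatable
beta0, beta1, sigma = 10.0, 2.0, 2.0    # assumed true values, so sigma^2 = 4
x = list(range(1, 11))                  # n = 10 fixed x-values
n, reps = len(x), 20000

total_sse = 0.0
for _ in range(reps):
    y = [beta0 + beta1 * xi + random.gauss(0.0, sigma) for xi in x]
    total_sse += sse_of_fit(x, y)

mean_sse = total_sse / reps
avg_div_n_minus_2 = mean_sse / (n - 2)  # should sit close to sigma^2 = 4
avg_div_n = mean_sse / n                # should sit noticeably below 4

print(f"average SSE/(n-2) = {avg_div_n_minus_2:.3f}")
print(f"average SSE/n     = {avg_div_n:.3f}")
```

With n = 10 the theory says E(SSE) = (n−2)σ² = 32, so dividing by n targets 3.2 rather than 4; the gap shrinks as n grows but never disappears.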

What does the intercept actually mean — and should I interpret it?

β0 is the expected value of y when x = 0. Often that is an extrapolation with no real-world meaning (an exam score at zero study hours, say), so you report it but do not over-interpret it. The slope β1 — the expected change in y per one-unit increase in x — is the number the whole course is about.

Does a good fit prove that x causes y?

No. A strong fit only says x and y move together linearly in this sample; a lurking variable or an outlier can manufacture a slope. And r = 0 means no linear relationship, not no relationship at all. Regression measures association, not causation, unless the design (e.g. a randomised experiment) justifies the causal reading.
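
The point that r = 0 rules out only a linear relationship can be seen with a tiny constructed example: y = x² on x-values symmetric about zero. The data and the helper name correlation are our own illustration.

```python
import math

def correlation(x, y):
    """Sample correlation r = Sxy / sqrt(Sxx * Syy)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    Sxx = sum((xi - x_bar) ** 2 for xi in x)
    Syy = sum((yi - y_bar) ** 2 for yi in y)
    Sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    return Sxy / math.sqrt(Sxx * Syy)

# y is an exact function of x, yet the relationship is not linear
x = [-2, -1, 0, 1, 2]
y = [xi ** 2 for xi in x]

r = correlation(x, y)
print(f"r = {r:.3f}")
```

Here y is completely determined by x, but the positive and negative halves cancel in Sxy, so the fitted slope and the correlation are both zero.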

How does spreading out the x-values change precision?

Var(b1) = σ²/Sxx shrinks as Sxx grows, so x-values bunched together give an imprecise slope while a wide x-spread gives a tight one. Spreading the x out is the cheapest way to a more precise estimate — an exam favourite phrased as 'how would precision change if…'.
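
To see the size of the effect, the sketch below compares Var(b₁) = σ²/Sxx for two hypothetical designs with the same n and the same mean of x. The x-values and the error variance σ² = 4 are assumptions made up for illustration.

```python
import math

def sxx(x):
    """Sum of squared deviations of x about its mean."""
    x_bar = sum(x) / len(x)
    return sum((xi - x_bar) ** 2 for xi in x)

sigma2 = 4.0                          # assumed error variance
bunched = [4, 4, 5, 5, 5, 5, 6, 6]    # x-values clustered near the mean
spread = [1, 1, 2, 2, 8, 8, 9, 9]     # same n and mean, much wider spread

var_bunched = sigma2 / sxx(bunched)   # Var(b1) for the bunched design
var_spread = sigma2 / sxx(spread)     # Var(b1) for the spread design

print(f"bunched: Sxx = {sxx(bunched):.0f}, sd(b1) = {math.sqrt(var_bunched):.2f}")
print(f"spread:  Sxx = {sxx(spread):.0f}, sd(b1) = {math.sqrt(var_spread):.2f}")
```

Same sample size, same error variance, but the spread design has 25 times the Sxx and therefore a slope standard deviation five times smaller.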

Study strategy

Exam move

Put one chain on your A4 sheet and drill it until it is automatic: from the supplied Σx, Σy, Σx², Σxy (or the S-quantities) go straight to Sxy/Sxx → b1 → b0 → the fitted line. Burn in dfE = n − 2 so you never divide SSE by the wrong number. Be ready to interpret the slope in context ('per one-unit increase in x, holding nothing else — this is simple regression'), and remember the three boxed variance formulas plus the one-liner se(b1) = √(MSE/Sxx), because every t-test and CI in the next chapter starts from exactly those. The calculus derivation of the normal equations is not examinable; their use is.
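
The chain from raw sums to the fitted line can be sketched in a few lines of Python. The raw sums used here are not quoted in the worked example; they are back-computed from its summary statistics (n = 8, x̄ = 5.5, ȳ = 65.5, Sxx = 42, Sxy = 161), and the function name fit_from_sums is our own.

```python
import math

def fit_from_sums(n, sum_x, sum_y, sum_x2, sum_xy):
    """Raw sums -> S-quantities -> slope -> intercept."""
    Sxx = sum_x2 - sum_x ** 2 / n        # Sxx = sum(x^2) - (sum x)^2 / n
    Sxy = sum_xy - sum_x * sum_y / n     # Sxy = sum(xy) - (sum x)(sum y) / n
    b1 = Sxy / Sxx
    b0 = sum_y / n - b1 * sum_x / n      # b0 = y_bar - b1 * x_bar
    return Sxx, Sxy, b0, b1

# Sums implied by the worked example's summary statistics (derived, not quoted)
n = 8
Sxx, Sxy, b0, b1 = fit_from_sums(n, sum_x=44, sum_y=524, sum_x2=284, sum_xy=3043)

SSE = 4.833                   # given in the worked example
mse = SSE / (n - 2)           # dfE = n - 2
se_b1 = math.sqrt(mse / Sxx)  # se(b1) = sqrt(MSE / Sxx)

print(f"Sxx = {Sxx:.1f}, Sxy = {Sxy:.1f}")
print(f"fitted line: y_hat = {b0:.4f} + {b1:.4f} x")
print(f"se(b1) = {se_b1:.4f}")
```

Writing the chain as one function mirrors the order you would follow on paper: S-quantities first, then the slope, then the intercept, with the standard error last because it needs both MSE and Sxx.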
