Australian National University · S1 2026 · FACULTY OF SCIENCE

STAT7038 · Regression Modelling

- one subject, every graph, every model, every mark

50% final exam · hurdle · 14 chapters · 4-page Bible

Our own words - no uploaded lecturer files

Built to mirror S1 2026 · updated this semester

Chapter 1 of 7 · STAT7038

Simple Linear Regression

Simple linear regression models a response y as a straight-line function of one predictor x plus a random, normally distributed error with constant variance σ²; in symbols, y_i = β₀ + β₁x_i + ε_i. It rests on the four LINE assumptions (Linearity, Independence, Normality, Equal variance), and its coefficients are read aloud as a slope (the expected change in y per one-unit rise in x) and an intercept (the expected y at x = 0, often an extrapolation). This chapter is the engine room of the course: you fit the line by least squares, with b₁ = S_xy/S_xx and b₀ = ȳ − b₁x̄, compute fitted values and residuals, estimate the error variance as σ̂² = MSE = SSE/(n−2), and learn why the estimators are unbiased and have minimum variance among linear unbiased estimators (Gauss–Markov BLUE). Master it and multiple regression becomes the same results in matrix clothing.
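The fitting steps described above can be sketched in a few lines of pure Python. The helper name fit_slr and the five data points are hypothetical, made up for illustration; the function simply writes out the textbook formulas (b₁ = S_xy/S_xx, b₀ = ȳ − b₁x̄, fitted values, residuals, MSE = SSE/(n−2)) and is not a replacement for R's lm(). The last two lines check the residual identities Σe_i = 0 and Σx_ie_i = 0 numerically.

```python
from math import sqrt


def fit_slr(x, y):
    """Least-squares fit of y = b0 + b1*x (hypothetical helper, textbook formulas)."""
    n = len(x)
    if n != len(y) or n < 3:
        raise ValueError("x and y must have the same length, with n >= 3")
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = s_xy / s_xx                       # slope: b1 = S_xy / S_xx
    b0 = y_bar - b1 * x_bar                # intercept: b0 = y_bar - b1 * x_bar
    fitted = [b0 + b1 * xi for xi in x]    # y_hat_i
    resid = [yi - fi for yi, fi in zip(y, fitted)]  # e_i = y_i - y_hat_i
    sse = sum(e ** 2 for e in resid)
    mse = sse / (n - 2)                    # two df spent on b0 and b1
    return {"b0": b0, "b1": b1, "fitted": fitted, "resid": resid,
            "sse": sse, "mse": mse, "sigma_hat": sqrt(mse)}


# Made-up illustrative data (not from the course).
x = [1, 2, 3, 4, 5]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
fit = fit_slr(x, y)
print("b0 =", round(fit["b0"], 3), " b1 =", round(fit["b1"], 3))
# Residual identities: both sums are zero up to floating-point rounding.
print("sum e_i     =", sum(fit["resid"]))
print("sum x_i e_i =", sum(xi * e for xi, e in zip(x, fit["resid"])))
```

For these points S_xx = 10 and S_xy = 19.9, so the sketch should recover b₁ = 1.99 and b₀ = 0.05.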

In this chapter

What this chapter covers

01 The SLR model and the four LINE assumptions
02 What the slope and intercept mean: read it aloud
03 Least squares: the normal equations and b₁ = S_xy/S_xx, b₀ = ȳ − b₁x̄
04 Fitted values, residuals, and the two residual identities
05 Estimating the error variance: σ̂² = MSE = SSE/(n−2)
06 Properties of the LS estimators: unbiasedness, variances, Gauss–Markov BLUE

Worked example

Worked example: fit the line by hand (study hours vs exam score)

Q [5 marks]. For n = 8 students with x = study hours and y = exam score, you are given x̄ = 5.5, ȳ = 65.5, S_xx = 42 and S_xy = 161. (a) Find the least-squares slope and intercept and write the fitted line. (b) Interpret the slope. (c) The error sum of squares is SSE = 4.833; estimate the error variance and the residual standard error.

+1(a) Slope. b₁ = S_xy/S_xx = 161/42 = 3.833.
+1(a) Intercept. b₀ = ȳ − b₁x̄ = 65.5 − 3.833(5.5) = 44.417; fitted line ŷ = 44.417 + 3.833x.
+1(b) Interpret. Each extra hour of study is associated with about 3.83 more marks, on average; the line passes through the centroid (5.5, 65.5).
+1(c) Error variance. σ̂² = MSE = SSE/(n−2) = 4.833/6 = 0.806 — divide by n−2, not n, because two df are spent estimating β₀ and β₁.
+1(c) Residual standard error. σ̂ = √MSE = √(4.833/6) ≈ 0.897 marks on 6 degrees of freedom, exactly the 'Residual standard error' line R prints.

ŷ = 44.417 + 3.833x; each extra study hour is associated with about 3.83 more marks on average; σ̂² = MSE = 4.833/6 = 0.806, so the residual standard error is σ̂ ≈ 0.897 on 6 df.
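To check the arithmetic, the short script below recomputes every answer from the summary statistics given in the question; the variable names are illustrative. Carrying full precision avoids a small rounding wobble in part (c): √(4.833/6) ≈ 0.8975, whereas taking the root of the already-rounded 0.806 gives about 0.8978.

```python
from math import sqrt

# Summary statistics given in the question.
n, x_bar, y_bar = 8, 5.5, 65.5
s_xx, s_xy = 42.0, 161.0
sse = 4.833

b1 = s_xy / s_xx             # (a) slope = 161/42
b0 = y_bar - b1 * x_bar      # (a) intercept = 65.5 - b1 * 5.5
mse = sse / (n - 2)          # (c) error variance on n - 2 = 6 df
sigma_hat = sqrt(mse)        # (c) residual standard error

print(f"fitted line: y_hat = {b0:.3f} + {b1:.3f}x")
print(f"MSE = {mse:.4f}, sigma_hat = {sigma_hat:.4f} on {n - 2} df")
```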

Glossary

Key terms

LINE assumptions: The four conditions behind simple linear regression: Linearity of E(y|x) in x, Independence of the errors, Normality of the errors, and Equal variance Var(ε_i) = σ². Normality is not needed for the least-squares estimators themselves: L+I+E already make them unbiased with minimum variance among linear unbiased estimators. Normality is what buys the exact t and F distributions used for inference.
Least squares: The criterion that picks the line minimising the sum of squared vertical distances from the points to the line, Σ(y_i − b₀ − b₁x_i)². Solving the normal equations gives b₁ = S_xy/S_xx and b₀ = ȳ − b₁x̄.
Residual: e_i = y_i − ŷ_i, the vertical gap between an observation and the fitted line — what the model could not explain. For any least-squares fit Σe_i = 0 and Σx_ie_i = 0.
Mean squared error (MSE): The estimate of the error variance, σ̂² = SSE/(n−2). Its square root is the residual standard error. Dividing SSE by n instead of n−2 is the single most common variance slip, and it propagates into every standard error downstream.
Gauss–Markov theorem: Under L+I+E, the least-squares estimators are the Best Linear Unbiased Estimators (BLUE): among all linear unbiased rules they have minimum variance. It does not require normality — that is the extra ingredient that turns the variances into exact t and F sampling distributions.
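The claim that Gauss–Markov does not need normality can be probed with a small Monte Carlo sketch. The true line (β₀ = 2, β₁ = 0.5), the fixed design x = 1, 2, ..., 8 and the Uniform(−3, 3) errors below are all hypothetical choices: uniform errors have mean 0 and variance 3, so they satisfy L+I+E while being clearly non-normal. Averaged over many simulated samples, b₁ should sit close to β₁, and its variance close to σ²/S_xx = 3/42. A simulation like this only illustrates unbiasedness and the variance formula; it does not prove the "best" part of the theorem.

```python
import random
from statistics import fmean, pvariance


def ls_slope(x, y):
    """Least-squares slope b1 = S_xy / S_xx."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    return s_xy / s_xx


def simulate_slopes(reps=20000, seed=1):
    """Hypothetical setup: fixed x, uniform (non-normal) errors with mean 0."""
    rng = random.Random(seed)
    beta0, beta1 = 2.0, 0.5            # assumed true parameters
    x = [1, 2, 3, 4, 5, 6, 7, 8]       # fixed design, S_xx = 42
    a = 3.0                            # errors ~ Uniform(-a, a), variance a**2 / 3
    sigma2 = a ** 2 / 3
    x_bar = fmean(x)
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    slopes = []
    for _ in range(reps):
        y = [beta0 + beta1 * xi + rng.uniform(-a, a) for xi in x]
        slopes.append(ls_slope(x, y))
    return fmean(slopes), pvariance(slopes), sigma2 / s_xx


avg_b1, var_b1, theory_var = simulate_slopes()
print(f"mean of b1 = {avg_b1:.4f} (true 0.5)")
print(f"var of b1  = {var_b1:.5f} (sigma^2/S_xx = {theory_var:.5f})")
```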

FAQ

Simple Linear Regression FAQ

Why divide SSE by n − 2 instead of n?

Two degrees of freedom are spent estimating the two parameters β₀ and β₁ before the residuals are formed, so only n−2 independent pieces of information about the error remain. Dividing by n−2 makes MSE an unbiased estimator of σ²; dividing by n (or n−1) underestimates it and biases every standard error, t-statistic and interval that uses MSE.
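A quick simulation makes the n−2 argument concrete. The sketch below assumes a hypothetical true line y = 1 + 0.8x with normal errors of σ = 2 (so σ² = 4) at n = 6 fixed x-values, refits the line many times, and averages SSE/(n−2) and SSE/n. The first average should land near 4; the second near 4 × (n−2)/n ≈ 2.67, the systematic underestimate described in the answer. It computes SSE through the identity SSE = S_yy − S_xy²/S_xx.

```python
import random
from statistics import fmean


def sse_of_ls_fit(x, y):
    """SSE of the least-squares line, via SSE = S_yy - S_xy**2 / S_xx."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_yy = sum((yi - y_bar) ** 2 for yi in y)
    return s_yy - s_xy ** 2 / s_xx


def compare_divisors(n=6, sigma=2.0, reps=40000, seed=7):
    """Average SSE/(n-2) and SSE/n over many simulated samples (hypothetical setup)."""
    rng = random.Random(seed)
    x = list(range(1, n + 1))                 # fixed design
    by_n_minus_2, by_n = [], []
    for _ in range(reps):
        y = [1.0 + 0.8 * xi + rng.gauss(0.0, sigma) for xi in x]
        sse = sse_of_ls_fit(x, y)
        by_n_minus_2.append(sse / (n - 2))    # MSE: unbiased for sigma^2
        by_n.append(sse / n)                  # biased low by the factor (n-2)/n
    return fmean(by_n_minus_2), fmean(by_n), sigma ** 2


mse_avg, naive_avg, true_var = compare_divisors()
print(f"average SSE/(n-2) = {mse_avg:.3f}, average SSE/n = {naive_avg:.3f}, "
      f"sigma^2 = {true_var}")
```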

What does the intercept actually mean — and should I interpret it?

β₀ is the expected value of y when x = 0. Often that is an extrapolation with no real-world meaning (an exam score at zero study hours, say), so you report it but do not over-interpret it. The slope β₁ — the expected change in y per one-unit increase in x — is the number the whole course is about.

Does a good fit prove that x causes y?

No. A strong fit only says x and y move together linearly in this sample; a lurking variable or an outlier can manufacture a slope. And r = 0 means no linear relationship, not no relationship at all. Regression measures association, not causation, unless the design (e.g. a randomised experiment) justifies the causal reading.
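The point that r = 0 does not mean "no relationship" is easy to see in a toy case, assumed purely for illustration: take x symmetric about zero and set y = x² exactly. Here y is a perfect function of x, yet the negative and positive halves cancel, so S_xy = 0, which makes both the correlation and the least-squares slope zero.

```python
from math import sqrt

x = [-3, -2, -1, 0, 1, 2, 3]
y = [xi ** 2 for xi in x]          # y is determined exactly by x, but not linearly

n = len(x)
x_bar, y_bar = sum(x) / n, sum(y) / n
s_xx = sum((xi - x_bar) ** 2 for xi in x)
s_yy = sum((yi - y_bar) ** 2 for yi in y)
s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))

r = s_xy / sqrt(s_xx * s_yy)       # Pearson correlation
b1 = s_xy / s_xx                   # least-squares slope
print(f"r = {r}, b1 = {b1}")       # prints "r = 0.0, b1 = 0.0": no *linear* association
```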

How does spreading out the x-values change precision?

Var(b₁) = σ²/S_xx shrinks as S_xx grows, so x-values bunched together give an imprecise slope while a wide x-spread gives a tight one. Spreading the x out is the cheapest way to a more precise estimate — an exam favourite phrased as 'how would precision change if…'.
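The design effect can be checked directly from Var(b₁) = σ²/S_xx. The sketch below compares two hypothetical eight-point designs with the same σ, assumed known and equal to 1 here (in practice MSE stands in for σ²): x-values bunched at spacing 0.25 versus spread at spacing 1. Making the spacing four times wider multiplies S_xx by 16, so the standard deviation of b₁ should fall by a factor of 4.

```python
from math import sqrt


def se_slope(x, sigma):
    """Standard deviation of b1, sqrt(sigma^2 / S_xx), for a fixed design x."""
    n = len(x)
    x_bar = sum(x) / n
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    return sqrt(sigma ** 2 / s_xx)


sigma = 1.0                                      # assumed known error SD
bunched = [4.25 + 0.25 * k for k in range(8)]    # 4.25, 4.50, ..., 6.00
spread = [1.0 + 1.0 * k for k in range(8)]       # 1, 2, ..., 8

se_bunched = se_slope(bunched, sigma)
se_spread = se_slope(spread, sigma)
print(f"se(b1) bunched = {se_bunched:.3f}, spread = {se_spread:.3f}, "
      f"ratio = {se_bunched / se_spread:.2f}")
```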

Study strategy

Exam move

Put one chain on your A4 sheet and drill it until it is automatic: from the supplied Σx, Σy, Σx², Σxy (or the S-quantities directly), compute S_xx = Σx² − (Σx)²/n and S_xy = Σxy − (Σx)(Σy)/n, then b₁ = S_xy/S_xx → b₀ = ȳ − b₁x̄ → the fitted line. Burn in df_E = n − 2 so you never divide SSE by the wrong number. Be ready to interpret the slope in context ('per one-unit increase in x', with no 'holding other variables constant' clause, since this is simple regression), and remember the three boxed variance formulas plus the one-liner se(b₁) = √(MSE/S_xx), because every t-test and CI in the next chapter starts from exactly those. The calculus derivation of the normal equations is not examinable; their use is.
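The exam chain can be sketched as one function. It uses the standard computational forms S_xx = Σx² − (Σx)²/n and S_xy = Σxy − (Σx)(Σy)/n. Getting to MSE and se(b₁) also needs Σy², through S_yy = Σy² − (Σy)²/n and SSE = S_yy − b₁S_xy; that extra input is an assumption of this sketch (if the paper supplies SSE directly, skip that step). The function name is hypothetical, and the sums in the usage line come from five made-up points, x = 1 to 5 and y = 2.1, 3.9, 6.2, 7.8, 10.1.

```python
from math import sqrt


def slr_from_sums(n, sum_x, sum_y, sum_x2, sum_xy, sum_y2):
    """Hypothetical helper: b0, b1, MSE and se(b1) from raw summary sums."""
    s_xx = sum_x2 - sum_x ** 2 / n          # S_xx = sum x^2 - (sum x)^2 / n
    s_xy = sum_xy - sum_x * sum_y / n       # S_xy = sum xy - (sum x)(sum y) / n
    s_yy = sum_y2 - sum_y ** 2 / n          # S_yy, needed only for SSE
    b1 = s_xy / s_xx
    b0 = sum_y / n - b1 * sum_x / n         # b0 = y_bar - b1 * x_bar
    sse = s_yy - b1 * s_xy                  # SSE = S_yy - b1 * S_xy
    mse = sse / (n - 2)                     # df_E = n - 2
    se_b1 = sqrt(mse / s_xx)                # se(b1) = sqrt(MSE / S_xx)
    return b0, b1, mse, se_b1


# Sums for the made-up points described above.
b0, b1, mse, se_b1 = slr_from_sums(5, 15, 30.1, 55, 110.2, 220.91)
print(f"b0 = {b0:.3f}, b1 = {b1:.3f}, MSE = {mse:.4f}, se(b1) = {se_b1:.4f}")
```

Keeping full precision in S_xx and S_xy until the last step matters with these shortcut forms, because each one subtracts two numbers of similar size.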
