STAT7038 · Regression Modelling
Simple Linear Regression
Simple linear regression models a response y as a straight-line function of one predictor x plus a random, normally distributed error with constant variance: in symbols, yi = β0 + β1xi + εi. It rests on the four LINE assumptions (Linearity, Independence, Normality, Equal variance), and its coefficients are read aloud as a slope (the expected change in y per one-unit rise in x) and an intercept (the expected y at x = 0, often an extrapolation). This chapter is the engine room of the course: you fit the line by least squares (b1 = Sxy/Sxx and b0 = ȳ − b1x̄), compute fitted values and residuals, estimate the error variance as σ̂² = MSE = SSE/(n−2), and learn why the estimators are unbiased and have minimum variance among linear unbiased estimators (Gauss–Markov BLUE). Master it and multiple regression becomes the same results in matrix clothing.
What this chapter covers
- 01 The SLR model and the four LINE assumptions
- 02 What the slope and intercept mean: read it aloud
- 03 Least squares: the normal equations and b₁ = Sxy/Sxx, b₀ = ȳ − b₁x̄
- 04 Fitted values, residuals, and the two residual identities
- 05 Estimating the error variance: σ̂² = MSE = SSE/(n−2)
- 06 Properties of the LS estimators: unbiasedness, variances, Gauss–Markov BLUE
Worked example: fit the line by hand (study hours vs exam score)
- +1 (a) Slope. b1 = Sxy/Sxx = 161/42 = 3.833.
- +1 (a) Intercept. b0 = ȳ − b1x̄ = 65.5 − 3.833(5.5) = 44.417; fitted line ŷ = 44.417 + 3.833x.
- +1 (b) Interpret. Each extra hour of study is associated with about 3.83 more marks, on average; the line passes through the centroid (x̄, ȳ) = (5.5, 65.5).
- +1 (c) Error variance. σ̂² = MSE = SSE/(n−2) = 4.833/6 = 0.806. Divide by n−2, not n, because two df are spent estimating β0 and β1.
- +1 (c) Residual standard error. σ̂ = √0.806 = 0.897 marks on 6 degrees of freedom, which is the "Residual standard error" line in R's regression summary.
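The arithmetic in the worked example can be reproduced in a few lines. This is a minimal sketch that starts from the summary statistics quoted above rather than from raw data; the sample size n = 8 is inferred from the 6 residual degrees of freedom, and the variable names are illustrative.

```python
import math

# Summary statistics quoted in the worked example.
# n = 8 is an inference: the example reports n - 2 = 6 degrees of freedom.
n = 8
x_bar, y_bar = 5.5, 65.5
s_xy, s_xx = 161.0, 42.0
sse = 4.833

b1 = s_xy / s_xx            # slope
b0 = y_bar - b1 * x_bar     # intercept, using y-bar (the sample mean of y)
mse = sse / (n - 2)         # error-variance estimate on n - 2 df
rse = math.sqrt(mse)        # residual standard error

print(f"fitted line: y-hat = {b0:.3f} + {b1:.3f} x")
print(f"MSE = {mse:.4f}, residual standard error = {rse:.4f} on {n - 2} df")
```

Working from unrounded intermediate values, as the code does, avoids the small drift that appears when each step is rounded to three decimals by hand.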
Key terms
- LINE assumptions
- The four conditions behind simple linear regression: Linearity of E(y|x) in x, Independence of the errors, Normality of the errors, and Equal variance Var(εi) = σ². Only L+I+E are needed for the least-squares estimators to be BLUE; normality is what buys the exact t and F distributions used for inference.
- Least squares
- The criterion that picks the line minimising the sum of squared vertical distances from the points. Solving the normal equations gives b1 = Sxy/Sxx and b0 = ȳ − b1x̄.
- Residual
- ei = yi − ŷi, the vertical gap between an observation and the fitted line — what the model could not explain. For any least-squares fit Σei = 0 and Σxiei = 0.
- Mean squared error (MSE)
- The estimate of the error variance, σ̂² = SSE/(n−2). Its square root is the residual standard error. Dividing by n instead of n−2 is the single most common variance slip, and it propagates into every standard error downstream.
- Gauss–Markov theorem
- Under L+I+E, the least-squares estimators are the Best Linear Unbiased Estimators (BLUE): among all linear unbiased estimators they have minimum variance. The theorem does not require normality; normality is the extra ingredient that gives the exact t and F sampling distributions used for inference.
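The two residual identities listed under Residual (Σei = 0 and Σxiei = 0) can be checked numerically. The sketch below fits a line by least squares and evaluates both sums; the data are hypothetical values invented for illustration, and `fit_slr` is an illustrative helper name, not something defined in the course.

```python
# Hypothetical data, invented purely for illustration (not from the chapter).
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1, 11.7]


def fit_slr(x, y):
    """Return (b0, b1) from the least-squares formulas b1 = Sxy/Sxx, b0 = y-bar - b1*x-bar."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = s_xy / s_xx
    b0 = y_bar - b1 * x_bar
    return b0, b1


b0, b1 = fit_slr(x, y)
residuals = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]

sum_e = sum(residuals)                                   # first identity
sum_xe = sum(xi * ei for xi, ei in zip(x, residuals))    # second identity

# Both sums are zero up to floating-point rounding.
print(f"sum of residuals      = {sum_e:.2e}")
print(f"sum of x * residuals  = {sum_xe:.2e}")
```

The identities hold for any least-squares fit with an intercept because they are the normal equations rewritten in terms of the residuals.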
Simple Linear Regression FAQ
Why divide SSE by n − 2 instead of n?
Two degrees of freedom are spent estimating the two parameters β0 and β1 before the residuals are formed, so only n−2 independent pieces of information about the error remain. Dividing by n−2 makes MSE an unbiased estimator of σ²; dividing by n (or n−1) underestimates it and biases every standard error, t-statistic and interval that uses MSE.
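A small simulation makes the bias visible. This is a sketch under stated assumptions: the true parameters (β0 = 2, β1 = 0.5, σ = 3), the six x-values, and the number of replications are all hypothetical choices made for illustration. It repeatedly simulates data from the model, fits the line, and compares the average of SSE/(n−2) with the average of SSE/n.

```python
import random

random.seed(1)  # fixed seed so the run is repeatable

# Hypothetical true model, chosen only for this illustration.
beta0, beta1, sigma = 2.0, 0.5, 3.0
x = [float(i) for i in range(1, 7)]   # n = 6: small, so the bias is easy to see
n = len(x)
x_bar = sum(x) / n
s_xx = sum((xi - x_bar) ** 2 for xi in x)


def sse_once():
    """Simulate one data set from the model, fit by least squares, return SSE."""
    y = [beta0 + beta1 * xi + random.gauss(0.0, sigma) for xi in x]
    y_bar = sum(y) / n
    b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / s_xx
    b0 = y_bar - b1 * x_bar
    return sum((yi - b0 - b1 * xi) ** 2 for xi, yi in zip(x, y))


reps = 20000
mean_sse = sum(sse_once() for _ in range(reps)) / reps

avg_div_n_minus_2 = mean_sse / (n - 2)   # should sit near sigma^2 = 9
avg_div_n = mean_sse / n                 # should sit near (n-2)/n * sigma^2 = 6

print(f"true sigma^2          = {sigma ** 2:.2f}")
print(f"average SSE/(n-2)     = {avg_div_n_minus_2:.2f}")
print(f"average SSE/n         = {avg_div_n:.2f}")
```

With n this small the shortfall from dividing by n is a factor of (n−2)/n = 2/3; it shrinks as n grows but never disappears.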
What does the intercept actually mean — and should I interpret it?
β0 is the expected value of y when x = 0. Often that is an extrapolation with no real-world meaning (an exam score at zero study hours, say), so you report it but do not over-interpret it. The slope β1 — the expected change in y per one-unit increase in x — is the number the whole course is about.
Does a good fit prove that x causes y?
No. A strong fit only says x and y move together linearly in this sample; a lurking variable or an outlier can manufacture a slope. And r = 0 means no linear relationship, not no relationship at all. Regression measures association, not causation, unless the design (e.g. a randomised experiment) justifies the causal reading.
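The point that r = 0 rules out only a linear relationship can be illustrated with a perfectly deterministic curve. In this sketch the data are hypothetical: y is exactly x² on x-values symmetric about zero, and `pearson_r` is an illustrative helper written out from the usual formula r = Sxy/√(Sxx·Syy).

```python
import math


def pearson_r(x, y):
    """Sample correlation coefficient r = Sxy / sqrt(Sxx * Syy)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    s_yy = sum((yi - y_bar) ** 2 for yi in y)
    return s_xy / math.sqrt(s_xx * s_yy)


# Hypothetical data: y is an exact function of x, but the relationship is not linear.
x = [-3, -2, -1, 0, 1, 2, 3]
y = [xi ** 2 for xi in x]

r = pearson_r(x, y)
print(f"r = {r:.3f}")  # zero linear association despite a perfect quadratic relationship
```

A scatterplot would show the parabola immediately, which is why plotting the data comes before reading any correlation or slope.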
How does spreading out the x-values change precision?
Var(b1) = σ²/Sxx shrinks as Sxx grows, so x-values bunched together give an imprecise slope while a wide x-spread gives a tight one. Spreading the x out is the cheapest way to a more precise estimate — an exam favourite phrased as 'how would precision change if…'.
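The effect of x-spread on precision follows directly from Var(b1) = σ²/Sxx. The sketch below compares two hypothetical designs with the same number of points; the value σ = 2 and both sets of x-values are assumptions chosen for illustration.

```python
import math


def s_xx(x):
    """Sum of squared deviations of x about its mean."""
    x_bar = sum(x) / len(x)
    return sum((xi - x_bar) ** 2 for xi in x)


sigma = 2.0  # hypothetical error standard deviation

# Two hypothetical designs with six observations each.
bunched = [4.0, 4.5, 5.0, 5.5, 6.0, 6.5]
spread = [0.0, 2.0, 4.0, 6.0, 8.0, 10.0]

sxx_bunched = s_xx(bunched)
sxx_spread = s_xx(spread)

# Standard deviation of the slope estimator: sqrt(sigma^2 / Sxx).
sd_b1_bunched = sigma / math.sqrt(sxx_bunched)
sd_b1_spread = sigma / math.sqrt(sxx_spread)

print(f"bunched: Sxx = {sxx_bunched:.3f}, sd(b1) = {sd_b1_bunched:.3f}")
print(f"spread:  Sxx = {sxx_spread:.3f}, sd(b1) = {sd_b1_spread:.3f}")
```

Here the spread design has four times the x-range, so Sxx is 16 times larger and the standard deviation of b1 is four times smaller, with no extra observations collected.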
Exam move
Put one chain on your A4 sheet and drill it until it is automatic: from the supplied Σx, Σy, Σx², Σxy (or the S-quantities) go straight to Sxy/Sxx → b1 → b0 → the fitted line. Burn in dfE = n − 2 so you never divide SSE by the wrong number. Be ready to interpret the slope in context ('per one-unit increase in x'; there is nothing else to hold fixed, because this is simple regression), and remember the three boxed variance formulas plus the one-liner se(b1) = √(MSE/Sxx), because every t-test and CI in the next chapter starts from exactly those. The calculus derivation of the normal equations is not examinable; their use is.
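The chain from raw sums to the fitted line can be written as one short function. This is a sketch, not a prescribed method: `line_from_sums` is a hypothetical helper name, and the raw sums passed to it are illustrative values chosen so that they reproduce the worked example's Sxx = 42, Sxy = 161, x̄ = 5.5 and ȳ = 65.5 (the chapter itself does not list the raw sums). It uses the shortcut forms Sxx = Σx² − (Σx)²/n and Sxy = Σxy − ΣxΣy/n.

```python
import math


def line_from_sums(n, sum_x, sum_y, sum_x2, sum_xy):
    """Go from raw sums to Sxx, Sxy, b0 and b1 using the shortcut formulas."""
    s_xx = sum_x2 - sum_x ** 2 / n
    s_xy = sum_xy - sum_x * sum_y / n
    b1 = s_xy / s_xx
    b0 = sum_y / n - b1 * sum_x / n   # y-bar minus b1 times x-bar
    return s_xx, s_xy, b0, b1


# Illustrative raw sums (assumed, not given in the chapter), consistent with
# the worked example's summary statistics for n = 8.
n = 8
s_xx, s_xy, b0, b1 = line_from_sums(n, sum_x=44.0, sum_y=524.0,
                                    sum_x2=284.0, sum_xy=3043.0)

# Standard error of the slope, using the worked example's SSE = 4.833.
mse = 4.833 / (n - 2)
se_b1 = math.sqrt(mse / s_xx)

print(f"Sxx = {s_xx:.1f}, Sxy = {s_xy:.1f}")
print(f"fitted line: y-hat = {b0:.3f} + {b1:.3f} x")
print(f"se(b1) = {se_b1:.4f} on {n - 2} df")
```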