MAST90139 · Statistical Modelling For Data Science
Linear Models
The normal linear model is where every generalised linear model starts — and the case they all generalise. You write the response in matrix form y = Xβ + ε, estimate β by ordinary least squares (which here coincides with maximum likelihood), and read the t and F tests straight off the R output. MAST90139 reviews it not for its own sake but as the springboard: it fixes the vocabulary (design matrix, fitted values, residuals, the hat matrix) and the four LINE assumptions — Linearity, Independence, Normality, Equal variance — precisely so you can see where they break and why binary, count and categorical responses force a GLM. Master the matrix algebra and the diagnostics here, and every later chapter is the same machinery with a new distribution and link.
What this chapter covers
- 1.1 The normal linear model in matrix form (y = Xβ + ε)
- The design matrix, fitted values and residuals
- 1.2 OLS estimation, and why OLS = MLE under normality
- Sampling distribution of β̂: the t and F tests
- 1.3 The four LINE assumptions
- Residual diagnostics, and where LINE breaks
- Why binary / count data force a GLM
Worked example: reading a linear-model summary() and testing a coefficient
- [+1] (a) t-statistic under H₀: β₁ = 0: t = β̂₁ / se(β̂₁) = 2.50 / 0.50 = 5.0 on 38 df.
- [+1] (a) Conclude: |t| = 5.0 ≫ t_{38, 0.025} ≈ 2.02, so reject H₀: x is a significant predictor (p < 0.001).
- [+1] (b) 95% CI: β̂₁ ± t_{38, 0.025} × se(β̂₁) = 2.50 ± 2.02 × 0.50 = 2.50 ± 1.01 = (1.49, 3.51).
- [+1] (b) Read it: the interval excludes 0, consistent with the significant t-test at the 5% level.
- [+1] (c) Why ordinary: the response is continuous with constant-variance normal errors and the mean is modelled directly (identity link), so OLS = MLE; a GLM is needed when the response is binary, count or categorical, or more generally when the variance depends on the mean.
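The arithmetic in steps (a) and (b) can be checked with a few lines of Python. This is a minimal sketch: the estimate, standard error and critical value are copied from the worked example, and the critical value 2.02 is the tabulated figure quoted there rather than something computed here. The variable names are illustrative only.

```python
# Values copied from the worked example above.
beta_hat = 2.50   # estimated slope
se = 0.50         # standard error of the slope
t_crit = 2.02     # tabulated t_{38, 0.025}, as quoted in the example

# (a) t-statistic for H0: beta_1 = 0
t_stat = beta_hat / se

# (b) 95% confidence interval: estimate +/- critical value * standard error
half_width = t_crit * se
ci = (beta_hat - half_width, beta_hat + half_width)

# Two-sided test at the 5% level: reject when |t| exceeds the critical value.
reject_h0 = abs(t_stat) > t_crit

print(f"t = {t_stat:.1f}")
print(f"95% CI = ({ci[0]:.2f}, {ci[1]:.2f})")
print("reject H0:", reject_h0)
```

The test and the interval use the same critical value, which is why "interval excludes 0" and "reject H₀ at the 5% level" always agree.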
Key terms
- Design matrix (X)
- The n × p matrix of predictor values (with a leading column of 1s for the intercept) that turns the model into the compact form y = Xβ + ε. Its columns are the covariates; β is the vector of coefficients.
- Ordinary least squares (OLS)
- The estimator β̂ = (XᵀX)⁻¹Xᵀy that minimises the residual sum of squares. Under normal, constant-variance errors it equals the maximum-likelihood estimator, the property that makes the linear model the easy special case of a GLM.
- Hat matrix
- H = X(XᵀX)⁻¹Xᵀ, the projection that maps y onto the fitted values ŷ = Hy. Its diagonal entries are the leverages, which measure how much each observation can pull its own fitted value.
- LINE assumptions
- The four conditions of the linear model: Linearity of the mean, Independence of errors, Normality of errors, and Equal (constant) variance. Diagnostics check each; their failure for binary or count data is what motivates the GLM.
- Identity link
- The trivial link g(μ) = μ that models the mean directly. The normal linear model is the GLM with a normal random component and the identity link — which is why no transformation of the mean is needed.
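To make the design matrix, the OLS formula and the leverages concrete, here is a small sketch in plain Python for one predictor plus an intercept. The data are hypothetical toy values, not taken from the course, and the 2 × 2 inverse is written out by hand only to keep the example dependency-free; in practice you would read these quantities off R's output.

```python
# Hypothetical toy data (an assumption for illustration only).
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
n = len(x)

# Design matrix X: a leading column of 1s for the intercept, then the covariate.
X = [[1.0, xi] for xi in x]

# X^T X (2 x 2) and X^T y (length 2), written out explicitly.
sx = sum(x)
sxx = sum(xi * xi for xi in x)
XtX = [[float(n), sx], [sx, sxx]]
Xty = [sum(y), sum(xi * yi for xi, yi in zip(x, y))]

# Inverse of the 2 x 2 matrix X^T X via the adjugate formula.
det = XtX[0][0] * XtX[1][1] - XtX[0][1] * XtX[1][0]
XtX_inv = [[XtX[1][1] / det, -XtX[0][1] / det],
           [-XtX[1][0] / det, XtX[0][0] / det]]

# OLS estimate: beta_hat = (X^T X)^{-1} X^T y
beta_hat = [sum(XtX_inv[i][j] * Xty[j] for j in range(2)) for i in range(2)]

# Fitted values y_hat = X beta_hat and residuals e = y - y_hat.
fitted = [row[0] * beta_hat[0] + row[1] * beta_hat[1] for row in X]
residuals = [yi - fi for yi, fi in zip(y, fitted)]

# Leverages: diagonal of the hat matrix, h_ii = x_i^T (X^T X)^{-1} x_i.
leverage = [
    sum(row[a] * XtX_inv[a][b] * row[b] for a in range(2) for b in range(2))
    for row in X
]

print("beta_hat:", beta_hat)
print("leverages:", leverage)
print("sum of leverages:", sum(leverage))
```

Two properties are worth checking on any fit: the leverages sum to p (here 2, the number of columns of X), and the residuals are orthogonal to every column of X, so with an intercept they sum to zero.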
Linear Models FAQ
Why does MAST90139 start with the linear model if the course is about GLMs?
Because every GLM is the linear model with two changes: a non-normal random component and a link function on the mean. Reviewing the linear model fixes the matrix vocabulary (design matrix, hat matrix, residuals) and the LINE assumptions, so that when those assumptions fail for binary or count data you can see exactly which piece the GLM replaces and why.
Is OLS really the same as maximum likelihood?
Yes, but only under the linear model's assumptions. When errors are independent and normal with constant variance, minimising the residual sum of squares is identical to maximising the likelihood in β, so OLS = MLE. This coincidence is special to the normal case with the identity link; in a GLM the maximum-likelihood estimate generally has no closed form and you fit by iteratively re-weighted least squares (IRLS) instead.
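The equivalence can be seen in a short derivation. This sketch assumes independent errors ε ~ N(0, σ²I) and a design matrix X of full column rank, so that XᵀX is invertible; the symbols ℓ and RSS are introduced here for illustration.

```latex
% Log-likelihood of y = X\beta + \varepsilon with \varepsilon \sim N(0, \sigma^2 I)
\ell(\beta, \sigma^2)
  = -\frac{n}{2}\log(2\pi\sigma^2)
    - \frac{1}{2\sigma^2}\,(y - X\beta)^{\top}(y - X\beta)

% For any fixed \sigma^2 > 0, only the second term depends on \beta, so
\hat\beta_{\mathrm{MLE}}
  = \arg\max_{\beta}\, \ell(\beta, \sigma^2)
  = \arg\min_{\beta}\, \mathrm{RSS}(\beta),
\qquad
\mathrm{RSS}(\beta) = (y - X\beta)^{\top}(y - X\beta)

% Setting the gradient of RSS to zero gives the normal equations
X^{\top}X\,\hat\beta = X^{\top}y
\quad\Longrightarrow\quad
\hat\beta = (X^{\top}X)^{-1}X^{\top}y
```

The step that fails for other distributions is the second one: without the normal density, the log-likelihood is no longer a constant minus a multiple of the residual sum of squares.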
What are the LINE assumptions and which one breaks first?
Linearity, Independence, Normality, Equal variance. For binary and count responses the last two fail immediately: a Bernoulli or Poisson variance depends on the mean (not constant), and the response is not normal. That is precisely the failure a GLM repairs — it lets the variance follow the mean and models a function of the mean instead of the mean itself.
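The mean-variance relationships behind that answer can be written side by side. These are the standard results for each distribution, with μ denoting the mean E(Y):

```latex
% Normal linear model: variance is a free constant, unrelated to the mean
Y \sim N(\mu, \sigma^2): \qquad \operatorname{Var}(Y) = \sigma^2

% Bernoulli (binary response): variance is determined by the mean
Y \sim \mathrm{Bernoulli}(\mu): \qquad \operatorname{Var}(Y) = \mu(1 - \mu), \quad 0 < \mu < 1

% Poisson (count response): variance equals the mean
Y \sim \mathrm{Poisson}(\mu): \qquad \operatorname{Var}(Y) = \mu, \quad \mu > 0
```

In the last two cases any covariate that changes the mean also changes the variance, so the Equal-variance assumption cannot hold unless the mean is constant.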
Do I have to invert matrices by hand in the exam?
No. You read coefficients, standard errors and the t / F tests off the R summary(). The matrix form matters conceptually (it is the notation every later chapter reuses), but the arithmetic the exam asks for is interpreting output, not inverting XᵀX.
Exam move
Treat this chapter as vocabulary and diagnostics, not as new statistics. Learn the matrix form y = Xβ + ε cold, because the design matrix, fitted values, residuals and hat matrix reappear in every GLM. Be able to read a summary(): the t-test on each coefficient, the F-test on the model, and what the residual standard error is. Above all, internalise the four LINE assumptions and the residual plots that check them — the whole rest of the course is the story of what to do when Normality and Equal-variance fail, so knowing exactly where the linear model breaks is what lets you choose the right GLM later.