University of Melbourne · S1 2026 · FACULTY OF SCIENCE

MAST90139 · Statistical Modelling For Data Science

- one subject, every graph, every model, every mark
50% final exam · hurdle · 14 Chapters · 3-page Bible
Our own words - no uploaded lecturer files
Built to mirror S1 2026 · updated this semester
Chapter 1 of 8 · MAST90139

Linear Models

The normal linear model is where every generalised linear model starts — and the case they all generalise. You write the response in matrix form y = Xβ + ε, estimate β by ordinary least squares (which here coincides with maximum likelihood), and read the t and F tests straight off the R output. MAST90139 reviews it not for its own sake but as the springboard: it fixes the vocabulary (design matrix, fitted values, residuals, the hat matrix) and the four LINE assumptions — Linearity, Independence, Normality, Equal variance — precisely so you can see where they break and why binary, count and categorical responses force a GLM. Master the matrix algebra and the diagnostics here, and every later chapter is the same machinery with a new distribution and link.

In this chapter

What this chapter covers

  • 01 1.1 The normal linear model in matrix form (y = Xβ + ε)
  • 02 The design matrix, fitted values and residuals
  • 03 1.2 OLS estimation, and why OLS = MLE under normality
  • 04 Sampling distribution of β̂ · the t and F tests
  • 05 1.3 The four LINE assumptions
  • 06 Residual diagnostics, and where LINE breaks
  • 07 Why binary / count data force a GLM
Worked example · free

Worked example: reading a linear-model summary() and testing a coefficient

Q [5 marks]. A linear model y = β₀ + β₁x + ε is fitted to n = 40 observations. R reports β̂₁ = 2.50 with standard error 0.50, residual standard error s on 38 df, and you are told x is the only predictor. (a) Test H₀: β₁ = 0 with a t-test. (b) Give a 95% confidence interval for β₁. (c) Say in one sentence what makes this an ordinary linear model rather than a GLM.
[Figure: scatter of y against x with the fitted line ŷ = b₀ + b₁x and a residual eᵢ marked.]
  • +1 (a) t-statistic: t = β̂₁ / se(β̂₁) = 2.50 / 0.50 = 5.0 on 38 df.
  • +1 (a) Conclude: |t| = 5.0 ≫ t_{38, 0.025} ≈ 2.02, so reject H₀; x is a significant predictor (p < 0.001).
  • +1 (b) 95% CI: β̂₁ ± t_{38, 0.025} · se = 2.50 ± 2.02 × 0.50 = 2.50 ± 1.01 = (1.49, 3.51).
  • +1 (b) Read it: the interval excludes 0, consistent with the significant t-test.
  • +1 (c) Why ordinary: the response is continuous with constant-variance normal errors and the mean is modelled directly (identity link), so OLS = MLE; a GLM is needed when the response is non-normal (for example binary, count or categorical), so that the variance depends on the mean.
t = 5.0 on 38 df rejects H₀ (x is significant); the 95% CI is (1.49, 3.51), which excludes 0; and it is an ordinary linear model because the response is normal with constant variance modelled through the identity link — the one case where OLS and maximum likelihood agree.
Sia tip — In the linear model, the t-test on a coefficient and the F-test on the whole model both come straight off summary(); the same logic reappears in every GLM, but with z / Wald and the deviance replacing t and the residual sum of squares.
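The arithmetic in the worked example can be checked with a few lines of Python. This is a minimal sketch, not the course's method: the function name `t_test_and_ci` is hypothetical, and the critical value 2.02 is simply the rounded t_{38, 0.025} quoted in the solution rather than a value computed from the t distribution.

```python
def t_test_and_ci(beta_hat, se, t_crit):
    """Return the t statistic for H0: beta = 0 and the matching two-sided CI.

    t_crit is the upper critical value of the t distribution on the
    residual degrees of freedom (supplied by the caller, not computed here).
    """
    t_stat = beta_hat / se
    half_width = t_crit * se
    return t_stat, (beta_hat - half_width, beta_hat + half_width)


# Values from the worked example: estimate 2.50, standard error 0.50,
# and the rounded critical value 2.02 for 38 df at the 5% level.
t_crit = 2.02
t_stat, (lower, upper) = t_test_and_ci(2.50, 0.50, t_crit)

print(f"t = {t_stat:.2f}")                    # t = 5.00
print(f"95% CI = ({lower:.2f}, {upper:.2f})")  # 95% CI = (1.49, 3.51)
print("reject H0" if abs(t_stat) > t_crit else "do not reject H0")
```

With an unrounded critical value the interval endpoints change only in the third decimal place, so the conclusion is the same.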
Glossary

Key terms

Design matrix (X)
The n × p matrix of predictor values (with a leading column of 1s for the intercept) that turns the model into the compact form y = Xβ + ε. Its columns are the covariates; β is the vector of coefficients.
Ordinary least squares (OLS)
The estimator β̂ = (XᵀX)⁻¹Xᵀy that minimises the residual sum of squares. Under normal, constant-variance errors it equals the maximum-likelihood estimator, the property that makes the linear model the easy special case of a GLM.
Hat matrix
H = X(XᵀX)⁻¹Xᵀ, the projection that maps y onto the fitted values ŷ = Hy. Its diagonal entries are the leverages, which measure how much each observation can pull its own fitted value.
LINE assumptions
The four conditions of the linear model: Linearity of the mean, Independence of errors, Normality of errors, and Equal (constant) variance. Diagnostics check each; their failure for binary or count data is what motivates the GLM.
Identity link
The trivial link g(μ) = μ that models the mean directly. The normal linear model is the GLM with a normal random component and the identity link — which is why no transformation of the mean is needed.
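The design matrix, OLS estimator and hat matrix defined above can be sketched in plain Python for a one-predictor model. Everything here is illustrative: the five data points are made up, the helper names (`transpose`, `matmul`, `inv2x2`) are hypothetical, and the explicit 2 × 2 inverse only works because this toy model has exactly two coefficients. In practice you would rely on R's lm() or a linear-algebra library.

```python
def transpose(A):
    return [list(col) for col in zip(*A)]


def matmul(A, B):
    Bt = transpose(B)
    return [[sum(a * b for a, b in zip(row, col)) for col in Bt] for row in A]


def inv2x2(M):
    (a, b), (c, d) = M
    det = a * d - b * c
    if det == 0:
        raise ValueError("X'X is singular: the columns of X are linearly dependent")
    return [[d / det, -b / det], [-c / det, a / det]]


# Made-up toy data (not from the course).
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

# Design matrix: a leading column of 1s for the intercept, then x.
X = [[1.0, xi] for xi in x]
Xt = transpose(X)
y_col = [[yi] for yi in y]

XtX_inv = inv2x2(matmul(Xt, X))

# OLS: beta_hat = (X'X)^{-1} X'y, returned as a 2 x 1 column.
beta_hat = matmul(matmul(XtX_inv, Xt), y_col)

# Hat matrix H = X (X'X)^{-1} X', fitted values H y, residuals y - H y.
H = matmul(matmul(X, XtX_inv), Xt)
fitted = [row[0] for row in matmul(H, y_col)]
residuals = [yi - fi for yi, fi in zip(y, fitted)]
leverages = [H[i][i] for i in range(len(x))]

print(f"intercept = {beta_hat[0][0]:.2f}, slope = {beta_hat[1][0]:.2f}")
print("leverages:", [round(h, 2) for h in leverages])
print(f"sum of leverages = {sum(leverages):.2f}")  # equals p, the number of columns of X
```

The leverages are largest at the extreme x values and sum to p = 2, and the residuals sum to zero because the model contains an intercept.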
FAQ

Linear Models FAQ

Why does MAST90139 start with the linear model if the course is about GLMs?

Because every GLM is the linear model with two changes: a non-normal random component and a link function on the mean. Reviewing the linear model fixes the matrix vocabulary (design matrix, hat matrix, residuals) and the LINE assumptions, so that when those assumptions fail for binary or count data you can see exactly which piece the GLM replaces and why.

Is OLS really the same as maximum likelihood?

Yes, but only under the linear model's assumptions. When errors are independent and normal with constant variance, maximising the likelihood over β is the same as minimising the residual sum of squares, so the OLS estimate of β is the MLE. (The MLE of σ² divides the residual sum of squares by n rather than n − p, so it differs slightly from the usual unbiased estimate.) This coincidence is special to the normal case; in a GLM there is generally no closed-form solution and you fit by iteratively re-weighted least squares (IRLS) instead.
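The equivalence can be seen in two lines. This is a sketch assuming ε ~ N(0, σ²I) and that X has full column rank, so that XᵀX is invertible; the symbol ℓ for the log-likelihood is notation introduced here.

```latex
\begin{aligned}
\ell(\beta, \sigma^2)
  &= -\frac{n}{2}\log(2\pi\sigma^2)
     - \frac{1}{2\sigma^2}\,(y - X\beta)^{\top}(y - X\beta), \\
\frac{\partial \ell}{\partial \beta}
  &= \frac{1}{\sigma^2}\,X^{\top}(y - X\beta) = 0
  \;\Longrightarrow\; X^{\top}X\,\hat\beta = X^{\top}y
  \;\Longrightarrow\; \hat\beta = (X^{\top}X)^{-1}X^{\top}y .
\end{aligned}
```

For any fixed σ², β enters the log-likelihood only through the residual sum of squares (y − Xβ)ᵀ(y − Xβ) with a negative sign, so maximising ℓ over β and minimising the residual sum of squares pick out the same β̂.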

What are the LINE assumptions and which one breaks first?

Linearity, Independence, Normality, Equal variance. For binary and count responses the last two fail immediately: a Bernoulli or Poisson variance depends on the mean (not constant), and the response is not normal. That is precisely the failure a GLM repairs — it lets the variance follow the mean and models a function of the mean instead of the mean itself.
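The mean-variance relationships behind this answer are standard results and can be set side by side, writing μ = E(Y):

```latex
\begin{aligned}
\text{Normal:}    \quad & \operatorname{Var}(Y) = \sigma^2 \quad \text{(free of } \mu\text{)} \\
\text{Bernoulli:} \quad & \operatorname{Var}(Y) = \mu(1 - \mu), \quad 0 < \mu < 1 \\
\text{Poisson:}   \quad & \operatorname{Var}(Y) = \mu, \quad \mu > 0
\end{aligned}
```

Only in the normal case can the variance stay constant while the mean changes with x, which is why Equal variance cannot hold for binary or count responses whenever the mean depends on the predictors.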

Do I have to invert matrices by hand in the exam?

No. You read coefficients, standard errors and the t / F tests off the R summary(). The matrix form matters conceptually, since it is the notation every later chapter reuses, but the arithmetic the exam asks for is interpreting output, not inverting XᵀX.

Study strategy

Exam move

Treat this chapter as vocabulary and diagnostics, not as new statistics. Learn the matrix form y = Xβ + ε cold, because the design matrix, fitted values, residuals and hat matrix reappear in every GLM. Be able to read a summary(): the t-test on each coefficient, the F-test on the model, and what the residual standard error is. Above all, internalise the four LINE assumptions and the residual plots that check them — the whole rest of the course is the story of what to do when Normality and Equal-variance fail, so knowing exactly where the linear model breaks is what lets you choose the right GLM later.
