University of Melbourne · S1 2026 · FACULTY OF SCIENCE

MAST20034 · Critical Thinking With Data



50% final exam · hurdle · 14 Chapters · 2-page Bible

Our own words - no uploaded lecturer files

Built to mirror S1 2026 · updated this semester

Chapter 7 of 10 · MAST20034

Statistical Modelling

Week 8 teaches you to read a model, not fit one — the verb the exam rewards is interpret, because there is no software in the room. The anchor idea is George Box's: all models are wrong, but some are useful. A model is a deliberate simplification that splits the world into signal + noise; the art is judging whether the signal it captures is useful, not whether it is ‘true’. You learn to read a regression output on three axes — the sign (direction of the relationship), the size (the coefficient's practical magnitude in context) and the significance (is it distinguishable from zero?) — and to say what each coefficient means in plain words. The chapter generalises to GLMs (the same linear idea with a link on the front, e.g. logistic regression for a yes/no outcome), then to diagnostics: what a violated assumption looks like (patterned residuals, non-constant spread). It closes on the two great cautions — never extrapolate beyond the data, and a regression coefficient is not a causal effect unless the design earned it. Exam prompts hand you output and ask you to read and critique it.
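As a sketch of the notation behind ‘signal + noise’ and ‘a link on the front’, the two model families can be written as below. The symbols (β₀, β₁, ε, g, p) are conventional choices assumed here for illustration, not notation taken from the subject materials.

```latex
% Linear regression: outcome = signal + noise
y_i = \underbrace{\beta_0 + \beta_1 x_i}_{\text{signal}}
      + \underbrace{\varepsilon_i}_{\text{noise}}

% GLM: a link function g connects the mean outcome to the same linear predictor
g\bigl(\mathbb{E}[Y_i]\bigr) = \beta_0 + \beta_1 x_i

% Logistic regression (yes/no outcome) uses the logit link
\log\frac{p_i}{1 - p_i} = \beta_0 + \beta_1 x_i,
\qquad p_i = \Pr(Y_i = 1)
```

In each line the right-hand side is the same linear predictor; only what it is attached to changes.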

In this chapter

What this chapter covers

01 · 8.1 What a model is: signal + noise, and ‘all models are wrong’
02 · 8.2 Reading a regression: sign, size, significance
03 · 8.3 GLMs: the same idea with a link function on the front
04 · 8.4 Diagnostics: what an assumption violation looks like
05 · 8.5 The two great cautions: extrapolation and causation

Worked example · free

Reading a regression coefficient in context, mark by mark

Q [4 marks]. A regression of house price ($000s) on distance from the city (km) gives a slope of −4.2 (p < 0.001), R² = 0.31. A developer concludes that “moving a house 10 km out would cut its price by $42,000.” In short-answer form, read the coefficient and critique the conclusion.

+1 Read sign + size: the slope −4.2 is negative, so price falls with distance; because price is in $000s, each extra km from the city is associated with a price about $4,200 lower, on average.
+1 Read significance + fit: p < 0.001 says the slope is reliably distinguishable from zero, but R² = 0.31 means distance explains only about 31% of the variation in price; the remaining 69% is left to other factors.
+1 Catch the causal error: this is observational regression, so the slope is an association; ‘moving a house’ implies a causal intervention the data do not support (distance is confounded with suburb amenities, schools and house size).
+1 Catch the extrapolation/wording: the claim also treats a between-house comparison as a within-house change (you cannot literally move a house) and may extrapolate beyond the observed range of distances.

The slope −4.2 means each additional km is associated with a price about $4,200 lower, on average; it is significant (p < 0.001), but distance explains only 31% of the variation in price (R² = 0.31). The developer's error is causal: an observational coefficient is an association, not the effect of ‘moving’ a house (it is confounded by suburb features), and the claim misreads a cross-house comparison as a within-house change. The claim may also extrapolate beyond the observed range of distances. No computation is needed: the marks are for the contextual read plus the two cautions.
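The unit conversion behind this read can be checked in a few lines. This is a minimal sketch: the numbers come from the question, while the constant and function names are illustrative choices.

```python
# Numbers taken from the worked example; all names are illustrative.
SLOPE_THOUSANDS_PER_KM = -4.2   # price is recorded in $000s
R_SQUARED = 0.31


def associated_difference_dollars(extra_km: float) -> float:
    """Average price difference (in dollars) associated with houses that
    sit `extra_km` further out. This compares different houses; it is not
    the effect of moving one house."""
    return SLOPE_THOUSANDS_PER_KM * 1000 * extra_km


def unexplained_share(r_squared: float) -> float:
    """Share of price variation the model leaves to other factors."""
    return 1 - r_squared


per_km = associated_difference_dollars(1)
ten_km = associated_difference_dollars(10)
print(f"Per extra km: {per_km:,.0f} dollars on average")
print(f"10 km further out: {ten_km:,.0f} dollars on average")
print(f"Variation left unexplained: {unexplained_share(R_SQUARED):.0%}")
```

The developer's $42,000 figure is arithmetically correct. The marks are lost on what the number is claimed to mean, not on the multiplication.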

Sia tip — Read every coefficient on three axes — sign, size (in real units, in context), significance — then check the two cautions: is this causal language justified by the design, and is the claim inside the data's range?

Glossary

Key terms

All models are wrong: Box's dictum that every model simplifies reality, so none is literally true; the question is whether the signal it captures is useful for the purpose. This frees you to judge usefulness, not correctness.
Signal + noise: The decomposition a model assumes: a systematic part (signal, what the model explains) plus random variation (noise, estimated by the residuals). Good modelling captures real signal without mistaking noise for signal, which is overfitting.
Regression coefficient (sign / size / significance): The three things to read off a slope: its direction, its magnitude in context (per one-unit change), and whether it is reliably different from zero. Significance without size, or size without context, is an incomplete read.
GLM (generalised linear model): An extension of linear regression in which a link function connects the mean of the outcome to the same linear predictor, so it can model non-normal outcomes, e.g. logistic regression for binary data and Poisson regression for counts.
Extrapolation: Using a model to predict outside the range of the data it was fitted on, where the fitted relationship may not hold. One of the two great modelling cautions, alongside reading association as causation.
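To make the GLM entry concrete, here is a minimal sketch of how logistic regression turns a linear predictor into a probability. The coefficients are hypothetical values chosen for illustration, and the function names are not from any particular library.

```python
import math


def inverse_logit(linear_predictor: float) -> float:
    """Map a value on the log-odds scale back to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-linear_predictor))


def predicted_probability(intercept: float, slope: float, x: float) -> float:
    """Logistic regression: linear predictor first, then the inverse link."""
    return inverse_logit(intercept + slope * x)


# Hypothetical coefficients chosen for illustration only.
INTERCEPT, SLOPE = -2.0, 0.5

for x in (0, 4, 8):
    p = predicted_probability(INTERCEPT, SLOPE, x)
    print(f"x = {x}: linear predictor = {INTERCEPT + SLOPE * x:+.1f}, "
          f"probability = {p:.3f}")
```

Equal steps in x move the log-odds by equal amounts but move the probability by unequal amounts. A logistic slope is therefore read as a change in log-odds per unit of x, not as a constant change in probability.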

FAQ

Statistical Modelling FAQ

What does ‘all models are wrong but some are useful’ mean for the exam?

That you judge a model by usefulness for its purpose, not truth. So a good answer interprets what the model usefully tells us and names where it simplifies or fails — never claims the model is ‘right’.

How do I read a regression coefficient?

On three axes: sign (direction), size (magnitude per unit, stated in real-world context) and significance (is it distinguishable from zero?). Then sanity-check fit (R²) and the two cautions — causation and extrapolation.

When is a regression slope a causal effect?

Only when the design earns it — a randomised experiment, or thorough adjustment for confounders. In ordinary observational regression the slope is an association; causal language (‘changing X by 1 raises Y’) is an over-claim.
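A small made-up dataset shows why an observational slope can mislead. In this sketch a confounder z drives both x and y, and by construction x has no effect on y; the data and the helper name ols_slope are illustrative assumptions.

```python
def ols_slope(xs, ys):
    """Least-squares slope of y on x for paired lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx


# Made-up data: each row is (z, x, y), and z drives both x and y.
# By construction y depends only on z, so x has no effect on y at all.
rows = [(z, 10 * z + d, 5 * z) for z in (0, 1, 2) for d in (-1, 0, 1)]

xs = [x for _, x, _ in rows]
ys = [y for _, _, y in rows]
naive_slope = ols_slope(xs, ys)
print(f"Slope ignoring z: {naive_slope:.3f}")

# Holding z fixed (comparing like with like), the association vanishes.
within_slopes = []
for level in (0, 1, 2):
    sub_x = [x for z, x, _ in rows if z == level]
    sub_y = [y for z, _, y in rows if z == level]
    within_slopes.append(ols_slope(sub_x, sub_y))
    print(f"Slope within z = {level}: {within_slopes[-1]:.3f}")
```

The slope that ignores z is clearly non-zero even though changing x would do nothing to y. Adjusting for the confounder, or randomising x, is what licenses causal language.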

Study strategy

Exam move

Treat model questions as reading exercises: rehearse the three-axis read (sign / size-in-context / significance) plus the fit (R²) until it is a single fluent sentence. Put the two great cautions (no extrapolation; association ≠ causation) at the top of the modelling section of your notes, because they catch the over-claims examiners plant. Know GLMs conceptually as ‘linear model + a link’, and be able to say in one line what a diagnostic violation signals: patterned residuals suggest the model's form is missing structure, and fanning spread suggests non-constant variance. Always interpret coefficients in real units and real context, never as bare numbers.
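As an illustration of what ‘patterned residuals’ look like, the sketch below fits a straight line to made-up curved data and prints the residuals. The data and the helper name fit_line are assumptions for illustration; in the exam you would be reading a residual plot, not computing one.

```python
def fit_line(xs, ys):
    """Return (intercept, slope) of the least-squares line of y on x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


# Made-up curved data: the true relationship is y = x squared.
xs = list(range(7))
ys = [x ** 2 for x in xs]

intercept, slope = fit_line(xs, ys)
residuals = [y - (intercept + slope * x) for x, y in zip(xs, ys)]

for x, r in zip(xs, residuals):
    print(f"x = {x}: residual = {r:+.1f}")
```

The residuals are positive at both ends and negative in the middle: a systematic U-shape, not random scatter. They still sum to zero, so only the pattern, not the average, reveals that the straight line is missing part of the signal.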
