University of Sydney · S1 2026 · FACULTY OF BUSINESS & ECONOMICS

BUSS6002 · Data Science In Business

- one subject, every graph, every model, every mark

50% final exam · hurdle · 14 Chapters · 9-page Bible

Our own words - no uploaded lecturer files

Built to mirror S1 2026 · updated this semester

Chapter 9 of 11 · BUSS6002

Model Evaluation & Selection

Week 9 of BUSS6002 is the quantitative heart of the unit's second half and a near-guaranteed block on the cumulative final exam. It asks a single disciplined question: given several candidate models of different complexity, which one will predict best on data it has never seen? The honest yardstick is the Expected Prediction Error (EPE), which decomposes into an irreducible noise floor plus bias-squared and variance.

The chapter builds linear basis function (polynomial and radial) models, shows why training error always favours the most complex model and is therefore useless for selection, and teaches the bias-variance trade-off together with the train / validation / test protocol you use to estimate generalisation. Get the protocol order right and the three-part decomposition memorised and you bank the reliable model-selection marks.

In this chapter

What this chapter covers

1. Linear basis function (LBF) models: y = phi(x)^T beta + eps; flexibility grows with the number of basis functions p
2. Polynomial vs radial basis functions: phi_i(x) = x^i versus a Gaussian bump exp(-(x - c_i)^2 / (2 s^2))
3. Design matrix and least squares: fit beta_hat = (Phi^T Phi)^-1 Phi^T y, the same normal equations as linear regression
4. Model selection: estimate each candidate's performance on unseen data and pick the best complexity
5. Expected Prediction Error: EPE = E[(y - yhat)^2] under squared loss, a measure of generalisation
6. Bias-variance decomposition: EPE = irreducible sigma^2 + Bias^2 + Variance, only the last two reducible
7. Bias-variance trade-off: flexible models lower bias but raise variance; EPE is U-shaped, so pick the middle
8. Validation-set approach: train / validation / test split; select on validation MSE, re-fit on train+val, report test MSE

Worked example · free

Compute a validation MSE

Q [2 marks]. On a validation set the actual responses are y = (8, 12, 20, 15) and a candidate model predicts yhat = (10, 11, 17, 14). Compute the validation MSE that you would use to compare this model against its rivals.

+1 Form the residuals y - yhat = (8-10, 12-11, 20-17, 15-14) = (-2, 1, 3, 1), then square them: (4, 1, 9, 1). The squared residuals sum to 15.
+1 MSE is the mean of the squared residuals, so divide by n = 4: MSE = 15 / 4 = 3.75. (Reporting the sum, 15, instead of the mean is the common slip.)

Validation MSE = 15 / 4 = 3.75. You would compute the same quantity for every candidate model and select the one with the lowest validation MSE, since the validation MSE is the course's stand-in for the expected prediction error E[(y - yhat)^2].
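Since a Python item can ask for this calculation, here is a minimal sketch of the same arithmetic in plain Python. The variable names are our own choice, and the numbers are copied from the question above.

```python
# Validation responses and one candidate's predictions, copied from the worked example.
y = [8, 12, 20, 15]
yhat = [10, 11, 17, 14]

# Squared residual (y_i - yhat_i)^2 for each validation observation.
squared_residuals = [(a - b) ** 2 for a, b in zip(y, yhat)]

# MSE is the mean of the squared residuals: divide their sum by n.
validation_mse = sum(squared_residuals) / len(y)

print(squared_residuals)  # [4, 1, 9, 1]
print(validation_mse)     # 3.75
```

If the responses and predictions are held in NumPy arrays instead of lists, the same quantity is np.mean((y - yhat) ** 2).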

Sia tip — MSE is a mean, not a total — always divide the squared-residual sum by n. The validation MSE is what you minimise when choosing a model; the test MSE (computed once, at the end) is what you report. Never select a model on the test set: doing so leaks information and inflates the reported performance.

Glossary

Key terms

Linear basis function (LBF) model: A regression y = beta_0 + beta_1 phi_1(x) + ... + beta_p phi_p(x) + eps that is linear in the coefficients beta even though the basis functions phi can be highly non-linear in x. The number of basis functions p sets the model's complexity.
Polynomial vs radial basis function: Two basis families the unit names. Polynomial uses phi_i(x) = x^i (degree-p regression); a radial basis function uses a Gaussian bump phi_i(x) = exp(-(x - c_i)^2 / (2 s^2)) with centres c_i and width s fixed in advance, which keeps the model linear in beta.
Design matrix (Phi): The matrix whose rows are observations and whose columns are the basis-function values. The least-squares coefficients solve the normal equations beta_hat = (Phi^T Phi)^-1 Phi^T y — the same formula as ordinary linear regression with Phi in place of the raw inputs.
Expected Prediction Error (EPE): Under squared loss, EPE(f_hat) = E[(y - yhat)^2], the average squared gap between a future response and the model's prediction. It measures how well a model generalises to unseen data and is the quantity model selection tries to minimise.
Bias-variance decomposition: EPE = sigma^2 + Bias^2 + Variance. The irreducible sigma^2 = var(eps) is a floor no model can beat; Bias^2 = (E[f_hat] - f)^2 measures how far the average fit is from the truth; Variance = E[(f_hat - E[f_hat])^2] measures how much the fit wobbles across samples. Only bias-squared and variance are reducible.
Bias-variance trade-off: As model flexibility (p) rises, bias falls but variance rises, so EPE is U-shaped — too simple under-fits (high bias), too flexible over-fits (high variance). The best model sits at the intermediate complexity that minimises EPE, not the most flexible one.
Training / validation / test split: Splitting the data (training commonly 50-80%) so that models are fit on training data, compared by their validation MSE, and finally evaluated once on the test set. The validation MSE estimates EPE for selection; the test MSE gives an unbiased estimate of generalisation.
Over-fitting vs under-fitting: An over-fit model (too high p) has low bias but high variance — it fits the training data well but generalises poorly. An under-fit model (too low p) has high bias and misses real structure, doing poorly on both training and test data.
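
The first three glossary entries can be sketched in a few lines of NumPy. This is an illustration under our own assumptions, not the unit's code: the function names, the made-up data, and the choice of centres and width are all hypothetical. It uses np.linalg.lstsq, which minimises the same squared-error criterion that the normal equations solve, instead of forming the matrix inverse explicitly.

```python
import numpy as np

def polynomial_design(x, p):
    """Design matrix with columns 1, x, x^2, ..., x^p (the column of ones carries beta_0)."""
    return np.column_stack([x ** i for i in range(p + 1)])

def radial_design(x, centres, s):
    """Column of ones followed by one Gaussian bump per centre c_i, all with width s."""
    bumps = np.exp(-(x[:, None] - centres[None, :]) ** 2 / (2 * s ** 2))
    return np.column_stack([np.ones_like(x), bumps])

def fit_least_squares(Phi, y):
    """Least-squares coefficients: the beta that minimises ||y - Phi beta||^2."""
    beta_hat, *_ = np.linalg.lstsq(Phi, y, rcond=None)
    return beta_hat

# Made-up data purely for illustration: an exact quadratic with no noise.
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = 1.0 + 2.0 * x - 0.5 * x ** 2

# Polynomial basis with p = 2 recovers the quadratic's coefficients.
Phi_poly = polynomial_design(x, p=2)
beta_poly = fit_least_squares(Phi_poly, y)

# Radial basis with two centres fixed in advance (hypothetical choices).
Phi_rbf = radial_design(x, centres=np.array([1.0, 3.0]), s=1.0)
beta_rbf = fit_least_squares(Phi_rbf, y)

print(Phi_poly.shape, Phi_rbf.shape)
print(np.round(beta_poly, 6))
```

Both fits call the same fit_least_squares function. Only the design matrix changes, which is what "linear in beta" means in practice.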

FAQ

Model Evaluation & Selection FAQ

Is Model Evaluation & Selection on the final exam?

Yes. It is Week 9 content, so it is examined in the cumulative 45% final (which covers all weeks); it is not on the 25% mid-semester, which stops at Week 6. The released sample paper shows a model-selection true/false MCQ and a bias-variance short-answer as recurring items, and a Python item can ask you to compute a validation MSE from arrays.

Why can't I just pick the model with the lowest training error?

Because training error falls monotonically as you add complexity — you can drive it to nearly zero by over-fitting — so it always favours the most complex model and tells you nothing about generalisation. Select instead on an estimate of expected prediction error, namely the validation MSE, and report the test MSE.
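
To see the monotone fall for yourself, here is a small sketch that fits polynomials of increasing degree to one made-up data set and records the training MSE each time. The sine curve, the noise level, and the seed are our own assumptions, chosen only to give the fits something to chase.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed so the made-up data is reproducible

# Hypothetical data: a smooth curve plus noise (not from the unit's materials).
x = np.linspace(0.0, 1.0, 30)
y = np.sin(2 * np.pi * x) + rng.normal(scale=0.3, size=x.size)

def training_mse(x, y, degree):
    """Fit a degree-`degree` polynomial by least squares and score it on the same data."""
    Phi = np.column_stack([x ** i for i in range(degree + 1)])
    beta_hat, *_ = np.linalg.lstsq(Phi, y, rcond=None)
    return float(np.mean((y - Phi @ beta_hat) ** 2))

degrees = list(range(1, 9))
train_mse = [training_mse(x, y, d) for d in degrees]

for d, m in zip(degrees, train_mse):
    print(d, round(m, 4))
```

Each higher-degree model contains the lower-degree one as a special case, so its best training fit can never be worse. That is why the printed column never goes up, and why it cannot tell you where over-fitting starts.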

What is the bias-variance trade-off in one sentence?

As a model becomes more flexible its bias falls but its variance rises, so the expected prediction error is U-shaped and is minimised at an intermediate complexity — neither the simplest nor the most flexible model is best.
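
The trade-off can also be checked by simulation. The sketch below assumes a known true function, which you never have in practice, redraws the noise many times, refits each candidate, and estimates bias-squared and variance of the prediction at a single point x0. The true function, noise level, degrees and x0 are all hypothetical choices for illustration.

```python
import numpy as np

rng = np.random.default_rng(2)  # fixed seed for reproducibility

def true_f(x):
    """Assumed 'true' regression function, known only because this is a simulation."""
    return np.sin(2 * np.pi * x)

sigma = 0.3                      # assumed noise standard deviation
x = np.linspace(0.0, 1.0, 30)    # fixed inputs, reused in every simulated sample
x0 = 0.25                        # the point at which predictions are studied
n_sims = 500

def design(x, degree):
    return np.column_stack([x ** i for i in range(degree + 1)])

results = {}
for degree in (1, 3, 9):
    Phi = design(x, degree)
    phi0 = design(np.array([x0]), degree)
    preds = np.empty(n_sims)
    for s in range(n_sims):
        # New noise draw, new fit, new prediction at x0.
        y = true_f(x) + rng.normal(scale=sigma, size=x.size)
        beta_hat, *_ = np.linalg.lstsq(Phi, y, rcond=None)
        preds[s] = (phi0 @ beta_hat)[0]
    bias_sq = float((preds.mean() - true_f(x0)) ** 2)
    variance = float(preds.var())
    results[degree] = (bias_sq, variance, sigma ** 2 + bias_sq + variance)

for degree, (b, v, epe) in results.items():
    print(degree, round(b, 4), round(v, 4), round(epe, 4))
```

The last printed column adds the irreducible sigma^2 to the two estimated terms, so it is an estimate of EPE at x0 for each degree.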

What is the difference between validation error and test error?

The validation set is used to choose between candidate models (you pick the lowest validation MSE), while the test set is touched only once, at the very end, to report an unbiased estimate of how the chosen model generalises. Using the test set to select a model leaks information and inflates the reported performance.

What is the irreducible error and why does it matter?

The irreducible error sigma^2 = var(eps) comes from the noise in the data-generating process, not the model, so no model — however flexible or well-tuned — can push the expected prediction error below it. In a short answer, name it explicitly: EPE = sigma^2 + Bias^2 + Variance, and only the last two are reducible.
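
For readers who want to see where the three terms come from, here is a short derivation sketch in LaTeX. It assumes y = f + eps with E[eps] = 0 and var(eps) = sigma^2, and that the noise in the new observation is independent of the fitted f_hat. Those assumptions are ours for the sketch, so check them against the notation in your lecture slides. All quantities are evaluated at a fixed input x.

```latex
\begin{align*}
\mathrm{EPE} &= E\big[(y - \hat f)^2\big] = E\big[(\varepsilon + f - \hat f)^2\big] \\
&= E[\varepsilon^2] + 2\,E[\varepsilon]\,E[f - \hat f] + E\big[(f - \hat f)^2\big] \\
&= \sigma^2 + E\big[(f - \hat f)^2\big], \\[4pt]
E\big[(f - \hat f)^2\big] &= E\big[\big((f - E[\hat f]) + (E[\hat f] - \hat f)\big)^2\big] \\
&= (E[\hat f] - f)^2 + E\big[(\hat f - E[\hat f])^2\big] \\
&= \mathrm{Bias}^2 + \mathrm{Variance}.
\end{align*}
```

Both cross terms vanish: the first because E[eps] = 0, the second because E[f_hat - E[f_hat]] = 0. What remains is sigma^2, which does not involve f_hat at all, plus the two terms the model can influence.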

Is this guide official or affiliated with the University of Sydney?

No. This is an independent AskSia study resource for BUSS6002. It is not produced, endorsed by or affiliated with the University of Sydney; always confirm definitions, notation and assessment details against your official Canvas unit outline.

Study strategy

Exam move

Lock down three things and the model-selection marks become reliable. First, memorise the three-part decomposition EPE = sigma^2 + Bias^2 + Variance and that only bias-squared and variance are reducible — leading with the irreducible floor is an easy mark students forget. Second, internalise the trade-off direction: as complexity p rises, bias falls and variance rises, so expected prediction error is U-shaped and the optimal model is the intermediate one (sketch the U-curve, with bias falling and variance rising, on your A4 note sheet). Third, learn the validation protocol in order: fit candidates on the training set, select the lowest validation MSE, re-estimate the chosen model on training plus validation combined, then report the test MSE — and never select on the test set. Practise the two recurring exam moves: the true/false MCQ (the false option is almost always 'the selected model has the lowest training MSE') and the short computation of a validation MSE as the mean (not the sum) of squared residuals.
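
The four-step protocol can be sketched end to end as below. Everything specific here is an assumption for illustration: the made-up data, the 60 / 20 / 20 split, the polynomial candidates of degree 1 to 8, and the helper names.

```python
import numpy as np

rng = np.random.default_rng(1)  # fixed seed so the illustration is reproducible

# Hypothetical data, invented for this sketch (not from the unit's materials).
n = 200
x = rng.uniform(0.0, 1.0, size=n)
y = np.sin(2 * np.pi * x) + rng.normal(scale=0.3, size=n)

# Assumed 60 / 20 / 20 split. x was drawn in random order, so slicing is a random split.
x_train, y_train = x[:120], y[:120]
x_val, y_val = x[120:160], y[120:160]
x_test, y_test = x[160:], y[160:]

def design(x, degree):
    return np.column_stack([x ** i for i in range(degree + 1)])

def fit(x, y, degree):
    beta_hat, *_ = np.linalg.lstsq(design(x, degree), y, rcond=None)
    return beta_hat

def mse(x, y, beta_hat):
    degree = beta_hat.size - 1
    return float(np.mean((y - design(x, degree) @ beta_hat) ** 2))

candidates = range(1, 9)

# Steps 1 and 2: fit every candidate on training data, score it on validation data.
val_mse = {d: mse(x_val, y_val, fit(x_train, y_train, d)) for d in candidates}
best_degree = min(val_mse, key=val_mse.get)

# Step 3: re-estimate the chosen model on training plus validation combined.
x_fit = np.concatenate([x_train, x_val])
y_fit = np.concatenate([y_train, y_val])
beta_final = fit(x_fit, y_fit, best_degree)

# Step 4: touch the test set once, only to report.
test_mse = mse(x_test, y_test, beta_final)
print(best_degree, round(test_mse, 4))
```

Note that the test arrays appear in exactly one line, after the degree has already been chosen. That is the code-level version of "never select on the test set".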
