University of Sydney · S1 2026 · FACULTY OF BUSINESS & ECONOMICS

BUSS6002 · Data Science In Business

- one subject, every graph, every model, every mark

50% final exam · hurdle · 14 Chapters · 9-page Bible

Our own words - no uploaded lecturer files

Built to mirror S1 2026 · updated this semester

Chapter 6 of 11 · BUSS6002

Matrices & Linear Regression

Week 6 turns the unit's linear-algebra notation into its single most-examined model. A matrix stacks the data into rows (observations) and columns (variables), and once the dataset is written that way, fitting a straight line through a cloud of points collapses to one closed-form expression: the OLS estimator β̂ = (XᵀX)⁻¹Xᵀy. This chapter covers the matrix operations you must do by hand (transpose, the row-times-column product and its dimension rule), how to read and interpret regression coefficients, the goodness-of-fit measure R², and the residual-plot diagnostics that decide whether the model is even correctly specified. It is examined across all three question types — MCQ, short-answer derivations, and hand-written Python.

In this chapter

What this chapter covers

1. The data matrix: rows = observations, columns = variables; bold UPPER-case notation
2. Matrix operations: transpose, identity, inverse, determinant (plus element-wise addition and scalar multiplication)
3. The matrix product: row·column, the inner-dimension rule, and why AB ≠ BA
4. The regression model: simple y = β₀ + β₁x + ε and matrix form y = Xβ + ε
5. OLS as the closed-form RSS minimiser: β̂ = (XᵀX)⁻¹Xᵀy
6. Coefficient interpretation: an associated average change, never a cause
7. Error assumptions: zero mean / constant variance for estimation; Normality only for inference
8. Goodness of fit and diagnostics: R² = 1 − RSS/TSS, and reading the residual plot
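
The first three topics can be sketched in NumPy. This is a minimal illustration, not the unit's own code: the data values and the helper name product_is_defined are hypothetical, chosen only to show the rows-by-columns convention, the transpose, and the inner-dimension rule.

```python
import numpy as np

# Hypothetical data matrix: 4 observations (rows) x 2 variables (columns).
# The first column of ones plays the role of the intercept.
X = np.array([[1.0, 2.0],
              [1.0, 3.0],
              [1.0, 5.0],
              [1.0, 7.0]])
y = np.array([3.0, 5.0, 8.0, 12.0])

print(X.shape)     # (rows, columns) of X
print(X.T.shape)   # the transpose swaps rows and columns


def product_is_defined(A, B):
    """Inner-dimension rule: columns of A must equal rows of B."""
    return A.shape[1] == B.shape[0]


XtX = X.T @ X      # (2x4)(4x2) gives a 2x2 matrix
Xty = X.T @ y      # (2x4)(4,) gives a length-2 vector

print(product_is_defined(X.T, X))   # inner dimensions 4 and 4 match
print(product_is_defined(X, X))     # inner dimensions 2 and 4 do not match
```

XᵀX and Xᵀy are the two building blocks of the OLS formula, so it is worth being able to state their shapes before computing any entry.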

Worked example

Fit an OLS line from summary statistics and find R²

Q [6 marks]. A simple linear regression of y on x is summarised by x̄ = 10, ȳ = 50, Sₓₓ = Σ(xᵢ − x̄)² = 200, Sₓᵧ = Σ(xᵢ − x̄)(yᵢ − ȳ) = 900, and total sum of squares TSS = Σ(yᵢ − ȳ)² = 5000. (a) Find the least-squares line. (b) Compute R². (c) Interpret the slope.

+1 Slope. β̂₁ = Sₓᵧ / Sₓₓ = 900 / 200 = 4.5.
+1 Intercept. β̂₀ = ȳ − β̂₁x̄ = 50 − 4.5×10 = 50 − 45 = 5.
+1 Equation. The fitted line is ŷ = 5 + 4.5x.
+1 Explained variation. The regression sum of squares is SSR = β̂₁·Sₓᵧ = 4.5×900 = 4050.
+1 R². R² = SSR / TSS = 4050 / 5000 = 0.81 (equivalently 1 − RSS/TSS with RSS = 5000 − 4050 = 950).
+1 Interpret. A one-unit increase in x is associated with an average increase of 4.5 in y; about 81% of the variation in y is explained by x.

ŷ = 5 + 4.5x; R² = 4050/5000 = 0.81. Each extra unit of x is associated with an average +4.5 in y, and x explains roughly 81% of the variation in y (association, not causation).
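
The arithmetic in the worked example can be checked with a few lines of plain Python. The variable names are illustrative; the numbers are the summary statistics given in the question.

```python
# Summary statistics given in the question.
x_bar, y_bar = 10.0, 50.0
S_xx, S_xy, TSS = 200.0, 900.0, 5000.0

beta1 = S_xy / S_xx              # slope
beta0 = y_bar - beta1 * x_bar    # intercept
SSR = beta1 * S_xy               # explained (regression) sum of squares
RSS = TSS - SSR                  # residual sum of squares
r2 = SSR / TSS                   # same value as 1 - RSS / TSS

print(f"y_hat = {beta0:g} + {beta1:g}x")
print(f"SSR = {SSR:g}, RSS = {RSS:g}, R^2 = {r2:.2f}")
```

Running it reproduces the marked steps: slope 4.5, intercept 5, SSR 4050, RSS 950 and R² 0.81.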

Sia tip — When a question hands you the summary sums, use β̂₁ = Sₓᵧ/Sₓₓ then β̂₀ = ȳ − β̂₁x̄ rather than the matrix formula — it is faster and earns the same marks. Get R² from SSR = β̂₁·Sₓᵧ and R² = SSR/TSS, and always state the interpretation as an *associated* change.

Glossary

Key terms

Matrix: A rectangular table of numbers A ∈ ℝᵐˣⁿ (m rows, n columns), written bold UPPER-case. When it holds data the convention is fixed: rows are observations and columns are variables.
Transpose (Aᵀ): The matrix obtained by swapping rows and columns, so (Aᵀ)ᵢⱼ = Aⱼᵢ. It satisfies (Aᵀ)ᵀ = A and (A+B)ᵀ = Aᵀ + Bᵀ, and appears throughout OLS as XᵀX and Xᵀy.
Matrix product (AB): Defined only when the columns of A equal the rows of B; entry cᵢⱼ = Σₖ aᵢₖbₖⱼ is row i of A dotted with column j of B. It is NOT element-wise and NOT commutative — in general AB ≠ BA.
OLS estimator: Ordinary least squares: the coefficient vector that minimises the residual sum of squares ‖y − Xβ‖². For the linear model it has the closed form β̂ = (XᵀX)⁻¹Xᵀy — an exact, one-shot solution requiring no iteration.
Residual: The vertical gap between an observed and fitted value, eᵢ = yᵢ − ŷᵢ. Residuals are the raw material for diagnostics and for the residual sum of squares RSS = Σeᵢ².
R² (coefficient of determination): R² = 1 − RSS/TSS = (TSS − RSS)/TSS, the proportion of variation in y explained by the model, lying in [0, 1] for an OLS fit that includes an intercept. It never falls when predictors are added, even useless ones, which is why model selection uses adjusted R².
Standard error of a coefficient: SE(β̂₁) measures the sampling variability of a slope estimate; SE(β̂₁)² = σ²/Σ(xᵢ − x̄)². It drives the t-statistic t = β̂₁/SE(β̂₁) and the approximate 95% confidence interval β̂₁ ± 2·SE.
Heteroscedasticity: Non-constant error variance — var(ε|x) changes with x. It shows up as a funnel / fan-out in the residual-vs-fitted plot and breaks one of the OLS estimation assumptions; a response transform such as log y is the usual remedy.
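
Several of these terms can be tied together in one short NumPy sketch. The toy data are hypothetical, and estimating σ² by RSS/(n − p) is an assumption here (the standard unbiased estimator), since the glossary states the standard-error formula in terms of the unknown σ².

```python
import numpy as np

# Hypothetical toy data: 5 observations of one predictor.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.0, 4.0, 5.0, 4.0, 5.0])

# Design matrix: a column of ones (intercept) next to x.
X = np.column_stack([np.ones_like(x), x])

# OLS via the normal equations (X^T X) beta = X^T y.
# solve() gives the same answer as (X^T X)^{-1} X^T y without forming the inverse.
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)

y_hat = X @ beta_hat             # fitted values
resid = y - y_hat                # residuals e_i = y_i - y_hat_i
RSS = np.sum(resid ** 2)
TSS = np.sum((y - y.mean()) ** 2)
r2 = 1 - RSS / TSS

n, p = X.shape                   # p counts the intercept column too
sigma2_hat = RSS / (n - p)       # assumed estimator of sigma^2
se_slope = np.sqrt(sigma2_hat / np.sum((x - x.mean()) ** 2))
t_slope = beta_hat[1] / se_slope

print(beta_hat, r2, se_slope, t_slope)
```

By hand, the same data give Sₓₓ = 10 and Sₓᵧ = 6, so β̂₁ = 0.6 and β̂₀ = 4 − 0.6×3 = 2.2, with RSS = 2.4, TSS = 6 and R² = 0.6, which matches the matrix route.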

FAQ

Matrices & Linear Regression FAQ

Is matrix multiplication commutative?

No. In general AB ≠ BA — order matters, and assuming otherwise is a guaranteed wrong MCQ answer. The product is also only defined when the inner dimensions match (columns of A = rows of B); otherwise it is undefined. And it is never element-wise: each entry is a row of A dotted with a column of B.
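
A small NumPy check makes all three points concrete. The matrices A, B and C are arbitrary illustrations.

```python
import numpy as np

A = np.array([[1, 2],
              [3, 4]])
B = np.array([[0, 1],
              [1, 0]])

AB = A @ B             # matrix product: rows of A dotted with columns of B
BA = B @ A             # same matrices, opposite order
elementwise = A * B    # element-wise (Hadamard) product, NOT the matrix product

print(AB)
print(BA)
print(np.array_equal(AB, BA))

# A 2x3 matrix times a 2x3 matrix: inner dimensions 3 and 2 do not match.
C = np.ones((2, 3))
try:
    C @ C
    defined = True
except ValueError:
    defined = False
print(defined)
```

Here AB swaps the columns of A while BA swaps its rows, so the two products differ even though both are defined.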

Does OLS require the errors to be Normally distributed?

Not for estimation. β̂ = (XᵀX)⁻¹Xᵀy is unbiased, E(β̂) = β, under just three assumptions: zero-mean errors E(ε) = 0, constant variance var(ε) = σ², and errors uncorrelated with x. (Strictly, unbiasedness rests on the errors having zero mean given x; constant variance is what makes the usual standard-error formula valid.) You only add ε ~ N(0, σ²) when you want inference: standard errors, t-tests and confidence intervals. Normality for estimation is a classic exam trap.
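
A quick simulation illustrates the point. This is a sketch under stated assumptions: the true coefficients, the sample size and the uniform error distribution are all hypothetical choices, picked so the errors have zero mean and constant variance but are clearly not Normal.

```python
import numpy as np

rng = np.random.default_rng(0)

beta_true = np.array([5.0, 4.5])      # hypothetical true intercept and slope
x = np.linspace(0.0, 10.0, 50)
X = np.column_stack([np.ones_like(x), x])


def ols(X, y):
    """Closed-form OLS via the normal equations."""
    return np.linalg.solve(X.T @ X, X.T @ y)


estimates = []
for _ in range(2000):
    # Uniform errors: zero mean, constant variance, not Normal.
    eps = rng.uniform(-3.0, 3.0, size=x.size)
    y = X @ beta_true + eps
    estimates.append(ols(X, y))

mean_estimate = np.mean(estimates, axis=0)
print(mean_estimate)   # averages over 2000 simulated samples
```

The average of the estimates sits very close to the true (5, 4.5), which is what unbiasedness means in practice: individual fits miss, but not systematically.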

What does the slope coefficient actually mean?

β̂₁ is the average change in y associated with a one-unit increase in x, holding other predictors fixed. Phrase it as 'associated with', not 'causes' — regression on observational data shows association, not causation, and the causal wording loses marks.

How do I diagnose a residual plot in the exam?

Plot residuals against the fitted values and look for structure. A flat, formless band of constant spread = a good fit. A curve or U-shape means the zero-conditional-mean assumption E(ε|x) = 0 has failed (a missed nonlinearity — add an x² term). A funnel / fan-out means the constant-variance assumption var(ε|x) = σ² has failed (heteroscedasticity — transform y, e.g. log y). Name the specific assumption, not just 'the model is bad'.
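
The two failure patterns can also be detected numerically, which helps when checking a plot by eye. The sketch below uses simulated data; the coefficients, noise levels and the two summary checks (a correlation for curvature, a spread comparison for the funnel) are illustrative assumptions, not a formal test from the unit.

```python
import numpy as np

rng = np.random.default_rng(1)
x = np.linspace(0.0, 10.0, 200)
ones = np.ones_like(x)


def fit_residuals(X, y):
    """Least-squares fit of y on X; returns the residuals y - X @ beta_hat."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return y - X @ beta


# Case 1: the truth is quadratic but a straight line is fitted.
y_curved = 2.0 + 0.5 * x**2 + rng.normal(0.0, 1.0, x.size)
res_line = fit_residuals(np.column_stack([ones, x]), y_curved)
res_quad = fit_residuals(np.column_stack([ones, x, x**2]), y_curved)

# U-shaped residuals are strongly correlated with (x - x_bar)^2.
curv = (x - x.mean()) ** 2
corr_line = np.corrcoef(res_line, curv)[0, 1]
corr_quad = np.corrcoef(res_quad, curv)[0, 1]
print(corr_line, corr_quad)

# Case 2: the error spread grows with x (funnel / fan-out).
y_funnel = 2.0 + 3.0 * x + rng.normal(0.0, 1.0, x.size) * (0.2 + x)
res_funnel = fit_residuals(np.column_stack([ones, x]), y_funnel)
spread_low = res_funnel[x < 5].std()
spread_high = res_funnel[x >= 5].std()
print(spread_low, spread_high)
```

With the straight-line fit the correlation is strongly positive (the U-shape), and adding the x² column drives it to essentially zero. In the funnel case the residual spread in the upper half of x is clearly larger than in the lower half.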

Why isn't a high R² enough to choose a model?

Because R² can only increase (or stay flat) when you add a predictor, even a completely useless one. So a bigger model almost always has a higher R² without being better. That is why the unit introduces adjusted R², which penalises each extra term and can fall — and motivates the bias–variance model-selection ideas in a later chapter.
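
This behaviour is easy to see by adding a pure-noise column to a fit. The data below are simulated, and the adjusted R² formula used, 1 − (1 − R²)(n − 1)/(n − p − 1) with p predictors excluding the intercept, is the common textbook form; treat it as an assumption and check it against the unit's formula sheet.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 40
x = rng.normal(size=n)
noise_col = rng.normal(size=n)    # predictor unrelated to y by construction
y = 1.0 + 2.0 * x + rng.normal(size=n)


def r2_and_adjusted(X, y):
    """Return (R^2, adjusted R^2) for a least-squares fit of y on X."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    rss = np.sum((y - X @ beta) ** 2)
    tss = np.sum((y - y.mean()) ** 2)
    r2 = 1 - rss / tss
    n_obs, n_cols = X.shape
    p = n_cols - 1                # predictors, excluding the intercept column
    adj = 1 - (1 - r2) * (n_obs - 1) / (n_obs - p - 1)
    return r2, adj


ones = np.ones(n)
r2_small, adj_small = r2_and_adjusted(np.column_stack([ones, x]), y)
r2_big, adj_big = r2_and_adjusted(np.column_stack([ones, x, noise_col]), y)

print(r2_small, r2_big)     # R^2 of the bigger model is never lower
print(adj_small, adj_big)   # adjusted R^2 is allowed to drop
```

R² for the larger model is guaranteed to be at least as high because the smaller model is a special case of it (set the extra coefficient to zero). Adjusted R² carries no such guarantee, which is the property that makes it usable for comparing models of different sizes.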

Study strategy

Exam move

Treat Week 6 as the quantitative core of both exams and drill it for speed by hand. Get matrix multiplication automatic — check inner dimensions, then row-dot-column — because the 1-mark MCQ and every regression quantity (XᵀX, Xᵀy, Xβ̂) depend on it. Memorise the OLS pipeline as a fixed routine: β̂₁ = Sₓᵧ/Sₓₓ, β̂₀ = ȳ − β̂₁x̄, then verify a fitted point, then R² = 1 − RSS/TSS. Practise interpreting a slope as an associated (not causal) average change, and rehearse the residual-plot short-answer until you can instantly map curvature → zero-mean failure and funnel → constant-variance failure with a one-line remedy. Finally, keep the estimation-vs-inference distinction sharp (Normality is only for inference) and be ready to write the equivalent NumPy/statsmodels code from memory.
