University of Sydney · S1 2026 · FACULTY OF BUSINESS & ECONOMICS

BUSS6002 · Data Science In Business

50% final exam · hurdle · 14 Chapters · 9-page Bible
Chapter 6 of 11 · BUSS6002

Matrices & Linear Regression

Week 6 turns the unit's linear-algebra notation into its single most-examined model. A matrix stacks the data into rows (observations) and columns (variables), and once the dataset is written that way, fitting a straight line through a cloud of points collapses to one closed-form expression: the OLS estimator β̂ = (XᵀX)⁻¹Xᵀy. This chapter covers the matrix operations you must do by hand (transpose, the row-times-column product and its dimension rule), how to read and interpret regression coefficients, the goodness-of-fit measure R², and the residual-plot diagnostics that decide whether the model is even correctly specified. It is examined across all three question types — MCQ, short-answer derivations, and hand-written Python.
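Here is a minimal NumPy sketch of that closed form. The five data points are made up for illustration and are not taken from the unit. The data matrix X gets a leading column of ones for the intercept. β̂ is then computed by solving the normal equations (XᵀX)β = Xᵀy, which evaluates the same formula without forming the inverse explicitly. The comments track the dimension rule at each product.

```python
import numpy as np

# Illustrative data (hypothetical): 5 observations of one predictor x and a response y.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([3.0, 5.0, 7.0, 9.0, 11.0])    # built as exactly y = 1 + 2x

# Data matrix: rows = observations, columns = variables (intercept column of ones, then x).
X = np.column_stack([np.ones_like(x), x])    # shape (5, 2)

# Closed-form OLS: beta_hat = (X^T X)^{-1} X^T y, evaluated by solving (X^T X) b = X^T y.
XtX = X.T @ X                                # (2, 5) @ (5, 2) -> (2, 2)
Xty = X.T @ y                                # (2, 5) @ (5,)   -> (2,)
beta_hat = np.linalg.solve(XtX, Xty)

# Cross-check with NumPy's least-squares solver, which minimises ||y - X b||^2 directly.
beta_lstsq, *_ = np.linalg.lstsq(X, y, rcond=None)

print(X.shape, XtX.shape, beta_hat)
```

Because y was built as exactly 1 + 2x, both `beta_hat` and the `np.linalg.lstsq` cross-check should return intercept 1 and slope 2, up to floating-point rounding. Solving rather than inverting is a common numerical choice. On paper, the exam formula (XᵀX)⁻¹Xᵀy gives the same answer.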

In this chapter

What this chapter covers

  • 1. The data matrix — rows = observations, columns = variables; bold UPPER-case notation
  • 2. Matrix operations — transpose, identity, inverse, determinant, plus element-wise addition and scalar multiplication
  • 3. The matrix product — row·column, the inner-dimension rule, and why AB ≠ BA
  • 4. The regression model — simple y = β₀ + β₁x + ε and matrix form y = Xβ + ε
  • 5. OLS as the closed-form RSS minimiser — β̂ = (XᵀX)⁻¹Xᵀy
  • 6. Coefficient interpretation — an associated average change, never a cause
  • 7. Error assumptions — zero mean / constant variance for estimation; Normality only for inference
  • 8. Goodness of fit and diagnostics — R² = 1 − RSS/TSS, and reading the residual plot
Worked example

Fit an OLS line from summary statistics and find R²

Q [6 marks]. A simple linear regression of y on x is summarised by x̄ = 10, ȳ = 50, Sₓₓ = Σ(xᵢ − x̄)² = 200, Sₓᵧ = Σ(xᵢ − x̄)(yᵢ − ȳ) = 900, and total sum of squares TSS = Σ(yᵢ − ȳ)² = 5000. (a) Find the least-squares line. (b) Compute R². (c) Interpret the slope.
  • +1 Slope. β̂₁ = Sₓᵧ / Sₓₓ = 900 / 200 = 4.5.
  • +1 Intercept. β̂₀ = ȳ − β̂₁x̄ = 50 − 4.5×10 = 50 − 45 = 5.
  • +1 Equation. The fitted line is ŷ = 5 + 4.5x.
  • +1 Explained variation. Because ŷᵢ − ȳ = β̂₁(xᵢ − x̄) and β̂₁Sₓₓ = Sₓᵧ, the regression sum of squares is SSR = Σ(ŷᵢ − ȳ)² = β̂₁²Sₓₓ = β̂₁·Sₓᵧ = 4.5×900 = 4050.
  • +1 R². R² = SSR / TSS = 4050 / 5000 = 0.81 (equivalently 1 − RSS/TSS with RSS = 5000 − 4050 = 950).
  • +1 Interpret. A one-unit increase in x is associated with an average increase of 4.5 in y; about 81% of the variation in y is explained by x.
ŷ = 5 + 4.5x; R² = 4050/5000 = 0.81. Each extra unit of x is associated with an average +4.5 in y, and x explains roughly 81% of the variation in y (association, not causation).
Sia tip — When a question hands you the summary sums, use β̂₁ = Sₓᵧ/Sₓₓ then β̂₀ = ȳ − β̂₁x̄ rather than the matrix formula — it is faster and earns the same marks. Get R² from SSR = β̂₁·Sₓᵧ and R² = SSR/TSS, and always state the interpretation as an *associated* change.
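The arithmetic can be checked in a few lines of plain Python. This is just a sketch. The variable names are our own, and the inputs are exactly the summary statistics from the question.

```python
# Summary statistics given in the question.
x_bar, y_bar = 10.0, 50.0
S_xx, S_xy, TSS = 200.0, 900.0, 5000.0

beta1 = S_xy / S_xx                # slope: 900 / 200 = 4.5
beta0 = y_bar - beta1 * x_bar      # intercept: 50 - 4.5 * 10 = 5
SSR = beta1 * S_xy                 # explained (regression) sum of squares: 4.5 * 900 = 4050
RSS = TSS - SSR                    # residual sum of squares: 5000 - 4050 = 950
r_squared = SSR / TSS              # 4050 / 5000 = 0.81, same as 1 - RSS / TSS

print(f"y_hat = {beta0:g} + {beta1:g}x, R^2 = {r_squared:.2f}")
# prints: y_hat = 5 + 4.5x, R^2 = 0.81
```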
Glossary

Key terms

Matrix
A rectangular table of numbers A ∈ ℝᵐˣⁿ (m rows, n columns), written bold UPPER-case. When it holds data the convention is fixed: rows are observations and columns are variables.
Transpose (Aᵀ)
The matrix obtained by swapping rows and columns, so (Aᵀ)ᵢⱼ = Aⱼᵢ. It satisfies (Aᵀ)ᵀ = A and (A+B)ᵀ = Aᵀ + Bᵀ, and appears throughout OLS as XᵀX and Xᵀy.
Matrix product (AB)
Defined only when the columns of A equal the rows of B; entry cᵢⱼ = Σₖ aᵢₖbₖⱼ is row i of A dotted with column j of B. It is NOT element-wise and NOT commutative — in general AB ≠ BA.
OLS estimator
Ordinary least squares: the coefficient vector that minimises the residual sum of squares ‖y − Xβ‖². For the linear model it has the closed form β̂ = (XᵀX)⁻¹Xᵀy, an exact, one-shot solution requiring no iteration. This holds provided XᵀX is invertible, meaning no predictor is an exact linear combination of the others.
Residual
The vertical gap between an observed and fitted value, eᵢ = yᵢ − ŷᵢ. Residuals are the raw material for diagnostics and for the residual sum of squares RSS = Σeᵢ².
R² (coefficient of determination)
R² = 1 − RSS/TSS = (TSS − RSS)/TSS, the proportion of variation in y explained by the model. For OLS with an intercept it lies in [0, 1]. It never decreases when predictors are added, even useless ones, which is why model selection uses adjusted R² instead.
Standard error of a coefficient
SE(β̂₁) measures the sampling variability of a slope estimate. In simple regression SE(β̂₁)² = σ²/Σ(xᵢ − x̄)², with the unknown σ² estimated by RSS/(n − 2). It drives the t-statistic t = β̂₁/SE(β̂₁) and the approximate 95% confidence interval β̂₁ ± 2·SE.
Heteroscedasticity
Non-constant error variance — var(ε|x) changes with x. It shows up as a funnel / fan-out in the residual-vs-fitted plot and breaks one of the OLS estimation assumptions; a response transform such as log y is the usual remedy.
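This plain-Python sketch shows how several of these terms fit together in simple regression. It uses a small hypothetical sample of six invented data points. It computes the residuals, RSS and R². It then estimates σ² by RSS/(n − 2), because the true σ² is unknown. Finally it forms SE(β̂₁), the t-statistic and the rough β̂₁ ± 2·SE interval from the glossary.

```python
import math

# Hypothetical sample (illustrative only).
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1, 12.0]
n = len(x)

x_bar = sum(x) / n
y_bar = sum(y) / n
S_xx = sum((xi - x_bar) ** 2 for xi in x)
S_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))

beta1 = S_xy / S_xx
beta0 = y_bar - beta1 * x_bar

# Residuals e_i = y_i - y_hat_i, then RSS, TSS and R^2.
residuals = [yi - (beta0 + beta1 * xi) for xi, yi in zip(x, y)]
RSS = sum(e ** 2 for e in residuals)
TSS = sum((yi - y_bar) ** 2 for yi in y)
r_squared = 1 - RSS / TSS

# sigma^2 is unknown: estimate it by RSS / (n - 2), since two coefficients were estimated.
sigma2_hat = RSS / (n - 2)
se_beta1 = math.sqrt(sigma2_hat / S_xx)                     # SE(beta1)^2 = sigma^2 / S_xx
t_stat = beta1 / se_beta1
ci_approx = (beta1 - 2 * se_beta1, beta1 + 2 * se_beta1)    # rough 95% interval

print(f"beta1={beta1:.3f}, SE={se_beta1:.3f}, t={t_stat:.1f}, R^2={r_squared:.4f}")
```

With an intercept in the model, the OLS residuals sum to zero. That gives a quick sanity check on hand calculations.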
FAQ

Matrices & Linear Regression FAQ

Is matrix multiplication commutative?

No. In general AB ≠ BA — order matters, and assuming otherwise is a guaranteed wrong MCQ answer. The product is also only defined when the inner dimensions match (columns of A = rows of B); otherwise it is undefined. And it is never element-wise: each entry is a row of A dotted with a column of B.
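A short sketch in plain Python makes the rule concrete. The hypothetical `matmul` helper below checks the inner dimensions first. It then builds each entry as row i of the left matrix dotted with column j of the right one. The matrices are arbitrary examples, chosen so that AB and BA visibly differ.

```python
def matmul(A, B):
    """Multiply A (m x n) by B (n x p) using c_ij = sum_k a_ik * b_kj."""
    m, n = len(A), len(A[0])
    n_b, p = len(B), len(B[0])
    if n != n_b:
        # Inner dimensions must match: columns of the left matrix == rows of the right.
        raise ValueError(f"undefined: inner dimensions differ ({m}x{n} times {n_b}x{p})")
    # Entry (i, j) is row i of A dotted with column j of B.
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(p)]
            for i in range(m)]

A = [[1, 2],
     [3, 4]]
B = [[0, 1],
     [1, 0]]

print(matmul(A, B))   # [[2, 1], [4, 3]]
print(matmul(B, A))   # [[3, 4], [1, 2]]  -> AB != BA
# An element-wise product would give [[0, 2], [3, 0]], a different (wrong) answer.

C = [[1, 2, 3],
     [4, 5, 6]]       # 2 x 3
print(matmul(A, C))   # (2 x 2)(2 x 3) is defined and gives a 2 x 3 result
try:
    matmul(C, A)      # (2 x 3)(2 x 2): inner dimensions 3 != 2
except ValueError as err:
    print(err)        # undefined: inner dimensions differ (2x3 times 2x2)
```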

Does OLS require the errors to be Normally distributed?

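Not for estimation. The estimation assumptions are zero-mean errors E(ε) = 0, constant variance var(ε) = σ², and errors uncorrelated with x. None of them specifies a distribution. Unbiasedness of β̂ = (XᵀX)⁻¹Xᵀy, that is E(β̂) = β, comes from the errors having zero mean given x, E(ε|x) = 0. Constant variance is what makes the usual standard-error formula valid. You only add ε ~ N(0, σ²) when you want inference: t-tests and confidence intervals with exact distributions. Claiming that Normality is needed for estimation is a classic exam trap.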
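A seeded Monte Carlo sketch can illustrate the point, though it does not prove it. The set-up is our own choice: a true line y = 5 + 4.5x, a fixed design with x running from 1 to 20, and errors drawn from Uniform(−3, 3). That error distribution has mean zero and constant variance but is clearly not Normal. Averaged over many simulated samples, the OLS slope still lands on the true value.

```python
import random

def ols_slope(x, y):
    """Simple-regression slope S_xy / S_xx."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    return s_xy / s_xx

rng = random.Random(2026)             # fixed seed so the run is reproducible
beta0_true, beta1_true = 5.0, 4.5     # hypothetical "true" coefficients
x = [float(i) for i in range(1, 21)]  # fixed design, 20 observations

estimates = []
for _ in range(5000):
    # Non-Normal errors: Uniform(-3, 3) has mean 0 and constant variance 36/12 = 3.
    eps = [rng.uniform(-3, 3) for _ in x]
    y = [beta0_true + beta1_true * xi + e for xi, e in zip(x, eps)]
    estimates.append(ols_slope(x, y))

mean_estimate = sum(estimates) / len(estimates)
print(f"average slope estimate = {mean_estimate:.3f} (true value {beta1_true})")
```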

What does the slope coefficient actually mean?

β̂₁ is the average change in y associated with a one-unit increase in x (in multiple regression, holding the other predictors fixed). Phrase it as 'associated with', not 'causes'. Regression on observational data shows association, not causation, and causal wording loses marks.

How do I diagnose a residual plot in the exam?

Plot residuals against the fitted values and look for structure. A flat, formless band of constant spread = a good fit. A curve or U-shape means the zero-conditional-mean assumption E(ε|x) = 0 has failed (a missed nonlinearity — add an x² term). A funnel / fan-out means the constant-variance assumption var(ε|x) = σ² has failed (heteroscedasticity — transform y, e.g. log y). Name the specific assumption, not just 'the model is bad'.
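The curvature case can also be checked numerically rather than by eye. This NumPy sketch uses hypothetical data with y = x² exactly. A straight-line fit leaves residuals that are positive at both ends and negative in the middle, which is the U-shape you would see in a residual-vs-fitted plot. Adding an x² column removes the pattern. Plotting `resid_line` against `fitted_line` with any plotting library should show the same U.

```python
import numpy as np

# Hypothetical curved data: y = x^2 exactly, so a straight line is misspecified.
x = np.arange(0.0, 11.0)              # 0, 1, ..., 10
y = x ** 2

def fit_residuals(X, y):
    """OLS fit via least squares; return (fitted values, residuals)."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    fitted = X @ beta
    return fitted, y - fitted

# Straight-line model: columns [1, x].
X_line = np.column_stack([np.ones_like(x), x])
fitted_line, resid_line = fit_residuals(X_line, y)

# Fitted values increase with x here, so ordering by x is ordering by fitted value.
# Residual signs: positive at both ends, negative in the middle (a U-shape).
print(np.sign(resid_line).astype(int))

# Add an x^2 column: the systematic pattern disappears (residuals are ~0 for this data).
X_quad = np.column_stack([np.ones_like(x), x, x ** 2])
fitted_quad, resid_quad = fit_residuals(X_quad, y)
print(np.abs(resid_quad).max())
```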

Why isn't a high R² enough to choose a model?

Because R² never decreases when you add a predictor, even a completely useless one, a bigger model almost always has a higher R² without being any better. That is why the unit introduces adjusted R², which penalises each extra term and can fall when a term adds little. The same problem motivates the bias–variance model-selection ideas in a later chapter.
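Here is a seeded NumPy sketch of the effect. It fits y on x, then adds a column of pure noise (`junk`, a hypothetical useless predictor) and compares the two fits. Adjusted R² here uses the standard formula 1 − (1 − R²)(n − 1)/(n − p − 1), with p predictors excluding the intercept. Treat that formula as our assumption about how the unit defines it.

```python
import numpy as np

rng = np.random.default_rng(6002)   # seeded for reproducibility
n = 30
x = rng.normal(size=n)
y = 2.0 + 1.5 * x + rng.normal(size=n)
junk = rng.normal(size=n)           # pure noise, unrelated to y

def r2_and_adjusted(X, y):
    """Return (R^2, adjusted R^2); X must include the intercept column."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    rss = np.sum((y - X @ beta) ** 2)
    tss = np.sum((y - y.mean()) ** 2)
    r2 = 1 - rss / tss
    p = X.shape[1] - 1              # number of predictors, excluding the intercept
    adj = 1 - (1 - r2) * (len(y) - 1) / (len(y) - p - 1)
    return r2, adj

ones = np.ones(n)
r2_small, adj_small = r2_and_adjusted(np.column_stack([ones, x]), y)
r2_big, adj_big = r2_and_adjusted(np.column_stack([ones, x, junk]), y)

print(f"R^2:     {r2_small:.4f} -> {r2_big:.4f}")   # cannot go down
print(f"adj R^2: {adj_small:.4f} -> {adj_big:.4f}") # may fall if junk adds too little
```

Whether adjusted R² actually falls depends on the random draw. Plain R² cannot fall.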

Study strategy

Exam move

Treat Week 6 as the quantitative core of both exams and drill it for speed by hand. Get matrix multiplication automatic — check inner dimensions, then row-dot-column — because the 1-mark MCQ and every regression quantity (XᵀX, Xᵀy, Xβ̂) depend on it. Memorise the OLS pipeline as a fixed routine: β̂₁ = Sₓᵧ/Sₓₓ, β̂₀ = ȳ − β̂₁x̄, then verify a fitted point, then R² = 1 − RSS/TSS. Practise interpreting a slope as an associated (not causal) average change, and rehearse the residual-plot short-answer until you can instantly map curvature → zero-mean failure and funnel → constant-variance failure with a one-line remedy. Finally, keep the estimation-vs-inference distinction sharp (Normality is only for inference) and be ready to write the equivalent NumPy/statsmodels code from memory.
