University of Sydney · FACULTY OF COMPUTER SCIENCE

COMP5318 · Machine Learning and Data Mining

Chapter 3 of 11 · COMP5318

Linear & Logistic Regression, Regularization

This Week 3 topic of COMP5318 Machine Learning and Data Mining at the University of Sydney builds your first explicit parametric models: linear regression, fitted by least squares and scored with R², and logistic regression, which sends the same linear score through a sigmoid to predict a class probability. It closes with regularization: the Ridge (L2) and Lasso (L1) penalties that shrink weights to fight overfitting. All three show up as short numeric problems in the closed-book final.

In this chapter

What this chapter covers

  • 01 Fit a straight line ŷ = b₀ + b₁x by least squares using the closed-form slope and intercept formulas
  • 02 Understand why least squares minimises SSE (squared residuals), not the raw sum of residuals
  • 03 Split the spread of y with the identity SST = SSR + SSE and read each sum
  • 04 Compute R² = SSR/SST = 1 − SSE/SST and interpret it as the fraction of variation explained
  • 05 Recover the correlation r = ±√R², taking the sign from the slope direction
  • 06 See how logistic regression turns the linear score into a probability with the sigmoid and the log-odds link
  • 07 Apply the p ≥ 0.5 decision rule and find the decision boundary at z = 0
  • 08 Know logistic regression is fitted by maximum likelihood, not least squares
  • 09 Contrast Ridge (L2) shrinkage with Lasso (L1) feature selection and the role of the penalty strength α
Worked example · free

Fit a least-squares line, predict, and compute R²

Q [6 marks]. Over four weeks a tutor records a student's revision hours x and a diagnostic score y: (1, 1), (2, 3), (3, 2), (4, 4). Fit ŷ = b₀ + b₁x by least squares, predict the score at x = 5 hours, and compute R².
  • +1 Sums (n = 4): Σx = 10, Σy = 10, so x̄ = 2.5, ȳ = 2.5. Σxy = 1 + 6 + 6 + 16 = 29. Σx² = 1 + 4 + 9 + 16 = 30.
  • +1 Slope: b₁ = [Σxy − (Σx)(Σy)/n] / [Σx² − (Σx)²/n] = [29 − (10)(10)/4] / [30 − 100/4] = (29 − 25)/(30 − 25) = 4/5 = 0.8.
  • +1 Intercept: b₀ = ȳ − b₁x̄ = 2.5 − 0.8·2.5 = 2.5 − 2.0 = 0.5. Fitted line: ŷ = 0.5 + 0.8x.
  • +1 Predict x = 5: ŷ = 0.5 + 0.8·5 = 0.5 + 4.0 = 4.5.
  • +1 SSE: fitted ŷ = 1.3, 2.1, 2.9, 3.7; residuals −0.3, 0.9, −0.9, 0.3; squares 0.09, 0.81, 0.81, 0.09 → SSE = 1.80.
  • +1 SST and R²: deviations of y from ȳ = 2.5 are −1.5, 0.5, −0.5, 1.5; squares 2.25, 0.25, 0.25, 2.25 → SST = 5.00. R² = 1 − 1.80/5.00 = 1 − 0.36 = 0.64.
The least-squares line is ŷ = 0.5 + 0.8x; the predicted score at 5 hours is 4.5, and R² = 0.64, so the line explains 64% of the spread in scores. Check: SSR = SST − SSE = 5.00 − 1.80 = 3.20, and SSR/SST = 3.20/5.00 = 0.64. The slope is positive, so the correlation r = +√0.64 = +0.8.
Sia tip — Set out columns for x, y, xy, x², then ŷ, e, e² — each of Σxy, Σx², SSE and SST becomes a single addition, and the marker can award the method marks even if one cell slips. Write each formula, substitute, then evaluate; a bare final R² with no working earns far less than a clearly substituted answer.
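The worked example above can be checked with a few lines of plain Python. This is a minimal sketch of the same closed-form pipeline, not a prescribed implementation; the function names fit_line and sums_of_squares are illustrative choices, not part of the unit material.

```python
# Worked-example data: revision hours x and diagnostic score y.
xs = [1, 2, 3, 4]
ys = [1, 3, 2, 4]

def fit_line(xs, ys):
    """Return (b0, b1) from the closed-form least-squares formulas."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    b1 = (sum_xy - sum_x * sum_y / n) / (sum_x2 - sum_x ** 2 / n)
    b0 = sum_y / n - b1 * sum_x / n
    return b0, b1

def sums_of_squares(xs, ys, b0, b1):
    """Return (SSE, SST, SSR) for the line y_hat = b0 + b1 * x."""
    y_bar = sum(ys) / len(ys)
    fitted = [b0 + b1 * x for x in xs]
    sse = sum((y - f) ** 2 for y, f in zip(ys, fitted))  # unexplained
    sst = sum((y - y_bar) ** 2 for y in ys)              # total spread
    ssr = sum((f - y_bar) ** 2 for f in fitted)          # explained
    return sse, sst, ssr

b0, b1 = fit_line(xs, ys)
sse, sst, ssr = sums_of_squares(xs, ys, b0, b1)
r_squared = 1 - sse / sst
prediction = b0 + b1 * 5
# The correlation takes its sign from the slope.
r = (1 if b1 >= 0 else -1) * r_squared ** 0.5

print(f"b0={b0:.2f} b1={b1:.2f} prediction={prediction:.2f}")
print(f"SSE={sse:.2f} SST={sst:.2f} SSR={ssr:.2f} R2={r_squared:.2f} r={r:.2f}")
```

Up to floating-point rounding, the printed values should match the hand working: slope 0.8, intercept 0.5, prediction 4.5, SSE 1.80, SST 5.00, SSR 3.20 and R² 0.64.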
Glossary

Key terms

Least squares
The criterion that chooses the line's slope and intercept to minimise the sum of squared residuals (SSE); it gives the closed-form b₁ and b₀ formulas.
Residual
The leftover error for one point, eᵢ = yᵢ − ŷᵢ — the vertical gap between the observed value and the fitted line.
SSE / SST / SSR
SSE = Σ(y−ŷ)² (unexplained), SST = Σ(y−ȳ)² (total spread), SSR = Σ(ŷ−ȳ)² (explained). They satisfy SST = SSR + SSE.
R² (coefficient of determination)
R² = SSR/SST = 1 − SSE/SST, the fraction of the spread in y the model explains; for a least-squares line with an intercept it runs from 0 to 1 on the fitting data, but it can go negative on a held-out set.
Sigmoid (logistic function)
σ(z) = 1/(1+e⁻ᶻ), which maps any real score z into a probability in (0, 1); it equals 0.5 exactly at z = 0.
Log-odds (logit)
ln(p/(1−p)) = b₀ + b₁x — the log of the odds, which logistic regression models as a linear function of x even though p itself is not linear.
Ridge (L2)
Regularization that adds α·Σwⱼ² to the error; it shrinks all weights smoothly toward 0 but never sets any exactly to 0.
Lasso (L1)
Regularization that adds α·Σ|wⱼ| to the error; it can drive some weights exactly to 0, performing automatic feature selection. α (or λ) sets the penalty strength.
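The sigmoid, log-odds and decision-rule entries in the glossary above can be checked numerically. The sketch below is illustrative only: the weights b0 = −3.0 and b1 = 1.5 are hypothetical values chosen for the demonstration, not fitted to any data, and the helper names are assumptions.

```python
import math

def sigmoid(z):
    """Map a real score z to a probability in (0, 1)."""
    # Fine for moderate z; very large negative z would overflow math.exp.
    return 1.0 / (1.0 + math.exp(-z))

def log_odds(p):
    """Inverse of the sigmoid: the log of the odds p / (1 - p)."""
    return math.log(p / (1.0 - p))

def predict(x, b0, b1):
    """Return (probability, class) using the p >= 0.5 decision rule."""
    z = b0 + b1 * x  # linear score
    p = sigmoid(z)
    return p, int(p >= 0.5)

# Hypothetical weights for illustration only.
b0, b1 = -3.0, 1.5

# The decision boundary is where z = 0, i.e. x = -b0 / b1.
boundary_x = -b0 / b1

for x in (1.0, boundary_x, 3.0):
    p, label = predict(x, b0, b1)
    print(f"x={x:.1f} p={p:.3f} class={label} log-odds={log_odds(p):.3f}")
```

At the boundary the score is z = 0, so the probability is exactly 0.5 and the p ≥ 0.5 rule assigns class 1. The log-odds column recovers the linear score b₀ + b₁x, which is the sense in which logistic regression is linear in the log-odds rather than in p.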
FAQ

Linear & Logistic Regression, Regularization FAQ

Is linear regression fitted the same way as logistic regression?

No — and the exam tests the difference. Linear regression fits a numeric-target line by least squares (minimising SSE), which has a closed form. Logistic regression is a two-class classifier: it runs the linear score through a sigmoid and chooses the weights that make the observed 0/1 labels most probable — maximum likelihood, solved iteratively. Writing 'logistic regression minimises SSE' is a common false statement.
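To make "maximum likelihood, solved iteratively" concrete, here is a minimal sketch that fits a one-feature logistic regression by plain gradient ascent on the log-likelihood. The six data points, the learning rate and the step count are all assumptions chosen for illustration, and gradient ascent is only one of several iterative methods; nothing here is claimed to be the procedure the unit or any particular library uses.

```python
import math

# Hypothetical data: feature x and a 0/1 label (classes overlap on purpose,
# so the maximum-likelihood weights are finite).
xs = [1, 2, 3, 4, 5, 6]
labels = [0, 0, 1, 0, 1, 1]

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def log_likelihood(b0, b1):
    """Log of the probability the model assigns to the observed labels."""
    total = 0.0
    for x, y in zip(xs, labels):
        p = sigmoid(b0 + b1 * x)
        total += y * math.log(p) + (1 - y) * math.log(1 - p)
    return total

def gradient(b0, b1):
    """Partial derivatives of the log-likelihood with respect to b0 and b1."""
    g0 = g1 = 0.0
    for x, y in zip(xs, labels):
        error = y - sigmoid(b0 + b1 * x)
        g0 += error
        g1 += error * x
    return g0, g1

b0, b1 = 0.0, 0.0
start_ll = log_likelihood(b0, b1)
learning_rate = 0.02  # assumed step size, small enough to be stable here

# No closed form exists, so step uphill repeatedly.
for _ in range(20000):
    g0, g1 = gradient(b0, b1)
    b0 += learning_rate * g0
    b1 += learning_rate * g1

final_ll = log_likelihood(b0, b1)
boundary = -b0 / b1  # where z = 0, so p = 0.5
print(f"b0={b0:.3f} b1={b1:.3f} boundary x={boundary:.3f}")
print(f"log-likelihood: start={start_ll:.3f} final={final_ll:.3f}")
```

The contrast with linear regression is in the loop: least squares jumps straight to b₀ and b₁ from the sums, while here the weights are nudged until the gradient of the log-likelihood is close to zero.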

What is the difference between Ridge and Lasso, and when does R² go negative?

Ridge (L2) adds α·Σwⱼ² and shrinks weights smoothly toward 0 without ever zeroing them; Lasso (L1) adds α·Σ|wⱼ| and can set some weights exactly to 0, so it selects features. A larger α means a heavier penalty and a simpler model. R² is guaranteed to sit in [0, 1] only for a least-squares fit with an intercept, scored on the data it was fitted to; scored on a different test set, a poorly-fitting model can predict worse than the mean (SSE > SST) and give a negative R².
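Both points in this answer can be seen on tiny numbers. The sketch below makes simplifying assumptions: a single centred feature with no intercept (the centred data from the worked example), and the penalty added directly to SSE as in the glossary, which gives a Lasso soft-threshold of α/2. Libraries may scale the objective differently, so their α values are not directly comparable. The test set used for the negative R² is hypothetical.

```python
# Centred data from the worked example: x - 2.5 and y - 2.5.
x_c = [-1.5, -0.5, 0.5, 1.5]
y_c = [-1.5, 0.5, -0.5, 1.5]
s_xx = sum(x * x for x in x_c)               # 5.0
s_xy = sum(x * y for x, y in zip(x_c, y_c))  # 4.0

def soft_threshold(value, threshold):
    """Shrink value toward 0 by threshold, clamping to exactly 0."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0

def ridge_weight(alpha):
    """Minimiser of SSE + alpha * w**2 for one centred feature."""
    return s_xy / (s_xx + alpha)

def lasso_weight(alpha):
    """Minimiser of SSE + alpha * |w| for one centred feature."""
    return soft_threshold(s_xy, alpha / 2) / s_xx

for alpha in (0, 4, 8, 1000):
    print(f"alpha={alpha}: ridge w={ridge_weight(alpha):.4f} "
          f"lasso w={lasso_weight(alpha):.4f}")

def r_squared(ys, predictions):
    """R^2 = 1 - SSE/SST, scored on whatever data is passed in."""
    y_bar = sum(ys) / len(ys)
    sse = sum((y - p) ** 2 for y, p in zip(ys, predictions))
    sst = sum((y - y_bar) ** 2 for y in ys)
    return 1 - sse / sst

# Hypothetical test set where y falls as x rises, scored with the
# training line y_hat = 0.5 + 0.8x from the worked example.
test_x = [1, 2, 3, 4]
test_y = [4, 3, 2, 1]
test_r2 = r_squared(test_y, [0.5 + 0.8 * x for x in test_x])
print(f"test R2 = {test_r2:.2f}")
```

With α = 0 both weights equal the least-squares slope 0.8. As α grows, the Ridge weight keeps shrinking but stays above zero, while the Lasso weight reaches exactly 0 once α/2 is at least Σxy. On the hypothetical test set SSE exceeds SST, so R² comes out negative.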

Can AI help me with linear and logistic regression in COMP5318?

Yes, for understanding. Sia can explain each step — how the least-squares formulas are derived, why the sigmoid outputs a probability, or how Ridge and Lasso differ — and walk you through a practice problem with your own numbers so you learn the method. Use it to check your reasoning and rehearse the derivations, not to obtain answers to submitted assignments or the closed-book exam; the unit requires you to acknowledge any AI tools used in assessable work, so keep AI to learning and revision.


Study strategy

Exam move

Treat this topic as two formulas-plus-a-penalty. First drill the least-squares pipeline until it is automatic: lay out columns for x, y, xy and x², compute the slope and intercept, then the residuals, SSE, SST and R² — practise on small four- or five-point datasets so a full fit-and-score takes only a few minutes. Second, lock the logistic side: the sigmoid formula, the log-odds link, the p ≥ 0.5 rule (boundary at z = 0), and the fact that it is fitted by maximum likelihood, not least squares. Third, memorise the one contrast the exam always frames — Ridge (L2) shrinks, Lasso (L1) selects — and what the strength α trades. The final is 2 hours, closed book, with a non-programmable calculator only; regression appears as a short numeric problem, so budget roughly one minute per mark (a 4-mark R² part is about 4 minutes) and always show the formula before the number. Keep AI tutoring to rehearsing the method, and confirm the exact exam date on the Canvas exam timetable.
