COMP5318 · Machine Learning and Data Mining
Linear & Logistic Regression, Regularization
This Week 3 topic of COMP5318 Machine Learning and Data Mining at the University of Sydney builds your first explicit parametric models: linear regression fitted by least squares and scored with R², and logistic regression, which sends the same linear score through a sigmoid to predict a class probability. It closes with regularization — the Ridge (L2) and Lasso (L1) penalties that shrink weights to fight overfitting — all of which show up as short numeric problems in the closed-book final.
What this chapter covers
- 01 Fit a straight line ŷ = b₀ + b₁x by least squares using the closed-form slope and intercept formulas
- 02 Understand why least squares minimises SSE (squared residuals), not the raw sum of residuals
- 03 Split the spread of y with the identity SST = SSR + SSE and read each sum
- 04 Compute R² = SSR/SST = 1 − SSE/SST and interpret it as the fraction of variation explained
- 05 Recover the correlation r = ±√R², taking the sign from the slope direction
- 06 See how logistic regression turns the linear score into a probability with the sigmoid and the log-odds link
- 07 Apply the p ≥ 0.5 decision rule and find the decision boundary at z = 0
- 08 Know logistic regression is fitted by maximum likelihood, not least squares
- 09 Contrast Ridge (L2) shrinkage with Lasso (L1) feature selection and the role of the penalty strength α
Fit a least-squares line, predict, and compute R²
- +1 Sums (n = 4, for the points (x, y) = (1, 1), (2, 3), (3, 2), (4, 4) implied by the terms below): Σx = 10, Σy = 10, so x̄ = 2.5, ȳ = 2.5. Σxy = 1 + 6 + 6 + 16 = 29. Σx² = 1 + 4 + 9 + 16 = 30.
- +1 Slope: b₁ = [Σxy − (Σx)(Σy)/n] / [Σx² − (Σx)²/n] = [29 − (10)(10)/4] / [30 − 100/4] = (29 − 25)/(30 − 25) = 4/5 = 0.8.
- +1 Intercept: b₀ = ȳ − b₁x̄ = 2.5 − 0.8·2.5 = 2.5 − 2.0 = 0.5. Fitted line: ŷ = 0.5 + 0.8x.
- +1 Predict x = 5: ŷ = 0.5 + 0.8·5 = 0.5 + 4.0 = 4.5.
- +1 SSE: fitted ŷ = 1.3, 2.1, 2.9, 3.7; residuals y − ŷ = −0.3, 0.9, −0.9, 0.3; squares 0.09, 0.81, 0.81, 0.09 → SSE = 1.80.
- +1 SST and R²: deviations of y from ȳ = 2.5 are −1.5, 0.5, −0.5, 1.5; squares 2.25, 0.25, 0.25, 2.25 → SST = 5.00. R² = 1 − 1.80/5.00 = 1 − 0.36 = 0.64.
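The worked solution can be checked with a short script. This is a minimal sketch in plain Python, assuming the four data points are (1, 1), (2, 3), (3, 2), (4, 4), which is what the listed sums, fitted values and deviations imply. The function names `fit_line` and `score_line` are illustrative and not taken from the unit materials.

```python
import math

def fit_line(xs, ys):
    """Return (b0, b1) for the least-squares line y_hat = b0 + b1 * x."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    # Closed-form slope first, then the intercept from the two means.
    b1 = (sum_xy - sum_x * sum_y / n) / (sum_x2 - sum_x ** 2 / n)
    b0 = sum_y / n - b1 * sum_x / n
    return b0, b1

def score_line(xs, ys, b0, b1):
    """Return (SSE, SSR, SST, R^2) for the fitted line on the given data."""
    y_bar = sum(ys) / len(ys)
    fitted = [b0 + b1 * x for x in xs]
    sse = sum((y - f) ** 2 for y, f in zip(ys, fitted))  # unexplained
    ssr = sum((f - y_bar) ** 2 for f in fitted)          # explained
    sst = sum((y - y_bar) ** 2 for y in ys)              # total spread
    return sse, ssr, sst, 1 - sse / sst

# Assumed data, reconstructed from the sums in the worked solution.
xs = [1, 2, 3, 4]
ys = [1, 3, 2, 4]

b0, b1 = fit_line(xs, ys)
sse, ssr, sst, r2 = score_line(xs, ys, b0, b1)
prediction = b0 + b1 * 5
# The correlation takes its sign from the slope.
r = math.copysign(math.sqrt(r2), b1)

print(f"b0={b0:.2f} b1={b1:.2f} y_hat(5)={prediction:.2f}")
print(f"SSE={sse:.2f} SSR={ssr:.2f} SST={sst:.2f} R2={r2:.2f} r={r:.2f}")
```

On this data the script also confirms the identity SST = SSR + SSE (5.00 = 3.20 + 1.80) and gives r = +0.80, positive because the slope is positive.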
Key terms
- Least squares
- The criterion that chooses the line's slope and intercept to minimise the sum of squared residuals (SSE); it gives the closed-form b₁ and b₀ formulas.
- Residual
- The leftover error for one point, eᵢ = yᵢ − ŷᵢ — the vertical gap between the observed value and the fitted line.
- SSE / SST / SSR
- SSE = Σ(y−ŷ)² (unexplained), SST = Σ(y−ȳ)² (total spread), SSR = Σ(ŷ−ȳ)² (explained). They satisfy SST = SSR + SSE.
- R² (coefficient of determination)
- R² = SSR/SST = 1 − SSE/SST, the fraction of the spread in y the model explains; for a least-squares fit with an intercept it runs from 0 to 1 on the fitting data, but it can go negative on a held-out set.
- Sigmoid (logistic function)
- σ(z) = 1/(1+e⁻ᶻ), which maps any real score z into a probability in (0, 1); it equals 0.5 exactly at z = 0.
- Log-odds (logit)
- ln(p/(1−p)) = b₀ + b₁x — the log of the odds, which logistic regression models as a linear function of x even though p itself is not linear.
- Ridge (L2)
- Regularization that adds α·Σwⱼ² to the error; it shrinks all weights smoothly toward 0 but never sets any exactly to 0.
- Lasso (L1)
- Regularization that adds α·Σ|wⱼ| to the error; it can drive some weights exactly to 0, performing automatic feature selection. α (or λ) sets the penalty strength.
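The sigmoid, log-odds and decision-rule definitions above can be tried out numerically. The sketch below assumes a one-feature model with hypothetical coefficients b₀ = −3 and b₁ = 1.5, chosen only for illustration; the helper names are likewise made up for this example.

```python
import math

def sigmoid(z):
    """Map a real-valued score z to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def log_odds(p):
    """Inverse of the sigmoid: the logit ln(p / (1 - p))."""
    return math.log(p / (1.0 - p))

def predict(b0, b1, x, threshold=0.5):
    """Return (probability, class label) using the p >= threshold rule."""
    p = sigmoid(b0 + b1 * x)
    return p, int(p >= threshold)

# Hypothetical coefficients, not taken from the unit materials.
b0, b1 = -3.0, 1.5

# The boundary is where z = b0 + b1 * x = 0, i.e. x = -b0 / b1.
boundary_x = -b0 / b1

for x in (1.0, 2.0, 3.0):
    p, label = predict(b0, b1, x)
    print(f"x={x:.1f} z={b0 + b1 * x:+.1f} p={p:.3f} class={label}")
```

With these coefficients the boundary sits at x = 2, where z = 0 and p = 0.5; points with larger x are classified as 1. Applying `log_odds` to any predicted probability recovers the linear score z, which is the log-odds link in action.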
Linear & Logistic Regression, Regularization FAQ
Is linear regression fitted the same way as logistic regression?
No — and the exam tests the difference. Linear regression fits a numeric-target line by least squares (minimising SSE), which has a closed form. Logistic regression is a two-class classifier: it runs the linear score through a sigmoid and chooses the weights that make the observed 0/1 labels most probable — maximum likelihood, solved iteratively. Writing 'logistic regression minimises SSE' is a common false statement.
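To make "maximum likelihood, solved iteratively" concrete, here is a minimal sketch that fits a one-feature logistic regression by plain gradient ascent on the log-likelihood. The six data points, the learning rate and the step count are all assumptions chosen for illustration, and real libraries typically use faster solvers than this loop.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def log_likelihood(b0, b1, xs, ys):
    """Sum of log P(observed label) over all points."""
    total = 0.0
    for x, y in zip(xs, ys):
        p = sigmoid(b0 + b1 * x)
        total += y * math.log(p) + (1 - y) * math.log(1 - p)
    return total

def fit_logistic(xs, ys, lr=0.1, steps=5000):
    """Gradient ascent on the mean log-likelihood, starting from zero weights."""
    b0, b1 = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        # Gradient terms are (label - predicted probability) per point.
        errors = [y - sigmoid(b0 + b1 * x) for x, y in zip(xs, ys)]
        b0 += lr * sum(errors) / n
        b1 += lr * sum(e * x for e, x in zip(errors, xs)) / n
    return b0, b1

# Hypothetical, deliberately non-separable data so the weights stay finite.
xs = [1, 2, 3, 4, 5, 6]
ys = [0, 0, 1, 0, 1, 1]

start_ll = log_likelihood(0.0, 0.0, xs, ys)
b0, b1 = fit_logistic(xs, ys)
final_ll = log_likelihood(b0, b1, xs, ys)

print(f"log-likelihood: start={start_ll:.3f} final={final_ll:.3f}")
print(f"b0={b0:.3f} b1={b1:.3f}")
```

Note that no squared residual appears anywhere: the quantity being improved at each step is the log-likelihood of the observed 0/1 labels, and there is no closed-form solution to jump to.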
What is the difference between Ridge and Lasso, and when does R² go negative?
Ridge (L2) adds α·Σwⱼ² and shrinks weights smoothly toward 0 without ever zeroing them; Lasso (L1) adds α·Σ|wⱼ| and can set some weights exactly to 0, so it selects features. A larger α means a heavier penalty and a simpler model. R² sits in [0, 1] only on the data a least-squares model was fitted to; scored on a different test set, a poorly fitting model can predict worse than simply predicting the mean and give a negative R².
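The shrink-versus-select contrast can be seen in a simplified special case. The sketch below assumes the feature columns are orthonormal (XᵀX = I), where minimising SSE + α·Σwⱼ² gives w/(1 + α) and minimising SSE + α·Σ|wⱼ| gives soft-thresholding at α/2 for each weight. The starting weights and α are hypothetical, and in general both penalties need a numerical solver rather than these one-line formulas.

```python
import math

def ridge_shrink(w_ols, alpha):
    """Ridge under orthonormal features: scale every weight by 1 / (1 + alpha)."""
    return [w / (1.0 + alpha) for w in w_ols]

def lasso_shrink(w_ols, alpha):
    """Lasso under orthonormal features: soft-threshold each weight at alpha / 2."""
    shrunk = []
    for w in w_ols:
        magnitude = max(abs(w) - alpha / 2.0, 0.0)
        # Weights smaller than the threshold become exactly 0.
        shrunk.append(math.copysign(magnitude, w) if magnitude > 0 else 0.0)
    return shrunk

# Hypothetical unpenalised least-squares weights and penalty strength.
w_ols = [3.0, -0.4, 0.1]
alpha = 1.0

ridge_w = ridge_shrink(w_ols, alpha)
lasso_w = lasso_shrink(w_ols, alpha)

print("ridge:", ridge_w)
print("lasso:", lasso_w)
```

Here Ridge halves every weight but keeps all three non-zero, while Lasso keeps only the large first weight (reduced to 2.5) and sets the two small ones exactly to 0.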
Can AI help me with linear and logistic regression in COMP5318?
Yes, for understanding. Sia can explain each step — how the least-squares formulas are derived, why the sigmoid outputs a probability, or how Ridge and Lasso differ — and walk you through a practice problem with your own numbers so you learn the method. Use it to check your reasoning and rehearse the derivations, not to obtain answers to submitted assignments or the closed-book exam; the unit requires you to acknowledge any AI tools used in assessable work, so keep AI to learning and revision.
Exam move
Treat this topic as two formulas-plus-a-penalty. First drill the least-squares pipeline until it is automatic: lay out columns for x, y, xy and x², compute the slope and intercept, then the residuals, SSE, SST and R² — practise on small four- or five-point datasets so a full fit-and-score takes only a few minutes. Second, lock the logistic side: the sigmoid formula, the log-odds link, the p ≥ 0.5 rule (boundary at z = 0), and the fact that it is fitted by maximum likelihood, not least squares. Third, memorise the one contrast the exam always frames — Ridge (L2) shrinks, Lasso (L1) selects — and what the strength α trades. The final is 2 hours, closed book, with a non-programmable calculator only; regression appears as a short numeric problem, so budget roughly one minute per mark (a 4-mark R² part is about 4 minutes) and always show the formula before the number. Keep AI tutoring to rehearsing the method, and confirm the exact exam date on the Canvas exam timetable.