University of Sydney · FACULTY OF COMPUTER SCIENCE

COMP5318 · Machine Learning and Data Mining

- one subject, every graph, every model, every mark

Computer Science14 Chapters8-page Bible

Our own words - no uploaded lecturer files

Updated for this semester

Chapter 6 of 11 · COMP5318

Support Vector Machines, Kernels & PCA

This Week 6 topic of COMP5318 Machine Learning and Data Mining at the University of Sydney covers three linear-algebra workhorses: the support vector machine, which separates two classes with the maximum-margin hyperplane supported by a handful of boundary points; the kernel trick, which lets that linear machine draw curved boundaries by working in a higher-dimensional feature space without ever building it; and PCA, an unsupervised method that finds the orthogonal directions of maximum variance via the SVD to reduce dimensionality and compress data. All three appear as short numeric problems and True/False questions in the closed-book final.

In this chapter

What this chapter covers

01Write a separating hyperplane as w·x + b = 0 and the supporting planes at +1 and −1
02Compute the margin as 2/‖w‖ and see that maximising it means minimising ½‖w‖²
03Solve the support-vector equations to pin down w and b, then identify the support vectors
04Explain how the regularization constant C trades margin width against training-error tolerance
05State why the kernel trick works — K(x,y) = Φ(x)·Φ(y) computes an inner product without building Φ
06Evaluate the polynomial kernel (x·y + 1)^d and know how RBF's γ controls boundary smoothness
07Describe PCA as finding orthogonal principal components ordered by variance, PC1 first
08Link PCA to the SVD X = UΣVᵀ and compute the fraction of variance kept from covariance eigenvalues
09Avoid the PCA True/False traps: it is unsupervised, not always lower-dimensional, always orthogonal, and can compress

Worked example · free

Find the max-margin hyperplane, its margin, and the support vectors

Q [6 marks]. A linearly separable 2-class dataset has positive (+1) points P1(4, 2), P2(4, 4), P3(6, 3) and negative (−1) points N1(2, 2), N2(2, 4), N3(0, 3). The classes are split by a vertical gap. Find the maximum-margin hyperplane w·x + b = 0, its margin, and the support vectors.

+1Locate the gap. The closest positive points sit at x₁ = 4 (P1, P2) and the closest negatives at x₁ = 2 (N1, N2). The boundary is vertical and midway, so x₁ = 3, which means w = (w₁, 0) — only x₁ matters.
+1Write the support-vector equations. Positive support vectors lie on w·x + b = +1: 4w₁ + b = 1. Negative support vectors lie on w·x + b = −1: 2w₁ + b = −1.
+1Solve. Subtract the equations: (4w₁ + b) − (2w₁ + b) = 1 − (−1) → 2w₁ = 2 → w₁ = 1. Back-substitute: 4(1) + b = 1 → b = −3. Boundary: 1·x₁ − 3 = 0, i.e. x₁ = 3.
+1Margin. ‖w‖ = √(1² + 0²) = 1, so the margin d = 2/‖w‖ = 2/1 = 2, which matches the geometric gap between x₁ = 2 and x₁ = 4.
+1Support vectors. The points lying exactly on the ±1 lines: P1(4, 2), P2(4, 4) on the +1 line and N1(2, 2), N2(2, 4) on the −1 line — four support vectors.
+1Check the outliers. P3(6, 3) gives w·x + b = 1(6) − 3 = 3 ≥ 1 and N3(0, 3) gives 1(0) − 3 = −3 ≤ −1, so both are correctly classified and past their margins — they are not support vectors, and removing them leaves the boundary unchanged.

The maximum-margin hyperplane is x₁ = 3 with w = (1, 0) and b = −3; the margin is 2 (= 2/‖w‖); the support vectors are P1(4, 2), P2(4, 4), N1(2, 2), N2(2, 4). P3 and N3 are non-support-vectors. Cross-check: each support vector satisfies yᵢ(w·xᵢ + b) = 1 exactly, e.g. P1: +1·(4 − 3) = 1 and N1: −1·(2 − 3) = 1.

Sia tip — Never guess w. Its scale is fixed only by imposing w·x + b = ±1 at the support vectors, so always solve those two equations for w₁ and b first, then read the margin off 2/‖w‖. Writing w = (1, 0) by eye can give the right direction but the wrong margin. Show the formula, substitute, then evaluate — the method marks are awarded even if one arithmetic step slips.

Glossary

Key terms

Maximum-margin hyperplane: The separating boundary w·x + b = 0 chosen by an SVM to be as far as possible from the nearest point of either class; the wide margin is what gives good generalisation.
Margin: The perpendicular gap between the two supporting hyperplanes w·x + b = +1 and w·x + b = −1, equal to 2/‖w‖. Maximising it is equivalent to minimising ½‖w‖² subject to yᵢ(w·xᵢ + b) ≥ 1.
Support vector: A training point lying exactly on a supporting hyperplane (the ±1 lines). Only support vectors determine the boundary; removing a non-support-vector leaves it unchanged.
Soft margin (C): A relaxation that lets some points sit inside the margin, penalised by the constant C. Large C means few violations and a narrow margin (risk of overfitting); small C means a wider margin with more slack.
Kernel trick: Replacing inner products with a kernel K(x,y) = Φ(x)·Φ(y) so an SVM fits non-linear boundaries in a high-dimensional feature space Φ without ever computing Φ — a curved boundary at the cost of a linear one.
Polynomial / RBF kernel: Two common valid kernels: the polynomial K(x,y) = (x·y + 1)^d gives degree-d boundaries, and the RBF K(x,y) = exp(−γ‖x−y‖²) gives smooth ones — with large γ producing a tight, wiggly (overfitting) boundary and small γ a smooth one.
Principal Component Analysis (PCA): An unsupervised transform that finds orthogonal directions (principal components) of maximum variance; PC1 has the most variance, PC2 the next, all mutually orthogonal. It is used for dimensionality reduction and compression, not classification.
Singular Value Decomposition (SVD): The factorisation X = UΣVᵀ of the centred data matrix. The columns of V are the principal directions and the singular values in Σ measure spread; keeping the largest gives the best low-rank approximation (compression).

FAQ

Support Vector Machines, Kernels & PCA FAQ

How do I tell which points are support vectors, and why do only they matter?

Solve for the boundary first, then test each point against the supporting hyperplanes w·x + b = ±1. A point is a support vector only if it lies exactly on its class's line, i.e. yᵢ(w·xᵢ + b) = 1. Every other correctly classified point sits strictly beyond its margin (yᵢ(w·xᵢ + b) > 1). Only the support vectors touch the margin, so they alone pin the boundary in place — deleting a non-support-vector changes nothing, while deleting a support vector can move the boundary. That is why an SVM is defined by a handful of points, not the whole dataset.

What is the difference between the kernel trick and PCA — they both use linear algebra?

They solve opposite problems. The kernel trick is supervised-classification machinery: it lets an SVM work in a higher-dimensional feature space so classes become separable, computing only inner products K(x,y) = Φ(x)·Φ(y) without building Φ. PCA is unsupervised: it uses no labels and instead finds the orthogonal directions of maximum variance (via the covariance eigenvalues or the SVD X = UΣVᵀ) to reduce dimensions and compress. A frequent exam trap is calling PCA a classifier — it predicts nothing; it only re-expresses the data on variance-ranked axes.

Can AI help me with SVMs, kernels and PCA in COMP5318?

Yes, for learning. Sia can explain each step — why the margin is 2/‖w‖, how solving the support-vector equations fixes w and b, why a kernel equals an inner product in feature space, or how covariance eigenvalues rank the principal components — and walk you through a practice problem with your own numbers so you master the method. Use it to check your reasoning and rehearse derivations, not to obtain answers to submitted assignments or the closed-book exam. The unit requires you to acknowledge any AI tools used in assessable work, so keep AI to revision and understanding.

Studying with AI? Sia — free AI machine learning tutor works through COMP5318 step by step.

Study strategy

Exam move

Treat this topic as one method per sub-part and drill each until it is mechanical. For SVMs, practise the full pipeline on a tiny 2-D set: place the boundary midway in the gap, solve the two support-vector equations w·x + b = ±1 for w and b, read the margin off 2/‖w‖, then list the points that touch the ±1 lines as the support vectors — and remember large C narrows the margin while small C widens it. For kernels, memorise K(x,y) = Φ(x)·Φ(y), be able to evaluate the polynomial kernel (x·y + 1)^d on two vectors, and know that a large RBF γ overfits. For PCA, rehearse centre → covariance matrix → eigenvalues → PC1, and compute the variance kept as the top eigenvalues over the total; lock in the four True/False facts (unsupervised, not always lower-dimensional, always orthogonal, can compress). The final is 2 hours, closed book, with a non-programmable calculator only, so these appear as short numeric problems: budget about one minute per mark (a 6-mark SVM or PCA part is roughly 6 minutes) and always show the formula before the number. Keep AI tutoring to rehearsing the method, and confirm the exact exam date on the Canvas exam timetable.

Keep going — explore the course

A+Everything unlocked

Unlocks this Bible + all 25 of your University of Sydney subjects - and 1,000+ Bibles across every Australian university.

Sia - your COMP5318 tutor, unlimited, worked the way the exam marks it

The full 8-page Bible + practice bank with worked solutions

Chrome extension - sync your LMS so Sia knows your deadlines

Bilingual EN / Chinese on every Bible and every Sia answer

$25/ month

30-day money-back · cancel in one tap · how it works