BUSS6002 · Data Science In Business
Clustering & Customer Segmentation
Clustering is the unit's main unsupervised method: with no labels to predict, it partitions n data points into k groups that are similar within and different between. Its headline business use is customer segmentation — splitting a market into groups you can serve differently — and its headline algorithm is K-means, which alternates an assignment step and a centroid update step to drive down the within-cluster sum of squares (WCSS). The two facts the exam tests hardest are that the objective squares the Euclidean distance, and that K-means converges only to a local optimum because k is fixed and the result depends on the starting centres. This is the Week 5 topic, examined by MCQ (compute a centroid or squared distance), short answer (write the objective, name the steps, argue convergence) and framework recall (segmentation bases and CLV).
What this chapter covers
- 011. Supervised vs unsupervised — clustering learns structure with no labels
- 022. Customer segmentation — four bases: demographic, geographic, psychographic, behavioural
- 033. Surrounding metrics — market size & value, market share, customer lifetime value (CLV)
- 044. Squared Euclidean distance — the only distance K-means uses, and why it skips the root
- 055. Centroid — the per-coordinate mean of a cluster's members (not a fixed data point)
- 066. Clustering objective (WCSS) — sum of squared point-to-centroid distances, minimised
- 077. The K-means algorithm — init → assign → update → repeat until centres stop moving
- 088. Convergence, local optima & choosing k — monotone non-increasing, elbow method
One K-means iteration — assign, update & new WCSS (k = 2)
- +1Assignment — squared distance of each point to each centre, then assign to the nearest. A: 0 vs 1 → C₁. B: 1 vs 0 → C₂. C: 41 vs 32 → C₂. D: 52 vs 41 → C₂. So C₁={A}, C₂={B,C,D}.
- +1Update μ₁ = mean of {A} = [1,1] (one member, unchanged).
- +1Update μ₂ = mean of {B,C,D} = [(2+6+7)/3, (1+5+5)/3] = [5, 3.67].
- +1The new WCSS (0 from C₁ plus ≈24.7 from C₂) is lower than the start — both steps can only reduce or hold the objective, so it is non-increasing and the algorithm converges. (Next iteration would flip B into C₁.)
Key terms
- Unsupervised learning
- Learning structure from data that has no labels (no target y); clustering is the main example.
- Cluster
- A group of points that are close to each other and far from points in other groups.
- Centroid
- The centre of a cluster, computed as the per-coordinate mean of its members; it need not be an actual data point.
- Squared Euclidean distance
- ‖x − μ‖² = the sum of squared coordinate differences = (x − μ)ᵀ(x − μ); the closeness measure K-means uses.
- Within-cluster sum of squares (WCSS)
- The clustering objective f = Σᵢ Σ_{x∈Cᵢ} ‖x − μᵢ‖²; lower means tighter clusters.
- K-means
- An iterative algorithm that minimises WCSS by alternating an assignment step (nearest centroid) and an update step (recompute centroids as means).
- Customer segmentation
- Splitting a market into groups that are similar within and different between, so marketing can be tailored beyond the 'average customer'.
- Customer lifetime value (CLV)
- The present value of a customer's expected future net cash flows (revenue minus acquisition and retention costs).
Clustering & Customer Segmentation FAQ
Is clustering supervised or unsupervised?
Unsupervised — there is no label y to predict. K-means learns the grouping purely from the feature values, judged by how compact the clusters are, not by matching a known answer.
Why does K-means use the SQUARED distance instead of the plain distance?
Squaring removes the square root, which is faster, and it makes the update step clean: the point that minimises the sum of squared distances to a set of points is exactly their mean — which is why the centroid is the mean. For deciding the nearest centre the two give the same ranking anyway.
Does K-means always find the best possible clustering?
No. The objective can only fall or stay equal each iteration, so K-means always converges — but only to a LOCAL optimum that depends on the starting centres. The standard fix is to run it several times from different random seeds and keep the lowest-WCSS result.
Does K-means choose the number of clusters k for me?
No — k is an INPUT you fix in advance. K-means never discovers the right k. To choose k, plot WCSS against k and look for the 'elbow' (the point of diminishing returns), or use business judgement about how many segments are actionable.
Is the centroid always one of the data points?
No. The centroid is the average of its members, so it usually sits between them and is not an actual observation. It is also recomputed every iteration as the membership changes.
Does the objective strictly decrease every iteration?
It is non-increasing — it falls or stays the same. At convergence it holds steady. Claiming it 'always strictly decreases' is the subtly-wrong statement exams like to test.
Exam move
Lock in the two formulas first — centroid = per-coordinate mean, and WCSS = Σ Σ ‖x − μ‖² — because the easy MCQ marks (compute a centroid, compute a squared distance) are pure arithmetic once you know them, and the squared norm is where careless students drop marks. Then drill one full K-means iteration by hand: for each point compute the squared distance to every centre, assign to the smallest, recompute each centre as the mean of its new members, and show the WCSS fell. For the short-answer question be able to write the objective, name the assignment and update steps, and argue convergence precisely — the objective is non-increasing and converges to a LOCAL (not global) optimum because k is fixed and initialisation matters. Finally, keep the marketing framework warm: the four segmentation bases (demographic, geographic, psychographic, behavioural) and the surrounding metrics (market size, value, share, CLV), since these are quick recall marks students forget while drowning in the linear algebra.