COMP20008 · Elements Of Data Processing
Hierarchical Clustering & Dendrograms
Unlike k-means, agglomerative hierarchical clustering needs no pre-set k: it repeatedly merges the two closest clusters, building a nested tree — a dendrogram whose y-axis is the merge distance — that you can "cut" at any level to get any number of clusters. The key choice is the linkage (single, complete, average), which decides how the distance between two clusters is measured and changes the merge heights. It is examined as by-hand construction: the 2025 exam's Q5 builds both single- and complete-linkage dendrograms for a small set of 1-D points and reads off the 2-cluster cut.
What this chapter covers
- 011. Agglomerative clustering: start with each point as its own cluster, merge the two closest, repeat
- 022. The dendrogram: a nested tree whose y-axis is the distance at which clusters merge
- 033. Cutting the tree: a horizontal cut at a given height yields a chosen number of clusters
- 044. Single linkage: distance between clusters = minimum pairwise distance (chaining)
- 055. Complete linkage: distance between clusters = maximum pairwise distance (compact clusters)
- 066. Average linkage: distance between clusters = mean pairwise distance
- 077. NMI for comparing two clusterings: high NMI = the two labellings agree
Single vs complete linkage dendrograms for 1-D points (mirrors 2025 Q5)
- 1 markCompute all pairwise distances (absolute differences): |A−B| = 2, |B−C| = 1, |C−D| = 4, |A−C| = 3, |B−D| = 5, |A−D| = 7.
- 1 markSingle linkage (merge by minimum inter-cluster distance): merge B,C at 1; then {B,C} joins A at min(2, 3) = 2; then {A,B,C} joins D at the minimum distance to D, which is 4. Merge heights: 1, 2, 4.
- 1 markComplete linkage (merge by maximum inter-cluster distance): merge B,C at 1; for {B,C} vs A the linkage is max(2, 3) = 3 and vs D is max(5, 4) = 5, so A (3) merges first → {A,B,C}; then {A,B,C} joins D at max distance 7. Merge heights: 1, 3, 7.
- 1 markRecord the dendrogram heights: single linkage merges at 1, 2, 4; complete linkage merges at 1, 3, 7.
- 1 mark2-cluster cut (cut just below the last/highest merge): both methods give the two clusters {A, B, C} and {D}.
Key terms
- Agglomerative (hierarchical) clustering
- A bottom-up method: start with every point as its own cluster, then repeatedly merge the two closest clusters until one remains. It produces a full hierarchy (dendrogram) without needing k chosen in advance.
- Dendrogram
- The tree that records the sequence of merges, with the y-axis showing the distance at which each merge happened. Cutting it with a horizontal line at a chosen height splits the data into that many clusters.
- Single linkage
- Defines the distance between two clusters as the minimum distance between any pair of their points. It tends to chain points into long, straggly clusters because only the nearest pair matters.
- Complete linkage
- Defines the distance between two clusters as the maximum distance between any pair of their points. It favours compact, roughly equal-diameter clusters because the farthest pair must be small to merge.
- Average linkage
- Defines the distance between two clusters as the mean of all pairwise distances between their points — a compromise between single (min) and complete (max) linkage.
- Normalised mutual information (NMI) for clusterings
- Treats each clustering as a categorical labelling of the points and computes NMI between the two labellings; high NMI (near 1) means the two clusterings agree, low NMI means they disagree. Used to compare clustering results.
Hierarchical Clustering & Dendrograms FAQ
How is hierarchical clustering different from k-means?
k-means needs you to fix k up front, randomly initialises centroids, and partitions the data into exactly k flat clusters by minimising SSE. Agglomerative hierarchical clustering needs no k in advance: it merges the closest clusters step by step into a full tree (dendrogram), and you choose the number of clusters afterwards by cutting the tree at a height. Hierarchical also gives a nested structure showing how groups relate, which k-means does not.
What is the difference between single and complete linkage?
Single linkage measures the distance between two clusters as the minimum distance between any pair of their points, so it merges whenever any two points are close — this "chains" points and can create long, straggly clusters. Complete linkage uses the maximum pairwise distance, so two clusters merge only when even their farthest points are close, producing compact, tighter clusters. They can give the same final groups but different merge heights, as in the worked example (1,2,4 vs 1,3,7).
How do I read the number of clusters off a dendrogram?
Draw a horizontal line at a chosen height and count how many vertical branches it crosses — that is the number of clusters at that level. To get exactly k clusters, cut just below the merge that would otherwise reduce the count to k−1. The y-axis height tells you the distance at which clusters joined, so a cut low on the tree gives many tight clusters and a high cut gives few broad ones.
How is this examined in COMP20008?
As a by-hand construction, mirroring 2025 Q5: you are given a small set of (often 1-D) points and asked to build single- and complete-linkage dendrograms, list the merge heights, and state the cut for a given number of clusters. The marks reward computing the pairwise distances correctly, applying the min (single) or max (complete) rule at each merge, and reading the cut — plus knowing what each linkage does to cluster shape.
Exam move
Practise the merge routine until it is mechanical: compute all pairwise distances, find the smallest, merge that pair, then recompute cluster-to-cluster distances using the min rule (single) or max rule (complete), and repeat — recording each merge height as you go. Keep the two linkage behaviours in mind as a one-liner (single = nearest pair, chains; complete = farthest pair, compact) because the "explain the difference" mark is common. Be precise about the dendrogram axes (y = merge distance) and how to cut for k clusters. Note that single and complete can produce the same final clusters but different heights, which the exam likes to test. Finally, link this to NMI: you can compare two clusterings by treating them as labellings and computing NMI.