Any decompositional clustering algorithm can be made hierarchical by recursive application.
K-means-cluster (in S : set of vectors; k : integer) {
    let C[1] ... C[k] be a random partition of S into k parts;
    repeat {
        for i := 1 to k {
            X[i] := centroid of C[i];
            C[i] := empty
        }
        for j := 1 to N {
            q := the index i for which X[i] is closest to S[j];
            add S[j] to C[q]
        }
    } until the change to C (or the change to X) is small enough.
}
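The pseudocode above can be sketched in Python. This is an illustrative version only (the point type, the use of squared Euclidean distance, and the fixed iteration count are assumptions); it starts from k random points rather than a random partition, as in the example below.

```python
import random

def dist2(a, b):
    # Squared Euclidean distance between two vectors.
    return sum((x - y) ** 2 for x, y in zip(a, b))

def centroid(cluster):
    # Component-wise mean of a nonempty list of vectors.
    n = len(cluster)
    return tuple(sum(v[d] for v in cluster) / n for d in range(len(cluster[0])))

def kmeans(points, k, iters=20):
    # Initialize with k random points as centroids (random-points start,
    # rather than the random-partition start in the pseudocode).
    centroids = random.sample(points, k)
    clusters = [[] for _ in range(k)]
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        # Assignment step: each point goes to its nearest centroid.
        for p in points:
            q = min(range(k), key=lambda i: dist2(p, centroids[i]))
            clusters[q].append(p)
        # Update step: recompute each centroid as the mean of its cluster
        # (an empty cluster keeps its old centroid).
        for i in range(k):
            if clusters[i]:
                centroids[i] = centroid(clusters[i])
    return clusters, centroids
```

For simplicity this runs a fixed number of iterations instead of testing how much C or X changed; in practice the loop would stop once assignments stabilize.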
Example: (Note: I have started here with random points rather than a random partition.)
Features:
The sum of squared distances from the points to their cluster centroids steadily decreases over the iterations.
Hill-climbing algorithm: each iteration improves the objective, so it can converge to a local rather than a global optimum.
Problems:
Variant 1: Update the centroids incrementally, after each point is reassigned, rather than once per pass. This is claimed to give better results.
Variant 2: Bisecting K-means algorithm:
for I := 1 to K-1 do {
    pick a leaf cluster C to split;
    for J := 1 to ITER do
        split C into two subclusters C1 and C2;
    choose the best of the above splits and make it permanent
}

Generates a binary clustering hierarchy. Works better (so claimed) than K-means.
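A minimal Python sketch of bisecting K-means, under some assumptions the notes leave open: the leaf to split is chosen as the one with the largest sum of squared error (SSE), the inner split is plain 2-means, and "best split" means lowest combined SSE of the two halves.

```python
import random

def dist2(a, b):
    # Squared Euclidean distance.
    return sum((x - y) ** 2 for x, y in zip(a, b))

def centroid(cluster):
    n = len(cluster)
    return tuple(sum(v[d] for v in cluster) / n for d in range(len(cluster[0])))

def sse(cluster):
    # Sum of squared distances of the cluster's points to its centroid.
    c = centroid(cluster)
    return sum(dist2(p, c) for p in cluster)

def two_means(cluster, iters=10):
    # One trial split of a cluster into two subclusters by 2-means.
    cents = random.sample(cluster, 2)
    parts = ([], [])
    for _ in range(iters):
        parts = ([], [])
        for p in cluster:
            parts[0 if dist2(p, cents[0]) <= dist2(p, cents[1]) else 1].append(p)
        cents = [centroid(part) if part else cents[i]
                 for i, part in enumerate(parts)]
    return parts

def bisecting_kmeans(points, k, trials=5):
    clusters = [list(points)]
    while len(clusters) < k:
        # Pick the leaf with the largest SSE to split (one common choice).
        worst = max(range(len(clusters)), key=lambda i: sse(clusters[i]))
        c = clusters.pop(worst)
        # ITER trial splits; keep the best (lowest total SSE).
        best = min((two_means(c) for _ in range(trials)),
                   key=lambda ab: sse(ab[0]) + sse(ab[1])
                   if ab[0] and ab[1] else float("inf"))
        clusters.extend([best[0], best[1]])
    return clusters
```

This flat version returns the K leaf clusters; recording each (C, C1, C2) split instead would give the binary hierarchy.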
Variant 3: Allow overlapping clusters: If distances from P to C1 and to C2 are close enough then put P in both clusters.
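One way to make "close enough" concrete (an assumption; the notes do not fix a rule) is to assign a point to every cluster whose centroid distance is within some tolerance eps of the minimum distance:

```python
def dist2(a, b):
    # Squared Euclidean distance.
    return sum((x - y) ** 2 for x, y in zip(a, b))

def assign_overlapping(p, centroids, eps):
    # Return the indices of all clusters whose centroid is within eps
    # (in squared distance) of p's nearest centroid. With eps = 0 this
    # reduces to ordinary single-cluster assignment.
    ds = [dist2(p, c) for c in centroids]
    dmin = min(ds)
    return [i for i, d in enumerate(ds) if d <= dmin + eps]
```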
The full mean vector would get very long (its dimension is the number of different words across all the documents). Solution: truncate the centroid after its first M terms (typically M = 50 or 100).
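A sketch of the truncation idea, representing documents as sparse term-frequency dicts. Here "first M terms" is taken to mean the M highest-weight terms of the mean vector (an assumption about the ordering):

```python
from collections import Counter

def truncated_centroid(docs, m=50):
    # docs: list of term-frequency dicts. The full mean vector has one
    # coordinate per distinct word; keep only the m largest-weight terms.
    total = Counter()
    for d in docs:
        total.update(d)
    n = len(docs)
    mean = {term: tf / n for term, tf in total.items()}
    top = sorted(mean.items(), key=lambda kv: -kv[1])[:m]
    return dict(top)
```

Distances to such a centroid are then computed over at most M coordinates instead of the whole vocabulary.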
{ put every point in a cluster by itself
  for I := 1 to N-1 do {
      let C1, C2 be the most mergeable pair of clusters;
      create C parent of C1, C2
  }
}

Various measures of "mergeable" used.
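A naive O(N^3) Python sketch of the agglomerative loop above. As the measure of "mergeable" it uses closest centroids (one choice among several; single-link and complete-link are other common measures), and it represents the tree as nested (members, left, right) tuples:

```python
def dist2(a, b):
    # Squared Euclidean distance.
    return sum((x - y) ** 2 for x, y in zip(a, b))

def centroid(cluster):
    n = len(cluster)
    return tuple(sum(v[d] for v in cluster) / n for d in range(len(cluster[0])))

def agglomerate(points):
    # Each node is (members, left_child, right_child); leaves have no children.
    nodes = [([p], None, None) for p in points]
    while len(nodes) > 1:
        # Find the most mergeable pair: here, the pair with closest centroids.
        best = None
        for i in range(len(nodes)):
            for j in range(i + 1, len(nodes)):
                d = dist2(centroid(nodes[i][0]), centroid(nodes[j][0]))
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        # Create the parent node C of C1 and C2.
        merged = (nodes[i][0] + nodes[j][0], nodes[i], nodes[j])
        nodes = [n for k, n in enumerate(nodes) if k not in (i, j)] + [merged]
    return nodes[0]  # root of the binary merge tree
```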
Characteristics
pick a starting point D in S;
CC := { { D } }    /* set of clusters: initially 1 cluster containing D */
for Di in S do {
    C := the cluster in CC "closest" to Di;
    if similarity(C, Di) > threshold
        then add Di to C
        else add { Di } to CC
}

Features:
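The single-pass algorithm above can be sketched in Python. The notes leave the similarity measure open; as an assumption, this version takes similarity to be negative squared distance to the cluster centroid, so "similarity above threshold" becomes "distance at most threshold":

```python
def dist2(a, b):
    # Squared Euclidean distance.
    return sum((x - y) ** 2 for x, y in zip(a, b))

def centroid(cluster):
    n = len(cluster)
    return tuple(sum(v[d] for v in cluster) / n for d in range(len(cluster[0])))

def single_pass(points, threshold):
    # Start with one cluster holding the first point, then make a single
    # pass: each point joins its closest cluster if similar enough,
    # otherwise it starts a new cluster.
    clusters = [[points[0]]]
    for p in points[1:]:
        best = min(clusters, key=lambda c: dist2(p, centroid(c)))
        if dist2(p, centroid(best)) <= threshold:
            best.append(p)
        else:
            clusters.append([p])
    return clusters
```

Note the result depends on the order in which the points are presented, which is one of the characteristic features of single-pass clustering.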