Proof of Theorem 3

We begin by proving lemma 22, that the revised version of ``instantiate'', given in section 6.3, gives an instantiation of a cluster tree in Euclidean space.

**Lemma 22:** Any cluster tree *T* has an instantiation in Euclidean
space *R*^{m} of any dimensionality *m*.

The proof is essentially the same as the
proof of Lemma 2,
except that we now have to keep track of
real quantities. For any node *N*, if *i*=*N*.label, we define
*d*(*N*) = *d*_{i}. The proof then proceeds in the following
steps:

- i.
- For any
*i*<*j*,*d*_{i}<*d*_{j}/*a*^{j-i}. Immediate by construction. - ii.
- For any nodes
*M*,*C*, if*M*is a descendant of*C in T*then dist(*G[M]*,*G[C]*) <*a n d*(*C*) / (*a*- 1).**Proof:**Let*C*=*C*_{0},*C*_{1}...*C*_{r}=*M*be the path from*C*to*M*through*T*. Then

dist(*G[M]*,*G[C]*) <= (by the triangle inequality)

the sum over*i*from 0 to*r*-1 of dist(*G[C*_{i+1}*]*,*G[C*_{i}*]*) <=

the sum over*i*from 0 to*r*-1 of (*n d*(*C*) /*a*^{i}) < (*a*/(*a*-1)) (*n d*(*C*)). - iii.
- Let
*N*be a node in*T*; let*C*_{1}and*C*_{2}be two children of*N*; and let*M*_{1}and*M*_{2}be descendants of*C*_{1}and*C*_{2}respectively. Then*d*(*N*) (1 - 2*n*/(*a*-1)) < dist(*G[M*_{1}*]*,*G[M*_{2}*]*) <*nd*(*N*) (1 + 2/(*a*-1))

**Proof:**By the triangle inequality,

dist(*G[C*_{1}*]*,*G[C*_{2}*]*) <=dist(*G[C*_{1}*]*,*G[M*_{1}*]*) + dist(*G[M*_{1}*]*,*G[M*_{2}*]*) + dist(*G[M*_{2}*]*,*G[C*_{2}*]*).

Thus, dist(*G[C*_{1}*]*,*G[C*_{2}*]*) -dist(*G[C*_{1}*]*,*G[M*_{1}*]*) -dist(*G[M*_{2}*]*,*G[C*_{2}*]*) <=dist(*G[M*_{1}*]*,*G[M*_{2}*]*).

Also, by the triangle inequality,

dist(*G[M*_{1}*]*,*G[M*_{2}*]*) <=dist(*G[C*_{1}*]*,*G[C*_{2}*]*) + dist(*G[C*_{1}*]*,*G[M*_{1}*]*) + dist(*G[M*_{2}*]*,*G[C*_{2}*]*).

By construction,*d*(*N*) <= dist(*G[C*_{1}*]*,*G[C*_{2}*]*) <*n d*(*N*),

and by part (ii), for*i*= 1,2, dist(*G[M*_{i}*]*,*G[C*_{i}*]*) <*a n d*(*C*) / (*a*- 1) <*n d*(*N*) / (*a*-1)

as*d*(*C*) <*d*(*N*) /*a*. - iv.
- For any symbols
*a*,*b*,*c*,*d in T*, let*P*be the least common ancestor of*a*,*b*and let*N*be the least common ancestor of*c*,*d*. If*P*.label <*N*.label then much_closer(*G[a]*,*G[b]*,*G[c]*,*G[d]*).**Proof:**By part (iii), dist(*G[a]*,*G[b]*) <*nd*(*P*) (1 + 2/(*a*-1)) and dist(*G[c]*,*G[d]*) >*d*(*N*) (1 - 2*n*/(*a*-1)). Since*d*(*P*) <*d*(*N*) /*a*and since*a*= 2+2*n*+*Bn*, it follows by straightforward algebra that dist(*G[a]*,*G[b]*) < dist(*G[c]*,*G[d]*) /*B*.

We next prove the analogue of lemma 5.

**Lemma 23:** Let *S* be a set of constraints over *n* variables
of the form
``dist(*a*,*b*) < dist(*c*,*d*) / *B*'', where *B* > *n*.
If *S* is consistent, then there is some edge in *S* which
is not in the connected components of the shorts of *S*.

**Proof:** Let *Z* be a valuation satisfying *S*. Let
*pq* be the edge in *S* for which dist(*Z*(*p*),*Z*(*q*)) is maximal.
Now, if *ab* is a short of *S* -- that is, there is a constraint
much_closer(*a*,*b*,*c*,*d*) in *S* -- then dist(
*Z*(*a*), *Z*(*b*))
< dist(
*Z*(*c*), *Z*(*d*))/*B* <= dist(
*Z*(*p*), *Z*(*q*))/*B*.

Now, let *ab* be any edge in *H*, the connected components of the shorts of
*S*. Then there is a simple path
*a*_{1}=*a*, *a*_{2} ... *a*_{k} =*b* such that
the edge
*a*_{i}*a*_{i+1} is a short of *S* for
*i*=1 ... *k*-1.
Note that *k* <= *n*. Then, by the triangle inequality,

dist(HenceZ(a),Z(b)) <=

dist(Z(a_{1}),Z(a_{2})) + dist(Z(a_{2}),Z(a_{3})) + ... + dist(Z(a_{k-1}),Z(a_{k})) <=

(k-1)dist(Z(p),Z(q)) /B< dist(Z(p),Z(q))

**Theorem 3:** Let *S* be a set of constraints over *n* variables
of the form ``dist(*a*,*b*) < dist(*c*,*d*) / *B*'', where *B* > *n*.
The algorithm solve_constraints(*S*) returns
a cluster tree satisfying *S* if *S* is consistent over Euclidean
space, and returns
**false** if *S* is inconsistent.

**Proof:**
Note that the semantics of the constraints ``much_closer(*a*,*b*,*c*,*d*)''
enters into the proof of Theorem 1 only in
lemmas 2 and 5. The remainder
of the proof of Theorem 1 has to do purely
with the relation between the
structure of *S* and the structure of the tree. Hence, since we have
shown that the analogues of lemmas 2 and
5 hold in a set of constraints
of this kind, the same proof can be completed in exactly the same way.