Proof of Theorem 3

Next: Proof of theorem 4 Up: Appendix A. Proofs Previous: Validation of algorithm solve_constraints2

Proof of Theorem 3

We begin by proving lemma 22, that the revised version of ``instantiate'', given in section 6.3, gives an instantiation of a cluster tree in Euclidean space.

Lemma 22: Any cluster tree T has an instantiation in Euclidean space R^m of any dimensionality m.

The proof is essentially the same as the proof of Lemma 2, except that we now have to keep track of real quantities. For any node N, if i=N.label, we define d(N) = d_i. The proof then proceeds in the following steps:

i.

For any i< j, d_i < d_j / a^j-i. Immediate by construction.

ii.

For any nodes M,C, if M is a descendant of C in T then dist(G[M],G[C]) < a n d(C) / (a - 1).

Proof: Let C=C₀, C₁ ... C_r=M be the path from C to M through T. Then
dist(G[M],G[C]) <= (by the triangle inequality)
the sum over i from 0 to r-1 of dist(G[C_i+1],G[C_i]) <=
the sum over i from 0 to r-1 of (n d(C) / aⁱ) < (a/(a-1)) (n d(C)).

iii.

Let N be a node in T; let C₁ and C₂ be two children of N; and let M₁ and M₂ be descendants of C₁ and C₂ respectively. Then d(N) (1 - 2n/(a-1)) < dist( G[M₁], G[M₂]) < nd(N) (1 + 2/(a-1))
Proof: By the triangle inequality,
dist( G[C₁], G[C₂]) <=dist( G[C₁],G[M₁]) + dist( G[M₁],G[M₂]) + dist( G[M₂],G[C₂]).
Thus, dist( G[C₁], G[C₂]) -dist( G[C₁],G[M₁]) -dist( G[M₂],G[C₂]) <=dist( G[M₁],G[M₂]).
Also, by the triangle inequality,
dist( G[M₁],G[M₂]) <=dist( G[C₁], G[C₂]) + dist( G[C₁],G[M₁]) + dist( G[M₂],G[C₂]).
By construction, d(N) <= dist( G[C₁],G[C₂]) < n d(N),
and by part (ii), for i = 1,2, dist( G[M_i],G[C_i]) < a n d(C) / (a - 1) < n d(N) / (a -1)
as d(C) < d(N) / a.

iv.

For any symbols a,b,c,d in T, let P be the least common ancestor of a,b and let N be the least common ancestor of c,d. If P.label < N.label then much_closer( G[a],G[b],G[c],G[d]).

Proof: By part (iii), dist(G[a],G[b]) < nd(P) (1 + 2/(a-1)) and dist(G[c],G[d]) > d(N) (1 - 2n/(a -1)). Since d(P) < d(N) / a and since a = 2+2n+Bn, it follows by straightforward algebra that dist(G[a],G[b]) < dist(G[c],G[d]) / B.

We next prove the analogue of lemma 5.

Lemma 23: Let S be a set of constraints over n variables of the form ``dist(a,b) < dist(c,d) / B'', where B > n. If S is consistent, then there is some edge in S which is not in the connected components of the shorts of S.

Proof: Let Z be a valuation satisfying S. Let pq be the edge in S for which dist(Z(p),Z(q)) is maximal. Now, if ab is a short of S -- that is, there is a constraint much_closer(a,b,c,d) in S -- then dist( Z(a), Z(b)) < dist( Z(c), Z(d))/B <= dist( Z(p), Z(q))/B.

Now, let ab be any edge in H, the connected components of the shorts of S. Then there is a simple path a₁=a, a₂ ... a_k =b such that the edge a_ia_i+1 is a short of S for i=1 ... k-1. Note that k <= n. Then, by the triangle inequality,

dist( Z(a), Z(b)) <=
dist( Z(a₁),Z(a₂)) + dist( Z(a₂),Z(a₃)) + ... + dist( Z(a_k-1),Z(a_k)) <=
(k-1)dist( Z(p), Z(q)) / B < dist( Z(p), Z(q))

Hence pq <> ab, so pq is not in H.

Theorem 3: Let S be a set of constraints over n variables of the form ``dist(a,b) < dist(c,d) / B'', where B > n. The algorithm solve_constraints(S) returns a cluster tree satisfying S if S is consistent over Euclidean space, and returns false if S is inconsistent.

Proof: Note that the semantics of the constraints ``much_closer(a,b,c,d)'' enters into the proof of Theorem 1 only in lemmas 2 and 5. The remainder of the proof of Theorem 1 has to do purely with the relation between the structure of S and the structure of the tree. Hence, since we have shown that the analogues of lemmas 2 and 5 hold in a set of constraints of this kind, the same proof can be completed in exactly the same way.

Next: Proof of theorem 4 Up: Appendix A. Proofs Previous: Validation of algorithm solve_constraints2