Proof of Theorem 1

**Lemma 3:** Let *T* be a cluster tree and let *Z* be an instantiation of *T*.
Let *a* and *b* be symbols of *T*. Let *N* be the least common ancestor
of *a* and *b in T*. Then od(
*Z*(*a*), *Z*(*b*)) =
odiam(*Z*(*N*)).

**Proof:** Since *Z*(*a*) and *Z*(*b*) are elements of
*Z*(*N*), it follows from the definition of odiam that
od(
*Z*(*a*), *Z*(*b*)) <<= odiam(*Z*(*N*)).
Suppose the inequality were strict; that is,
od(
*Z*(*a*), *Z*(*b*)) << odiam(*Z*(*N*)).
Then let *C* be the set of all the symbols *c* of *T* such that od(*Z*(*a*),*Z*(*c*)) <<= od(*Z*(*a*),*Z*(*b*)).
Then odiam(*Z*(*C*)) = od(*Z*(*a*),*Z*(*b*)) << odiam(*Z*(*N*)).
It is easily shown that *Z*(*C*) is a cluster in
*Z*(*T*). Therefore, by property (ii) of definition 5,
there must be a node *M* such that *M*.symbols = *C*. Now, *M* is certainly
not an ancestor of *N*, since odiam(*Z*(*M*)) << odiam(*Z*(*N*))
but *M*.symbols contains both *a* and *b*. But this contradicts the
assumption that *N* was the least common ancestor of *a* and *b*.

**Corollary 4:** Let *T* be a cluster tree and let *Z* be an instantiation of *T*.
Let *a*,*b*,*c*,*d* be symbols of *T*.
Let *N* be the least common ancestor of *c* and *d in T*, and
let *M* be the least common ancestor of *a* and *b in T*.
Then od(
*Z*(*a*), *Z*(*b*)) << od(
*Z*(*c*), *Z*(*d*)) if and
only if *M*.label < *N*.label.

**Proof:** Immediate from lemma 3 and property (iii) of
definition 5 of instantiation.

**Lemma 5:** Let *S* be any set of constraints of the form
od(*a*,*b*) << od(*c*,*d*). Let *H* be the connected components of
the shorts of *S*. If *S* is consistent, then not every
edge of *S* is in *H*.

**Proof:** Let *Z* be a valuation satisfying *S*. Find
an edge *pq in S* for which od(*Z*(*p*),*Z*(*q*)) is maximal.
Now, if *ab* is a short of *S* -- that is, there is a constraint
od(*a*,*b*) << od(*c*,*d*) in *S* -- then od(
*Z*(*a*), *Z*(*b*))
<< od(
*Z*(*c*), *Z*(*d*)) <<= od(
*Z*(*p*), *Z*(*q*)).

Now, let *ab* be any edge in *H*, the connected components of the shorts of
*S*. Then there is a path
*a*_{1}=*a*, *a*_{2} ... *a*_{k} =*b* such that
the edge
*a*_{i}*a*_{i+1} is a short of *S* for
*i*=1 ... *k*-1.
Thus, by the om-triangle inequality,
od(
*Z*(*a*), *Z*(*b*)) <<=
*max*_{i=1..k-1}(od
(*Z*(*a*_{i}), *Z*(*a*_{i+1}))) <<od(
*Z*(*p*), *Z*(*q*)). Hence *pq* <> *ab*, so *pq* is not in *H*.

**Lemma 6:** The values of *S* and *H* in any iteration are supersets
of their values in any later iteration.

**Proof:** *S* is reset to a subset of itself at the end of each
iteration. *H* is defined in terms of *S* in a monotonic manner.

**Lemma 7:**
*S* cannot be the same in two successive iterations of the main loop.

**Proof:** by contradiction. Suppose that *S* is the same in two
successive iterations. Then *H* will be the same, since it is defined
in terms of *S*. *H* is constructed to contain all the shorts of *S*,
Since the resetting of *S* at the end of the first iteration does
not change *S*, *H* must contain all the longs as well. Thus,
*H* contains all the edges in *S*. But that being the case, the
algorithm should have terminated with failure at the beginning of
the first iteration.

**Lemma 8:** Algorithm solve_constraints always terminates.

**Proof:** By lemma 7, if the algorithm does not exit
with failure, then
on each iteration some constraints are removed from *S*. Hence, the
number of iterations of the main loop is at most the original size of
*S*. Everything else in the algorithm is clearly bounded. (Note
that this bound on the number of iterations is improved in section
6.1 to *n*-1, where *n* is the number of symbols.)

**Lemma 9:** If algorithm solve_constraints returns **false**, then
*S* is inconsistent.

**Proof:** If the algorithm returns **false**, then the transitive
closure of the shorts of *S* contains all the edges in *S*.
By lemma 5, *S* is inconsistent.

**Lemma 10:** If constraint *C* of form od(*a*,*b*) << od(*c*,*d*) is in the
initial value of *S*, and edge *cd* is in *H* in
some particular iteration, then constraint *C* is in *S* at the
start of that iteration.

**Proof:** Suppose that *C* is deleted from *S* on some particular
iteration. Then edge *cd*, the long of *C*, cannot be in
*H* in that iteration. That is, it is not possible for edge *cd* to
persist in *H* in an iteration after *C* has been deleted from *S*.
Note that, by lemma 6, once *cd* is eliminated from
*H*, it remains out of *H*.

**Lemma 11:** The following loop invariant holds:
At the end of
each loop iteration, the values of *L*.symbols, where *L* is a leaf in the
current state of the tree, are exactly the connected components of *H*.

**Proof:** In the first iteration, *T* is initially just the root *R*,
containing all the symbols, and a child of *R* is created for each connected
component of *H*.

Let *T*_{i} and *H*_{i} be the values of *T* and *H* at the end of the *i*th
iteration.
Suppose that the invariant holds at the end of the *k*th iteration.
By lemma 6, *H*_{k+1} is a
subset of *H*_{k}. Hence, each connected component of *H*_{k+1} is a
subset of a connected component of *H*_{k}. Moreover, each connected component
*J* of *H*_{k} is either a connected component of *H*_{k+1} or is partitioned
into several connected components of *H*_{k+1}. In the former case,
the leaf of *T*_{k} corresponding to *J* is unchanged and remains a leaf in
*T*_{k+1}. In the latter case, the leaf corresponding to *J* gets assigned
one child for each connected component of *H*_{k+1} that is a subset of *J*.
Thus, the connected components of *H*_{k+1} correspond to the leaves of
*T*_{k+1}.

**Lemma 12:** If procedure solve_constraints does not return **false**,
then it returns a well-formed cluster tree *T*.

**Proof:**
Using lemma 11, and the cleanup section of
solve_constraints which creates
the final leaves for symbols, it follows that every symbol in *S* ends up in a single leaf of *T*. As *m* is decremented on each iteration,
and as no iteration adds both a new node and children of that node, it
follows that the label of each internal node is less than the label
of its father. Hence the constraints on cluster trees
(definition 3) are
satisfied.

**Lemma 13:** Let *a*,*b* be two distinct symbols in *S* and let *T* be the cluster tree
returned by solve_constraints for *S*. Let *N* be the least
common ancestor of *a*,*b in T*. Then either *N* is assigned its label
on the first iteration when the edge *ab* is not in *H*, or the
edge *ab* is in the final value of *H* when the loop is exited and
*N* is assigned its label in the final cleanup section.

**Proof:** As above, let *H*_{i} be the value of *H* in the *i*th
iteration.

If *N* is the root, then it is assigned its label in the first
iteration. Clearly, *a* and *b*, being in different subtrees of *N*,
must be in different connected components of *H*_{1}.

Suppose *N* is assigned its label in the *k*th iteration of the loop for
*k*>1.
By lemma 11, at the end of the previous iteration,
*N*.symbols was a connected component of *H*_{k-1}, and it
therefore contained the edge *ab*. Since *N* is the least common ancestor
of *a*,*b*, it follows that *a* and *b* are placed in two different children
of *N*; hence, they are in two different connected components of *H*_{k}.
Thus the edge *ab* cannot be in *H*_{k}.

Suppose *N* is assigned its label in the cleanup section of the algorithm.
Then by lemma 11, *N*.symbols is a connected component
of the final value of *H*. Hence the
edge *ab* was in the final value of *H*.

**Lemma 14:** Let *S* initially contain constraint *C* of form
od(*a*,*b*) << od(*c*,*d*).
Suppose that solve_constraints(*S*) returns a
cluster tree *T*. Let *M* be the least common ancestor of *a*,*b in T* and
let *N* be the least common ancestor of *c*,*d*. Then *M*.label < *N*.label.

**Proof:** Suppose *N* is given a label in a given iteration. By
lemma 13, *cd* is eliminated from *H* in that same
iteration. By lemma 10,
constraint *C* must be in *S* at the start of the
iteration. Hence *ab* is a short of *S* in the iteration, and
is therefore in *H*. Hence
*M* is not given a label until a later iteration, and therefore is
given a lower label.

It is easily seen that *cd* cannot be in *H* in the final iteration of
the loop, and hence *N* is not assigned its label in the cleanup
section.

**Lemma 15:** Suppose that solve_constraints(*S*) returns a
cluster tree *T*. Then any instantiation of *T* satisfies the constraints
*S*.

**Proof:** Immediate from lemma 14 and
corollary 4.

**Theorem 1:** The algorithm solve_constraints(*S*) returns
a cluster tree satisfying *S* if *S* is consistent, and returns
**false** if *S* is inconsistent.

**Proof:** If solve_constraints(*S*) returns **false**, then
it is inconsistent (lemma 9). If it does not return
**false**, then
it returns a cluster tree *T* (lemma 12). Since *T* has
an instantiation (lemma 2) and since every
instantiation of *T* is a solution of *S* (lemma 15),
it follows that *S* is consistent and *T* satisfies *S*.