The editing distance between trees: Algorithms and applications

Candidate: Zhang,KaiZhong

Abstract

Trees are a ubiquitous building block in computer science and related fields. Examples are grammar parses, image descriptions, secondary structures of RNA molecules, and many other phenomena. Comparing trees is therefore useful to compare scenes, parses, and so on. This thesis presents algorithms for tree comparison and applications of those algorithms. We consider the distance between two labeled trees to be the weighted number of editing operations (insert, delete, and modify) to transform one tree to another. We show that for unordered trees this is a NP-Complete problem. For ordered trees we present a simple fast dynamic programming algorithm that is significantly better than the best previous published algorithms. We then show that our method provides a general technique for solving other related tree problems (e.g. approximate tree matching). We also present efficient parallel algorithms on the assumption that the costs be unit. One of our applications is to compare secondary structures of RNA molecules. We describe another application to vision that uses tree comparisons to compare shapes. We have also implemented some of the algorithms in the form of a tree comparison toolkit. The preliminary version of the toolkit has been used at the U.S. National Cancer Institute for the comparison of RNA secondary structures.