DEPARTMENT OF COMPUTER SCIENCE
DOCTORAL DISSERTATION DEFENSE


Candidate: Ramesh Hariharan
Advisor: Richard Cole

Designing Pattern Matching Algorithms by Exploiting Structural Pattern Properties

5 p.m., Friday, May 13, 1994
12th fl Conference Rm, 719 Broadway

Abstract

This thesis presents algorithms and, in some cases, lower bounds for some fundamental pattern matching problems. In all cases, the algorithms are obtained by understanding and strongly exploiting structural pattern properties. The following results are obtained.


Exact Complexity of String Matching: We consider the question of how many character comparisons are needed to find all occurrences of a pattern string of length m in a text string of length n. We show an almost tight upper bound of the form n+O(n/m) character comparisons, following preprocessing. Specifically, we show an upper bound of n + 8/(3(m+1)) (n-m) character comparisons. The following lower bounds are also shown: for on-line algorithms, a bound of n + 9 / (4(m+1)) (n-m) character comparisons for m=35+36k, for any integer k >= 1, and for general algorithms, a bound of n+2(n-m) / (m+3) character comparisons, for m=2k+1, for any integer k >= 1.

Parallel Two-Dimensional Pattern Matching: We give the first time, space and work optimal common CRCW-PRAM algorithm for finding all occurrences of a two-dimensional pattern of size m1 * m2 in a two-dimensional text of size n1 * n2. Our algorithm runs in O(1) time performing O(n1 * n2) work, following preprocessing of the pattern. A major portion of the preprocessing step is the computation of witnesses for the pattern. We show how to compute witnesses for the pattern in O(log log m2)time and O(m1 * m2) work when m2 >= m1. In the process of designing the above algorithm, we also obtain some new periodicity properties of two-dimensional patterns.

Parallel Suffix Tree Construction: We consider the problem of constructing the suffix tree of a given string s of length m in parallel. An O(m)-work, O(m)-space, O(log4 m)-time CREW-PRAM algorithm for constructing the suffix tree of s is obtained when s is drawn from any fixed alphabet set. This is the first work and space optimal parallel algorithm known for this problem. It can be generalized to construct the suffix tree of a string sdrawn from any general alphabet set to perform in O(log4 m) time, $O(m\log |\Sigma|)$ work, and $O(m\log |\Sigma|)$ space, after the characters in s have been sorted alphabetically; here $|\Sigma|$ is the number of distinct characters in s. In this case too, the algorithm is work optimal.