Title: A Unified Construction of the Glushkov, Follow, and Antimirov Automata


(NYU-CS-TR880)

Authors: Cyril Allauzen and Mehryar Mohri

Abstract:
Many techniques have been introduced in the last few decades to 
create ε-free automata representing regular expressions: Glushkov automata, 
the so-called follow automata, and Antimirov automata. This paper presents 
a simple and unified view of all these ε-free automata both in the case of 
unweighted and weighted regular expressions.It describes simple and general 
algorithms with running time complexities at least as good as that of the 
best previously known techniques, and provides concise proofs.The construction 
methods are all based on two standard automata algorithms: epsilon-removal and 
minimization. This contrasts with the multitude of complicated and special-purpose 
techniques and proofs put forward by others to construct these automata. Our analysis 
provides a better understanding of ε-free automata representing regular 
expressions: they are all the results of the application of some combinations of 
epsilon-removal and minimization to the classical Thompson automata. This makes it 
straight forward to generalize these algorithms to the weighted case, which also 
results in much simpler algorithms than existing ones. For weighted regular 
expressions over a closed semiring, we extend the notion of follow automata to the 
weighted case. We also present the first algorithm to compute the Antimirov automata 
in the weighted case.