A Unified Construction of the Glushkov, Follow, and Antimirov Automata
Abstract
Many techniques have been introduced in the last few decades to create epsilon-free automata representing regular expressions: Glushkov automata, the so-called follow automata, and Antimirov automata. This paper presents a simple and unified view of all these epsilon-free automata both in the case of unweighted and weighted regular expressions. It describes simple and general algorithms with running time complexities at least as good as that of the best previously known techniques, and provides concise proofs. The construction methods are all based on two standard automata algorithms: epsilon-removal and minimization. This contrasts with the multitude of complicated and special-purpose techniques and proofs put forward by others to construct these automata. Our analysis provides a better understanding of epsilon-free automata representing regular expressions: they are all the results of the application of some combinations of epsilon-removal and minimization to the classical Thompson automata. This makes it straightforward to generalize these algorithms to the weighted case, which also results in much simpler algorithms than existing ones. For weighted regular expressions over a closed semiring, we extend the notion of follow automata to the weighted case. We also present the first algorithm to compute the Antimirov automata in the weighted case.
Document Details
- Document Type
- Technical Report
- Publication Date
- Jan 01, 2006
- Accession Number
- ADA459063
Entities
People
- Cyril Allauzen
- Mehryar Mohri
Organizations
- New York University