Adding nesting structure to words

Abstract

We propose the model of nested words for representation of data with both a linear ordering and a hierarchically nested matching of items. Examples of data with such dual linear-hierarchical structure include executions of structured programs, annotated linguistic data, and HTML/XML documents. Nested words generalize both words and ordered trees, and allow both word and tree operations. We define nested word automata —finite-state acceptors for nested words, and show that the resulting class of regular languages of nested words has all the appealing theoretical properties that the classical regular word languages enjoys: deterministic nested word automata are as expressive as their nondeterministic counterparts; the class is closed under union, intersection, complementation, concatenation, Kleene-*, prefixes, and language homomorphisms; membership, emptiness, language inclusion, and language equivalence are all decidable; and definability in monadic second order logic corresponds exactly to finite-state recognizability. We also consider regular languages of infinite nested words and show that the closure properties, MSO-characterization, and decidability of decision problems carry over.

Document Details

Document Type
Pub Defense Publication
Publication Date
May 01, 2009
Source ID
10.1145/1516512.1516518

Entities

People

  • P. Madhusudan
  • Rajeev Alur

Organizations

  • Army Research Office
  • National Science Foundation
  • University of Illinois Urbana–Champaign
  • University of Pennsylvania

Tags

Readers

  • Computational Linguistics
  • Mathematical Modeling and Probability Theory.