Estimating Grammar Parameters using Bounded Memory

Abstract

Estimating the parameters of stochastic context-free grammars (SCFGs) from data is an important, well-studied problem. Almost without exception, existing approaches make repeated passes over the training data. The memory requirements of such algorithms are ill-suited for embedded agents exposed to large amounts of training data over long periods of time. We present a novel algorithm, called HOLA, for estimating the parameters of SCFGs that computes summary statistics for each string as it is observed and then discards the string. The memory used by HOLA is bounded by the size of the grammar, not by the amount of training data. Empirical results show that HOLA performs as well as the Inside-Outside algorithm on a variety of standard problems, despite the fact that it has access to much less information.

Open PDF

Document Details

Document Type
Technical Report
Publication Date
Jan 01, 2002
Accession Number
ADA459912

Entities

People

  • Brent Heeringa
  • Tom Oates

Organizations

  • University of Maryland, Baltimore

Tags

Communities of Interest

  • C4I

DTIC Thesaurus Topics

  • Algorithms
  • Automated Speech Recognition
  • Computations
  • Computer Science
  • Context Free Grammars
  • Convergence
  • Electrical Engineering
  • Estimators
  • Grammars
  • Iterations
  • Language
  • Learning
  • Natural Languages
  • Probability
  • Probability Distributions
  • Production
  • Training

Fields of Study

  • Computer science

Readers

  • Computational Modeling and Simulation
  • Mathematical Modeling and Probability Theory.
  • Military Logistics and Supply Chain Management