Inference and Estimation of a Long-Range Trigram Model

Abstract

We describe an implementation of a simple probabilistic link grammar. This probabilistic language model extends trigrams by allowing a word to be predicted not only from the two immediately preceeding words, but potentially from any preceeding pair of adjacent words that lie within the same sentence. In this way, the trigram model can skip over less informative words to make its predictions. The underlying 'grammar' is nothing, more than a list of pairs of words that can be linked together with one or more intervening words; this word- pair grammar is automatically inferred from a corpus of training text. We present a novel technique for indexing the model parameters that allows us to avoid all sorting in the M-step of the training algorithm. This results in significant savings in computation time, and is applicable to the training of a general probabilistic link grammar. Results of preliminary experiments carried out for this class of models are presented.

Open PDF

Document Details

Document Type
Technical Report
Publication Date
Sep 01, 1994
Accession Number
ADA286057

Entities

People

  • H. Printz
  • J. Gillett
  • J. Lafferty
  • S. D. Pietra
  • V. D. Pietra

Organizations

  • Carnegie Mellon University

Tags

Communities of Interest

  • Materials and Manufacturing Processes

DTIC Thesaurus Topics

  • Algorithms
  • Artificial Intelligence
  • Automata
  • Computations
  • Computer Science
  • Estimators
  • Formal Languages
  • Grammars
  • Language
  • Maximum Likelihood Estimation
  • Models
  • Natural Languages
  • Probabilistic Models
  • Probability
  • Probability Distributions
  • Standards
  • Vocabulary

Readers

  • Computational Linguistics
  • Computational Modeling and Simulation

Technology Areas

  • AI & ML
  • AI & ML - Bayesian Inference
  • AI & ML - Machine Translation