Lattice Based Language Models

Abstract

This paper introduces lattice based language models, a new language modeling paradigm. These models construct multi-dimensional hierarchies of partitions and select the most promising partitions to generate the estimated distributions. We discuss a specific two dimensional lattice and propose two primary features to measure the usefulness of each node: the training-set history count and the smoothed entropy of its prediction. Smoothing techniques are reviewed, and a generalization of the conventional backoff strategy to multiple dimensions is proposed. Preliminary experiments on the SWITCHBOARD corpus yield a 6.5% perplexity reduction over a word trigram model.
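
The two node features named in the abstract can be illustrated with a minimal sketch. It assumes a node is summarized by the counts of words observed after the histories it groups together, and it uses additive (add-alpha) smoothing purely as a stand-in, since the abstract does not say which smoothing method the report applies. The function name `node_features` and the parameters `vocab_size` and `alpha` are hypothetical.

```python
import math
from collections import Counter

def node_features(next_word_counts, vocab_size, alpha=1.0):
    """Return (history_count, smoothed_entropy_in_bits) for one lattice node.

    Assumption: add-alpha smoothing stands in for whatever smoothing
    the report actually uses.
    """
    # Feature 1: how many training histories fell into this node.
    history_count = sum(next_word_counts.values())
    denom = history_count + alpha * vocab_size

    # Feature 2: entropy of the smoothed next-word distribution.
    entropy = 0.0
    for count in next_word_counts.values():
        p = (count + alpha) / denom
        entropy -= p * math.log2(p)

    # Words never seen after this node all share one smoothed probability.
    unseen = vocab_size - len(next_word_counts)
    if unseen > 0:
        p0 = alpha / denom
        entropy -= unseen * p0 * math.log2(p0)

    return history_count, entropy

# Toy node: ten observed continuations over a four-word vocabulary.
counts = Counter({"the": 6, "a": 3, "my": 1})
n, h = node_features(counts, vocab_size=4)
print(n, round(h, 3))
```

Under these assumptions, a node with a high history count and a low smoothed entropy makes a sharp prediction backed by many observations, which is one plausible reading of what makes a partition "promising".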

Document Details

Document Type
Technical Report
Publication Date
Sep 01, 1997
Accession Number
ADA333294

Entities

People

  • Pierre Dupont
  • Roni Rosenfeld

Organizations

  • Carnegie Mellon University

Tags

Communities of Interest

  • Human Systems

DTIC Thesaurus Topics

  • Algorithms
  • Automated Speech Recognition
  • Clustering
  • Computer Science
  • Data Sets
  • Equations
  • Estimators
  • Hierarchies
  • Interpolation
  • Language
  • Probability
  • Random Variables
  • Switchboards
  • Test Sets
  • Training
  • Two Dimensional
  • Vocabulary

Fields of Study

  • Computer science

Readers

  • Computational Linguistics
  • Computational Modeling and Simulation
  • Regression Analysis