Lattice Based Language Models
Abstract
This paper introduces lattice based language models, a new language modeling paradigm. These models construct multi-dimensional hierarchies of partitions and select the most promising partitions to generate the estimated distributions. We discuss a specific two-dimensional lattice and propose two primary features to measure the usefulness of each node: the training-set history count and the smoothed entropy of its prediction. Smoothing techniques are reviewed, and a generalization of the conventional backoff strategy to multiple dimensions is proposed. Preliminary experimental results on the SWITCHBOARD corpus show a 6.5% perplexity reduction over a word trigram model.
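As an illustration only, the two node features named in the abstract (training-set history count and smoothed entropy of the node's prediction) might be computed along the following lines. This is a minimal sketch, not the report's method: the function name node_features, the additive (add-alpha) smoothing, and the toy counts are all assumptions made for the example, since the abstract does not specify which smoothing technique is used.

```python
import math
from collections import Counter


def node_features(next_word_counts, vocab_size, alpha=1.0):
    """Return (history_count, smoothed_entropy_in_bits) for one lattice node.

    next_word_counts: Counter of words observed after the histories that
        fall into this node's partition (hypothetical data layout).
    vocab_size: total vocabulary size, including unseen words.
    alpha: additive smoothing constant (an assumption for this sketch).
    """
    # Feature 1: how many training histories fell into this node.
    history_count = sum(next_word_counts.values())

    # Additive smoothing: p(w) = (c(w) + alpha) / (N + alpha * V).
    denom = history_count + alpha * vocab_size

    # Feature 2: entropy of the smoothed predictive distribution.
    entropy = 0.0
    for count in next_word_counts.values():
        p = (count + alpha) / denom
        entropy -= p * math.log2(p)

    # All unseen words share the same smoothed probability.
    unseen = vocab_size - len(next_word_counts)
    if unseen > 0:
        p_unseen = alpha / denom
        entropy -= unseen * p_unseen * math.log2(p_unseen)

    return history_count, entropy


# Toy node: 4 histories observed, vocabulary of 4 words.
count, entropy_bits = node_features(Counter({"a": 3, "b": 1}), vocab_size=4)
```

Under these assumptions, a node with a high history count and a low smoothed entropy would be a well-supported, confident predictor, which is the intuition behind using the two features together to rank nodes.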
Document Details
- Document Type: Technical Report
- Publication Date: Sep 01, 1997
- Accession Number: ADA333294
Entities
People
- Pierre Dupont
- Roni Rosenfeld
Organizations
- Carnegie Mellon University