Language Modeling With Sentence-Level Mixtures
Abstract
This paper introduces a simple mixture language model that attempts to capture long-distance constraints in a sentence or paragraph. The model is an m-component mixture of trigram models. The models were constructed using a 5K vocabulary and trained on a 76-million-word Wall Street Journal text corpus. In experiments with the BU recognition system, the mixture trigram models give a 7% improvement in recognition accuracy compared with a single trigram model.
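The abstract does not spell out how the m components are combined. As a minimal sketch, assuming the mixture is taken at the sentence level as the title suggests, each component trigram model scores the whole sentence and the weighted scores are then summed. The function name, the `trigram_prob(word, prev2, prev1)` interface, and the `<s>` and `</s>` padding symbols are illustrative assumptions, not details from the report.

```python
import math

def sentence_log_prob(sentence, components, weights):
    """Log-probability of a sentence under a sentence-level trigram mixture.

    components: list of callables trigram_prob(word, prev2, prev1) returning
                P(word | prev2, prev1); a hypothetical interface, assumed to be
                smoothed so that no probability is zero.
    weights:    positive mixture weights, one per component, summing to 1.
    """
    # Pad with assumed start/end symbols so every word has two predecessors.
    padded = ["<s>", "<s>"] + list(sentence) + ["</s>"]

    per_component = []
    for weight, trigram_prob in zip(weights, components):
        # log( weight_k * prod_i P_k(w_i | w_{i-2}, w_{i-1}) )
        log_p = math.log(weight)
        for i in range(2, len(padded)):
            log_p += math.log(trigram_prob(padded[i], padded[i - 2], padded[i - 1]))
        per_component.append(log_p)

    # Sum the component scores in the log domain (log-sum-exp) for stability.
    peak = max(per_component)
    return peak + math.log(sum(math.exp(lp - peak) for lp in per_component))
```

Because the sum over components sits outside the product over words, a single component accounts for the entire sentence. This is how such a model could capture constraints that span more than the two-word trigram history, in contrast to interpolating the components word by word.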
Document Details
- Document Type: Technical Report
- Publication Date: Jan 01, 1994
- Accession Number: ADA459584
Entities
People
- J. R. Rohlicek
- Mari Ostendorf
- Rukmini Iyer
Organizations
- Boston University