Language Modeling With Sentence-Level Mixtures

Abstract

This paper introduces a simple mixture language model that attempts to capture long-distance constraints in a sentence or paragraph. The model is an m-component mixture of trigram models. The models were constructed using a 5K vocabulary and trained on a 76-million-word Wall Street Journal text corpus. In experiments with the BU recognition system, the mixture trigram models give a 7% improvement in recognition accuracy compared with a single trigram model.
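
The abstract does not give the model's equations, so the following is only a minimal sketch of how an m-component sentence-level mixture of trigram models could score a sentence. It assumes the mixture is taken over whole sentences: each component scores the full word sequence with its own trigram probabilities, and the component scores are then combined using the mixture weights. The function name, the dictionary-based trigram tables, the padding symbols, and the probability floor for unseen trigrams are all hypothetical choices made for illustration; they are not taken from the report.

```python
import math


def sentence_log_prob(sentence, components, weights, floor=1e-6):
    """Log probability of a sentence under a sentence-level trigram mixture.

    sentence:   list of word strings
    components: list of dicts mapping (w_prev2, w_prev1, w) -> probability
                (one hypothetical trigram table per mixture component)
    weights:    positive mixture weights, assumed to sum to 1
    floor:      assumed probability for trigrams missing from a table
    """
    # Pad so the first words and the end of sentence have trigram contexts.
    padded = ["<s>", "<s>"] + list(sentence) + ["</s>"]

    per_component = []
    for trigram_probs, weight in zip(components, weights):
        # Start from the log weight, then add each trigram log probability.
        log_p = math.log(weight)
        for i in range(2, len(padded)):
            trigram = tuple(padded[i - 2:i + 1])
            log_p += math.log(trigram_probs.get(trigram, floor))
        per_component.append(log_p)

    # Sum the weighted component probabilities in log space (log-sum-exp).
    top = max(per_component)
    return top + math.log(sum(math.exp(x - top) for x in per_component))


# Toy usage with two made-up components and a one-word sentence.
components = [
    {("<s>", "<s>", "a"): 0.5, ("<s>", "a", "</s>"): 0.5},
    {("<s>", "<s>", "a"): 0.1, ("<s>", "a", "</s>"): 1.0},
]
weights = [0.5, 0.5]
print(math.exp(sentence_log_prob(["a"], components, weights)))
```

Because the weights are applied once per sentence rather than once per word, a component that fits the whole sentence well dominates the sum, which is one way such a model could reflect long-distance constraints.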

Document Details

Document Type
Technical Report
Publication Date
Jan 01, 1994
Accession Number
ADA459584

Entities

People

  • J. R. Rohlicek
  • Mari Ostendorf
  • Rukmini Iyer

Organizations

  • Boston University

Tags

Communities of Interest

  • Materials and Manufacturing Processes

DTIC Thesaurus Topics

  • Algorithms
  • Artificial Intelligence
  • Automated Speech Recognition
  • Automatic
  • Context Free Grammars
  • Drug Abuse
  • Grammars
  • Hidden Markov Models
  • Language
  • Markov Models
  • Models
  • Natural Languages
  • Probability
  • Recognition
  • Standards
  • Test Sets
  • Vocabulary

Fields of Study

  • Computer science

Readers

  • Computational Linguistics
  • Computational Modeling and Simulation
  • Pavement Materials Engineering