Large-Scale Topic Detection and Language Model Adaptation

Abstract

The subject matter of any conversation or document can typically be described as some combination of elemental topics. We have developed a language model adaptation scheme that takes a piece of text, chooses the most similar topic clusters from a set of over 5000 elemental topics, and uses topic-specific language models built from those clusters to rescore N-best lists. This adaptation achieves a 15% reduction in perplexity and a small improvement in word error rate. We also investigate the use of a topic tree, in which the amount of training data for a specific topic can be judiciously increased when the elemental topic cluster has too few word tokens to build a reliably smoothed and representative language model. By interpolating models chosen from thousands of topics, our system can fine-tune topic adaptation and adapt to unique, previously unseen combinations of subjects.
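The abstract names the pipeline (select similar topic clusters, build topic-specific models, interpolate them, rescore N-best lists, back off to a topic tree when a cluster is sparse) but not its details. The Python sketch below illustrates that pipeline under loudly stated assumptions; it is not the report's actual method. The similarity measure (cosine over word counts), the model type (add-one smoothed unigrams rather than the n-gram models a real recognizer would use), the fixed equal interpolation weights, the pooling of a sparse cluster with its ancestors, and every function name (`select_topics`, `training_counts`, `adapt`, `rescore`) are hypothetical. The toy clusters and acoustic scores are invented.

```python
import math
from collections import Counter

# ASSUMPTION: each topic cluster is a bag of word counts; a hypothetical topic
# tree maps a child topic to its parent, and each node stores only its own data.

def cosine(a, b):
    """Cosine similarity between two word-count vectors."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def select_topics(words, clusters, k):
    """Names of the k clusters most similar to the input text."""
    counts = Counter(words)
    return sorted(clusters, key=lambda t: cosine(counts, clusters[t]), reverse=True)[:k]

def training_counts(topic, clusters, parents, min_tokens):
    """Pool a sparse cluster with its ancestors until it has >= min_tokens tokens."""
    counts = Counter(clusters[topic])
    node = topic
    while sum(counts.values()) < min_tokens and node in parents:
        node = parents[node]
        counts += clusters[node]
    return counts

def unigram_lm(counts, vocab):
    """Add-one smoothed unigram model over a fixed vocabulary (sums to 1)."""
    total = sum(counts[w] for w in vocab) + len(vocab)
    return {w: (counts[w] + 1) / total for w in vocab}

def interpolate(models, weights):
    """Linear interpolation of models that share one vocabulary."""
    return {w: sum(lam * m[w] for lam, m in zip(weights, models)) for w in models[0]}

def log_prob(lm, words):
    return sum(math.log(lm[w]) for w in words)

def perplexity(lm, words):
    return math.exp(-log_prob(lm, words) / len(words))

def adapt(words, clusters, parents, general_lm, vocab, k=2, min_tokens=5, topic_weight=0.5):
    """Mix the general model with equally weighted models of the k closest topics."""
    topics = select_topics(words, clusters, k)
    topic_lms = [unigram_lm(training_counts(t, clusters, parents, min_tokens), vocab)
                 for t in topics]
    weights = [1 - topic_weight] + [topic_weight / len(topic_lms)] * len(topic_lms)
    return interpolate([general_lm] + topic_lms, weights)

def rescore(nbest, lm, lm_weight):
    """Pick the hypothesis maximizing acoustic score + weighted LM log-probability."""
    return max(nbest, key=lambda h: h[1] + lm_weight * log_prob(lm, h[0]))[0]

# Toy data, invented for illustration only.
clusters = {
    "sports": Counter("game team score win team game coach".split()),
    "finance": Counter("stock market price bank stock rate".split()),
    "baseball": Counter("pitch inning".split()),
}
parents = {"baseball": "sports"}
general_counts = sum(clusters.values(), Counter())
vocab = set(general_counts) | {"the", "steak"}
general_lm = unigram_lm(general_counts, vocab)

history = "team game pitch inning".split()
adapted = adapt(history, clusters, parents, general_lm, vocab)

test = "team game inning".split()
print(perplexity(general_lm, test), perplexity(adapted, test))

nbest = [("the steak game".split(), -10.0), ("the team game".split(), -10.5)]
print(" ".join(rescore(nbest, adapted, lm_weight=1.0)))  # prints: the team game
```

In this toy run the two-token "baseball" cluster falls below `min_tokens`, so it is pooled with its "sports" parent before its model is built. The adapted model assigns the held-out sports text a lower perplexity than the general model, and its higher probability for "team" overturns the acoustic preference for "steak" during rescoring.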

Document Details

Document Type
Technical Report
Publication Date
Jun 01, 1997
Accession Number
ADA327553

Entities

People

  • Kristie Seymore
  • Roni Rosenfeld

Organizations

  • Carnegie Mellon University

Tags

DTIC Thesaurus Topics

  • Automated Speech Recognition
  • Automatic
  • Clustering
  • Computer Science
  • Detection
  • Frequency
  • Genetic Engineering
  • Hidden Markov Models
  • Hypotheses
  • Intellectual Property
  • Language
  • Machine Learning
  • Markov Models
  • Models
  • Probability
  • Recognition
  • Test Sets

Fields of Study

  • Computer science

Readers

  • Computational Linguistics
  • Regression Analysis
  • Systems Analysis and Design