Improving Acoustic Models by Watching Television

Abstract

Obtaining sufficient labelled training data is a persistent difficulty for speech recognition research. Although well-transcribed data is expensive to produce, closed-captioned television broadcasts provide a constant stream of challenging speech data paired with poor transcriptions. We describe a reliable unsupervised method for identifying accurately transcribed sections of these broadcasts, and show how these segments can be used to train a recognition system. Starting from acoustic models trained on the Wall Street Journal database, a single iteration of our training method reduced the word error rate on an independent broadcast television news test set from 62.2% to 59.5%.
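
For readers unfamiliar with the metric, the sketch below shows how a word error rate is conventionally computed and what the reported change amounts to. It assumes the standard definition (word-level edit distance divided by the number of reference words); the abstract does not say how the report's scoring was done, and the function names `word_error_rate` and `relative_reduction` are hypothetical, chosen only for this illustration.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the reference length.

    Assumes the conventional WER definition; the report's own scoring
    procedure is not described in the abstract.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] is the edit distance between ref[:i-1] and hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # deletion of a reference word
                curr[j - 1] + 1,     # insertion of a hypothesis word
                prev[j - 1] + cost,  # substitution or exact match
            ))
        prev = curr
    return prev[-1] / len(ref)


def relative_reduction(before: float, after: float) -> float:
    """Fraction by which an error rate fell, relative to its starting value."""
    return (before - after) / before


if __name__ == "__main__":
    # Figures taken from the abstract: 62.2% before, 59.5% after.
    print(f"absolute: {62.2 - 59.5:.1f} points")
    print(f"relative: {relative_reduction(62.2, 59.5):.1%}")
```

On the figures quoted in the abstract, the change from 62.2% to 59.5% is 2.7 percentage points absolute, or roughly 4.3% relative.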

Document Details

Document Type
Technical Report
Publication Date
Mar 19, 1998
Accession Number
ADA350494

Entities

People

  • Alexander G. Hauptmann
  • Michael J. Witbrock

Organizations

  • Carnegie Mellon University

Tags

Communities of Interest

  • Energy and Power Technologies

DTIC Thesaurus Topics

  • Accuracy
  • Acoustic Signals
  • Acoustics
  • Adaptive Training
  • Automated Speech Recognition
  • Computer Science
  • Errors
  • Hidden Markov Models
  • Language
  • Markov Models
  • Models
  • Natural Language Processing
  • Neural Networks
  • Recognition
  • Signal Processing
  • Training
  • Word Recognition

Fields of Study

  • Computer science

Readers

  • Educational Psychology
  • International Journalism and Media Studies
  • Speech Processing/Speech Recognition

Technology Areas

  • AI & ML
  • AI & ML - Information Retrieval
  • AI & ML - Neural Networks