Improving Acoustic Models by Watching Television
Abstract
Obtaining sufficient labelled training data is a persistent difficulty for speech recognition research. Although well-transcribed data is expensive to produce, closed-captioned television provides a constant stream of challenging speech data accompanied by poor transcriptions. We describe a reliable unsupervised method for identifying the accurately transcribed sections of these broadcasts, and show how these segments can be used to train a recognition system. Starting from acoustic models trained on the Wall Street Journal database, a single iteration of our training method reduced the word error rate on an independent broadcast television news test set from 62.2% to 59.5%.
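To put the reported figures in perspective, the short sketch below works out the absolute and relative word error rate reduction implied by the two numbers in the abstract. The function name `relative_reduction` is a hypothetical helper introduced here for illustration; only the values 62.2% and 59.5% come from the report.

```python
def relative_reduction(baseline_wer: float, new_wer: float) -> float:
    """Return the WER reduction as a fraction of the baseline WER."""
    if baseline_wer <= 0:
        raise ValueError("baseline_wer must be positive")
    return (baseline_wer - new_wer) / baseline_wer


# Figures quoted in the abstract (word error rate, in percent).
baseline = 62.2  # Wall Street Journal-trained models, before retraining
improved = 59.5  # after a single iteration of the training method

absolute = baseline - improved                      # percentage points
relative = relative_reduction(baseline, improved)   # fraction of baseline

print(f"absolute reduction: {absolute:.1f} points")
print(f"relative reduction: {relative:.1%}")
```

Under these figures the improvement is 2.7 percentage points absolute, or about 4.3% relative to the baseline.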
Document Details
- Document Type: Technical Report
- Publication Date: Mar 19, 1998
- Accession Number: ADA350494
Entities
People
- Alexander G. Hauptmann
- Michael J. Witbrock
Organizations
- Carnegie Mellon University