Improving Acoustic Models by Watching Television

Abstract

Obtaining sufficient labelled training data is a persistent difficulty for speech recognition research. Although well-transcribed data is expensive to produce, closed-captioned television broadcasts provide a constant stream of challenging speech data accompanied by poor transcriptions. We describe a reliable unsupervised method for identifying accurately transcribed sections of these broadcasts, and show how these segments can be used to train a recognition system. Starting from acoustic models trained on the Wall Street Journal database, a single iteration of our training method reduced the word error rate on an independent broadcast television news test set from 62.2% to 59.5%.

Document Details

Document Type: Technical Report
Publication Date: Mar 19, 1998
Accession Number: ADA350494

Entities

People

Alexander G. Hauptmann
Michael J. Witbrock

Organizations

Carnegie Mellon University
