From Word-Spotting to OOV Modeling

Abstract

This paper explores one dimension along which word spotting and speech recognition differ: the nature of the background model. In word spotting, a relatively small number of keywords float on a sea of unknown words. In speech recognition, an occasional unknown word punctuates utterances that are otherwise completely within the vocabulary. Despite this difference in viewpoint, in some circumstances implementations of the two may become very similar. When transcribed data is available for a domain, word spotting benefits from the more detailed background model this can support. The manner in which the background is modeled in these cases is reminiscent of speech recognition. For example, a large vocabulary with good coverage may be extracted from the corpus, so that relatively few words in an utterance remain unmodeled. In this case, the situation is qualitatively similar to OOV modeling in a conventional speech recognizer, except that the vocabulary is strictly divided into "filler" and "keyword." This paper describes a mechanism for bootstrapping from a relatively weak background model for word spotting, where OOV words dominate, to a much stronger model where many more word or phrase clusters have been moved to the foreground and explicitly modeled. With this increase in vocabulary comes an increase in the potency of language modeling, boosting performance on the original vocabulary. This paper shows how a conventional speech recognizer can be convinced to cluster frequently occurring acoustic patterns, without requiring the existence of transcribed data.

Document Details

Document Type: Technical Report
Publication Date: Jan 01, 2001
Accession Number: ADA434772

Entities

People

Paul Fitzpatrick

Organizations

Massachusetts Institute of Technology
