From Word-Spotting to OOV Modeling

Abstract

This paper explores one dimension along which word spotting and speech recognition differ: the nature of the background model. In word spotting, a relatively small number of keywords float on a sea of unknown words. In speech recognition, an occasional unknown word punctuates utterances that are otherwise entirely within the vocabulary. Despite this difference in viewpoint, implementations of the two can, in some circumstances, become very similar. When transcribed data is available for a domain, word spotting benefits from the more detailed background model that such data can support, and the background is then modeled in a way reminiscent of speech recognition. For example, a large vocabulary with good coverage may be extracted from the corpus, so that relatively few words in an utterance remain unmodeled. The situation is then qualitatively similar to OOV modeling in a conventional speech recognizer, except that the vocabulary is strictly divided into "filler" and "keyword." This paper describes a mechanism for bootstrapping from a relatively weak background model for word spotting, where OOV words dominate, to a much stronger model in which many more word or phrase clusters have been moved to the foreground and explicitly modeled. The larger vocabulary makes language modeling more potent, which boosts performance on the original vocabulary. This paper shows how a conventional speech recognizer can be convinced to cluster frequently occurring acoustic patterns without requiring transcribed data.
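
The bootstrapping idea above, moving recurring patterns out of the background and into the foreground vocabulary one cluster at a time, can be illustrated with a toy model. The Python sketch below is not the report's method. It assumes the filler model has already produced symbolic phone strings, and it substitutes simple n-gram counting and greedy longest-match segmentation (both assumptions) for the acoustic clustering the report performs inside a real recognizer. Each pass counts recurring phone sequences in the still-unmodeled stretches, promotes the most frequent one to a new cluster, and re-segments. The names `bootstrap`, `segment`, `filler_runs` and `filler_fraction`, the parameters `min_count` and `max_len`, and the `<cN>` label format are all hypothetical.

```python
from collections import Counter
from typing import Dict, List, Set, Tuple

Phones = Tuple[str, ...]  # a phone sequence, e.g. ("k", "ae", "t")


def segment(phones: List[str], vocab: Dict[Phones, str], max_len: int) -> List[str]:
    """Greedy longest match (an assumption of this sketch): replace any phone
    sequence found in `vocab` with its cluster label; unmatched phones stay
    in the output as filler."""
    out: List[str] = []
    i = 0
    while i < len(phones):
        for n in range(min(max_len, len(phones) - i), 0, -1):
            label = vocab.get(tuple(phones[i:i + n]))
            if label is not None:
                out.append(label)
                i += n
                break
        else:  # no vocabulary entry starts at position i
            out.append(phones[i])
            i += 1
    return out


def filler_runs(tokens: List[str], labels: Set[str]) -> List[List[str]]:
    """Split a segmented utterance into maximal runs of still-unmodeled phones."""
    runs: List[List[str]] = []
    current: List[str] = []
    for tok in tokens:
        if tok in labels:
            if current:
                runs.append(current)
            current = []
        else:
            current.append(tok)
    if current:
        runs.append(current)
    return runs


def bootstrap(utterances: List[List[str]], iterations: int = 3, min_len: int = 2,
              max_len: int = 4, min_count: int = 2) -> Dict[Phones, str]:
    """Hypothetical bootstrapping loop: each pass promotes the most frequent
    recurring phone sequence in the filler regions to a new foreground cluster."""
    vocab: Dict[Phones, str] = {}
    for _ in range(iterations):
        labels = set(vocab.values())
        counts: Counter = Counter()
        for phones in utterances:
            for run in filler_runs(segment(phones, vocab, max_len), labels):
                for n in range(min_len, max_len + 1):
                    for i in range(len(run) - n + 1):
                        counts[tuple(run[i:i + n])] += 1
        # Rank by frequency, then prefer the longer pattern on ties.
        candidates = [(c, len(p), p) for p, c in counts.items() if c >= min_count]
        if not candidates:
            break  # nothing recurs in the background any more
        _, _, best = max(candidates)
        vocab[best] = f"<c{len(vocab)}>"
    return vocab


def filler_fraction(utterances: List[List[str]], vocab: Dict[Phones, str],
                    max_len: int = 4) -> float:
    """Share of phones still left to the background (filler) model."""
    labels = set(vocab.values())
    total = sum(len(u) for u in utterances)
    unmodeled = sum(len(run) for u in utterances
                    for run in filler_runs(segment(u, vocab, max_len), labels))
    return unmodeled / total


if __name__ == "__main__":
    utts = [["a", "b", "c", "x"], ["y", "a", "b", "c"],
            ["a", "b", "c", "z", "a", "b", "c"]]
    vocab = bootstrap(utts)
    print(vocab)                         # {('a', 'b', 'c'): '<c0>'}
    print(filler_fraction(utts, {}))     # 1.0
    print(filler_fraction(utts, vocab))  # 0.2
```

On this toy data, the first pass promotes the sequence `a b c` as cluster `<c0>`, and the share of phones left to the background drops from 1.0 to 0.2. The second pass finds nothing that recurs and stops. When the data contains a second recurring unit, later passes promote it as `<c1>`, and so on. Preferring the longer pattern on ties is a design choice made here so that a whole recurring unit is promoted rather than a fragment of it.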

Document Details

Document Type: Technical Report
Publication Date: Jan 01, 2001
Accession Number: ADA434772

Entities

People

  • Paul Fitzpatrick

Organizations

  • Massachusetts Institute of Technology

Tags

Communities of Interest

  • Materials and Manufacturing Processes

DTIC Thesaurus Topics

  • Automated Speech Recognition
  • Automatic
  • Clustering
  • Competition
  • Detection
  • Frequency
  • Hypotheses
  • Iterations
  • Judgment
  • Language
  • Recognition
  • Sequences
  • Signal Processing
  • Time Intervals
  • Training
  • Vocabulary
  • Word Recognition

Readers

  • Educational Psychology
  • Speech Processing/Speech Recognition

Technology Areas

  • AI & ML
  • AI & ML - Machine Translation