Learning New Words from Spontaneous Speech: A Project Summary

Abstract

This research develops methods that enable spoken language systems to detect and correct their own errors and to extend themselves automatically by incorporating new words. Unknown, or out-of-vocabulary (OOV), words are one of the major problems frustrating the use of automatic speech understanding systems in real-world tasks. Novel words cause recognition errors and often lead to recognition and understanding failures. Yet they are common. Real system users speak in a spontaneous and relatively unconstrained fashion; they do not know which words the system can recognize and are therefore likely to exceed its coverage. Even if speakers constrained their speech, self-extending systems would still be needed, because certain tasks inherently require dynamic vocabulary expansion (e.g., new company names or new flight destinations).

Further, collecting enough training data to develop a representative vocabulary (lexicon) and language model for a spoken interface application is costly and labor-intensive. For transcription tasks, large amounts of on-line data from which a lexicon and language model can be developed are often available; for many other tasks this is not feasible. Developers of applications and database interfaces will probably not have the resources to gather a large corpus of examples to train a system for their specific task. Yet most current speech and language model research is oriented toward training from large corpora. This research enables systems to be developed from small amounts of data and then 'bootstrapped'. A simple version of a system is ...
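The abstract does not say how vocabulary coverage is measured or how new words are added, so the following is only a generic illustration, not the report's method. It computes the OOV rate of a set of transcripts against a lexicon and then performs one hypothetical bootstrapping step that adds unknown words seen at least `min_count` times. The function names `oov_rate` and `extend_lexicon` and the `min_count` threshold are assumptions made for this sketch. It also assumes word-level transcripts are already available, whereas a self-extending recognizer would first have to detect new words in the speech itself.

```python
from collections import Counter
from typing import List, Set, Tuple


def oov_rate(utterances: List[str], lexicon: Set[str]) -> float:
    """Fraction of word tokens in the transcripts that are not in the lexicon."""
    tokens = [w for u in utterances for w in u.lower().split()]
    if not tokens:
        return 0.0  # no words at all, so nothing is out of vocabulary
    unknown = sum(1 for w in tokens if w not in lexicon)
    return unknown / len(tokens)


def extend_lexicon(
    lexicon: Set[str], utterances: List[str], min_count: int = 2
) -> Tuple[Set[str], Set[str]]:
    """Hypothetical bootstrapping step: add OOV words seen at least min_count times.

    Assumes the new words are already known as text; a real system would
    first have to detect them in the audio.
    """
    counts = Counter(
        w for u in utterances for w in u.lower().split() if w not in lexicon
    )
    new_words = {w for w, c in counts.items() if c >= min_count}
    # Copy the lexicon so the caller's set is not modified.
    return set(lexicon) | new_words, new_words


if __name__ == "__main__":
    lexicon = {"show", "me", "flights", "to", "boston", "denver"}
    utts = ["show me flights to Pittsburgh", "flights to Pittsburgh please"]
    print(f"OOV rate before: {oov_rate(utts, lexicon):.3f}")  # 3 of 9 tokens
    lexicon, added = extend_lexicon(lexicon, utts)
    print("added:", sorted(added))
    print(f"OOV rate after: {oov_rate(utts, lexicon):.3f}")  # 1 of 9 tokens
```

The count threshold is one simple design choice. It keeps a single stray token, such as a one-off recognition error, from being promoted into the vocabulary, at the cost of learning a genuinely new word only after it recurs.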


Document Details

Document Type: Technical Report
Publication Date: Jul 01, 1993
Accession Number: ADA277612

Entities

People

  • Sheryl R. Young

Organizations

  • Carnegie Mellon University

Tags

Communities of Interest

  • Air Platforms
  • Energy and Power Technologies
  • Ground and Sea Platforms
  • Materials and Manufacturing Processes

DTIC Thesaurus Topics

  • Accuracy
  • Acquisition
  • Artificial Intelligence
  • Automated Speech Recognition
  • Computer Science
  • Computers
  • Decoding
  • Detection
  • Hidden Markov Models
  • Language
  • Machine Learning
  • Markov Models
  • Pilot Studies
  • Probabilistic Models
  • Probability
  • Reasoning
  • Test Sets

Fields of Study

  • Computer science

Readers

  • Parallel and Distributed Computing
  • Speech Processing/Speech Recognition