Massively Parallel Network Architectures for Automatic Recognition of Visual Speech Signals

Abstract

This research sought to produce a massively parallel network architecture that could interpret speech signals from video recordings of human talkers. This report summarizes the project's results: (1) A corpus of video recordings from two human speakers was analyzed with image processing techniques and used as the data for this study; (2) We demonstrated that a feedforward network could be trained to categorize vowels from these talkers. Its performance was comparable to that of nearest-neighbor techniques and to that of trained humans on the same data; (3) We developed a novel approach to sensory fusion by training a network to transform facial images into short-time spectral amplitude envelopes. This information can be used to increase the signal-to-noise ratio, and hence the performance, of acoustic speech recognition systems in noisy environments; (4) We explored the use of recurrent networks to perform the same mapping for continuous speech. The results of this project demonstrate the feasibility of adding a visual speech recognition component to enhance existing speech recognition systems. Such a combined system could be used in noisy environments, such as cockpits, where improved communication is needed. This demonstration of presymbolic fusion of visual and acoustic speech signals is consistent with our current understanding of human speech perception.
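As a rough illustration of result (3), the sketch below fuses two estimates of the same short-time spectral amplitude envelope before any symbolic (phoneme or word) decision is made: one taken from a noisy acoustic channel and one standing in for the output of an image-to-envelope network. This is not the report's method. The function names, the fixed weighted-average rule, the four-band envelopes, and every numeric value are assumptions chosen only to make the idea concrete.

```python
def fuse_envelopes(acoustic, visual, visual_weight=0.5):
    """Blend two per-band amplitude envelopes with a fixed weight.

    Hypothetical fusion rule (an assumption, not taken from the report):
    fused[b] = (1 - w) * acoustic[b] + w * visual[b]
    """
    if len(acoustic) != len(visual):
        raise ValueError("envelopes must have the same number of bands")
    if not 0.0 <= visual_weight <= 1.0:
        raise ValueError("visual_weight must lie in [0, 1]")
    w = visual_weight
    return [(1.0 - w) * a + w * v for a, v in zip(acoustic, visual)]


def mean_squared_error(estimate, reference):
    """Average squared per-band difference between two envelopes."""
    if len(estimate) != len(reference):
        raise ValueError("envelopes must have the same number of bands")
    return sum((e - r) ** 2 for e, r in zip(estimate, reference)) / len(reference)


# Made-up four-band example; none of these values come from the report.
clean = [1.0, 2.0, 3.0, 2.0]      # envelope of the noise-free speech
acoustic = [1.4, 1.6, 3.4, 1.6]   # envelope measured in acoustic noise
visual = [0.8, 2.2, 3.2, 1.8]     # stand-in for a network's image-based estimate

fused = fuse_envelopes(acoustic, visual, visual_weight=0.5)
acoustic_error = mean_squared_error(acoustic, clean)
fused_error = mean_squared_error(fused, clean)
print("acoustic-only error:", acoustic_error)
print("fused error:", fused_error)
```

In this toy case the fused envelope lies closer to the clean envelope than the acoustic estimate alone, which is the sense in which visual information could raise the effective signal-to-noise ratio. A practical system would presumably weight the two sources according to the noise level in each frequency band rather than use a fixed weight.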

Document Details

Document Type
Technical Report
Publication Date
Jan 01, 1990
Accession Number
ADA226968

Entities

People

  • Moise Goldstein
  • Terrence J. Sejnowski

Organizations

  • Johns Hopkins University

Tags

Communities of Interest

  • Energy and Power Technologies
  • Sensors

DTIC Thesaurus Topics

  • Acoustic Signals
  • Automated Speech Recognition
  • Computers
  • Computing System Architectures
  • Data Sets
  • Frequency
  • Information Processing
  • Information Science
  • Information Systems
  • Network Architecture
  • Neural Networks
  • Psychology
  • Recognition
  • Speech
  • Test Sets
  • Transfer Functions
  • Video Recording

Fields of Study

  • Computer science

Readers

  • Neural Network Machine Learning
  • Speech Processing/Speech Recognition

Technology Areas

  • AI & ML
  • AI & ML - Neural Networks