Massively-Parallel Architectures for Automatic Recognition of Visual Speech Signals

Abstract

During the last year, significant progress has been made toward the primary objective of estimating the acoustic characteristics of speech from visual speech signals. Neural networks have been trained on a database of vowels. The raw images of faces, aligned and preprocessed, were used as input to these networks, which were trained to estimate the corresponding envelope of the acoustic spectrum. The performance of the networks was better than that of trained humans and was comparable with that of optimized pattern classifiers. Our approach avoids the problem of information loss through early categorization. The acoustic information that the network extracts from the visual signal can be used to supplement the acoustic signal in noisy environments, such as cockpits. During the next year we will extend these results to diphthongs using recurrent neural networks and temporal sequences of input images.
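
The mapping described in the abstract, from a preprocessed face image to an estimate of the spectral envelope, can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the report: the single tanh hidden layer, the layer sizes (N_PIXELS, N_HIDDEN, N_BANDS), the learning rate, the squared-error training rule, and the synthetic data that stands in for the vowel database. The point is only that the network regresses a continuous envelope rather than assigning an early category label.

```python
import math
import random


def init_layer(rng, n_in, n_out):
    """Small random weights and zero biases for one fully connected layer."""
    weights = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(n_out)]
    biases = [0.0] * n_out
    return weights, biases


def forward(net, pixels):
    """Map a flattened image to (hidden activations, estimated envelope)."""
    (w1, b1), (w2, b2) = net
    hidden = [math.tanh(sum(w * x for w, x in zip(row, pixels)) + b)
              for row, b in zip(w1, b1)]
    envelope = [sum(w * h for w, h in zip(row, hidden)) + b
                for row, b in zip(w2, b2)]
    return hidden, envelope


def train_step(net, pixels, target, lr):
    """One gradient step on squared error; returns the error before the update."""
    (w1, b1), (w2, b2) = net
    hidden, envelope = forward(net, pixels)
    err = [y - t for y, t in zip(envelope, target)]
    # Backpropagate through the linear output layer into the tanh hidden layer
    # (computed before the output weights are modified).
    grad_hidden = [
        (1.0 - h * h) * sum(err[k] * w2[k][j] for k in range(len(err)))
        for j, h in enumerate(hidden)
    ]
    for k, e in enumerate(err):
        for j, h in enumerate(hidden):
            w2[k][j] -= lr * e * h
        b2[k] -= lr * e
    for j, g in enumerate(grad_hidden):
        for i, x in enumerate(pixels):
            w1[j][i] -= lr * g * x
        b1[j] -= lr * g
    return sum(e * e for e in err) / len(err)


# Hypothetical sizes: 16 "pixels" in, 5 hidden units, 4 spectral bands out.
N_PIXELS, N_HIDDEN, N_BANDS = 16, 5, 4
rng = random.Random(0)

# Synthetic stand-in for (image, spectral envelope) pairs; not real speech data.
images = [[rng.random() for _ in range(N_PIXELS)] for _ in range(8)]
envelopes = [[sum(img[4 * k:4 * k + 4]) / 4.0 for k in range(N_BANDS)]
             for img in images]

net = [init_layer(rng, N_PIXELS, N_HIDDEN), init_layer(rng, N_HIDDEN, N_BANDS)]

history = []  # mean squared error per epoch
for epoch in range(200):
    total = 0.0
    for img, env in zip(images, envelopes):
        total += train_step(net, img, env, lr=0.05)
    history.append(total / len(images))

_, estimate = forward(net, images[0])
print("first-epoch error:", history[0])
print("last-epoch error:", history[-1])
print("estimated envelope:", estimate)
```

Because the output is a vector of band values instead of a class label, an estimate of this kind could in principle be combined with a noisy acoustic spectrum, which is the use the abstract suggests for environments such as cockpits.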

Document Details

Document Type
Technical Report
Publication Date
Oct 12, 1988
Accession Number
ADA202898

Entities

People

  • Terrence J. Sejnowski

Organizations

  • Johns Hopkins University

Tags

Communities of Interest

  • Energy and Power Technologies
  • Sensors

DTIC Thesaurus Topics

  • Acoustic Signals
  • Automated Speech Recognition
  • Coding
  • Computational Science
  • Computations
  • Computer Science
  • Computers
  • Electrical Engineering
  • Human Factors Engineering
  • Language
  • Neural Networks
  • Recognition
  • Recurrent Neural Networks
  • Self Organizing Systems
  • Signal Processing
  • Symbols
  • Visual Signals

Fields of Study

  • Computer science

Readers

  • Neural Network Machine Learning
  • Speech Processing/Speech Recognition
  • Systems Analysis and Design

Technology Areas

  • AI & ML
  • AI & ML - Neural Networks