Massively-Parallel Architectures for Automatic Recognition of Visual Speech Signals

Abstract

During the last year, significant progress has been made toward the primary objective of estimating the acoustic characteristics of speech from visual speech signals. Neural networks were trained on a database of vowels. The raw images of faces, aligned and preprocessed, were used as input to these networks, which were trained to estimate the corresponding envelope of the acoustic spectrum. The performance of the networks was better than that of trained humans and comparable to that of optimized pattern classifiers. Our approach avoids the problem of information loss through early categorization. The acoustic information that the network extracts from the visual signal can be used to supplement the acoustic signal in noisy environments, such as cockpits. During the next year we will extend these results to diphthongs using recurrent neural networks and temporal sequences of input images.
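
The mapping described in the abstract, from a preprocessed face image to an estimate of the acoustic spectral envelope, can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the report: the 8x8 image size, the 32 hidden units, the 16 spectral bands, the tanh hidden layer, the mean-squared-error loss, and the synthetic random data that stands in for the vowel database. The names `init_params`, `forward`, `train_step`, and `demo` are hypothetical.

```python
import numpy as np


def init_params(n_pixels, n_hidden, n_bands, seed=0):
    """Small random weights for a one-hidden-layer regression network."""
    rng = np.random.default_rng(seed)
    return {
        "W1": rng.normal(0.0, 0.1, (n_pixels, n_hidden)),
        "b1": np.zeros(n_hidden),
        "W2": rng.normal(0.0, 0.1, (n_hidden, n_bands)),
        "b2": np.zeros(n_bands),
    }


def forward(params, images):
    """Map flattened images (n, n_pixels) to envelope estimates (n, n_bands)."""
    hidden = np.tanh(images @ params["W1"] + params["b1"])
    envelope = hidden @ params["W2"] + params["b2"]  # continuous output, no class labels
    return hidden, envelope


def mse(pred, target):
    return float(np.mean((pred - target) ** 2))


def train_step(params, images, targets, lr=0.05):
    """One full-batch gradient-descent step; returns the loss before the update."""
    hidden, pred = forward(params, images)
    grad_out = 2.0 * (pred - targets) / pred.size  # gradient of the MSE w.r.t. pred
    grad_W2 = hidden.T @ grad_out
    grad_b2 = grad_out.sum(axis=0)
    grad_hidden = (grad_out @ params["W2"].T) * (1.0 - hidden ** 2)  # tanh derivative
    grad_W1 = images.T @ grad_hidden
    grad_b1 = grad_hidden.sum(axis=0)
    params["W1"] -= lr * grad_W1
    params["b1"] -= lr * grad_b1
    params["W2"] -= lr * grad_W2
    params["b2"] -= lr * grad_b2
    return mse(pred, targets)


def demo(n_samples=20, n_pixels=64, n_hidden=32, n_bands=16, steps=200):
    """Train on synthetic data (NOT real speech data) and return first and last loss."""
    rng = np.random.default_rng(1)
    images = rng.uniform(0.0, 1.0, (n_samples, n_pixels))   # stand-in for face images
    mixing = rng.normal(0.0, 0.1, (n_pixels, n_bands))
    targets = np.tanh(images @ mixing)                      # stand-in for envelopes
    params = init_params(n_pixels, n_hidden, n_bands)
    losses = [train_step(params, images, targets) for _ in range(steps)]
    return losses[0], losses[-1]


if __name__ == "__main__":
    first, last = demo()
    print(f"loss before training: {first:.4f}, after: {last:.4f}")
```

The output layer is a continuous vector and not a set of vowel categories, which is one way to read the abstract's point about avoiding information loss through early categorization.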

Document Details

Document Type: Technical Report
Publication Date: Oct 12, 1988
Accession Number: ADA202898

Entities

People

Terrence J. Sejnowski

Organizations

Johns Hopkins University
