Massively-Parallel Architectures for Automatic Recognition of Visual Speech Signals
Abstract
During the last year, significant progress has been made toward the primary objective: estimating the acoustic characteristics of speech from visual speech signals. Neural networks have been trained on a database of vowels. Raw images of faces, aligned and preprocessed, were used as input to these networks, which were trained to estimate the corresponding envelope of the acoustic spectrum. The networks performed better than trained humans and comparably to optimized pattern classifiers. Our approach avoids the information loss that results from early categorization. The acoustic information that the networks extract from the visual signal can be used to supplement the acoustic signal in noisy environments, such as cockpits. During the next year, we will extend these results to diphthongs using recurrent neural networks and temporal sequences of input images.
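The abstract does not describe the network architecture. The following is a minimal sketch, assuming a one-hidden-layer feedforward network with sigmoid hidden units trained by backpropagation on squared error, of the kind of regression described above: a flattened face image goes in and a vector of spectral-envelope samples comes out. The image size, layer sizes, learning rate, and the data generator `make_synthetic_data` are all hypothetical stand-ins, not the report's vowel database or its trained networks.

```python
import numpy as np

# All sizes are hypothetical; the abstract does not give them.
N_PIXELS = 16 * 16   # one aligned, preprocessed face image, flattened
N_HIDDEN = 20        # hidden units
N_BANDS = 32         # samples of the acoustic spectral envelope


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def init_params(rng):
    """Small random weights for a one-hidden-layer network."""
    return {
        "W1": rng.normal(0.0, 0.1, (N_PIXELS, N_HIDDEN)),
        "b1": np.zeros(N_HIDDEN),
        "W2": rng.normal(0.0, 0.1, (N_HIDDEN, N_BANDS)),
        "b2": np.zeros(N_BANDS),
    }


def forward(p, X):
    """Map a batch of images (rows of X) to estimated spectral envelopes."""
    H = sigmoid(X @ p["W1"] + p["b1"])  # hidden-unit activations
    Y = H @ p["W2"] + p["b2"]           # linear outputs: one value per band
    return H, Y


def train_step(p, X, T, lr=0.05):
    """One full-batch gradient step on squared error, via backpropagation.

    Returns the loss measured before the update.
    """
    H, Y = forward(p, X)
    n = X.shape[0]
    loss = 0.5 * np.sum((Y - T) ** 2) / n
    dY = (Y - T) / n                        # dLoss/dY
    dW2, db2 = H.T @ dY, dY.sum(axis=0)
    dH = (dY @ p["W2"].T) * H * (1.0 - H)   # back through the sigmoid
    dW1, db1 = X.T @ dH, dH.sum(axis=0)
    for name, grad in (("W1", dW1), ("b1", db1), ("W2", dW2), ("b2", db2)):
        p[name] -= lr * grad
    return loss


def make_synthetic_data(rng, n=200):
    """Hypothetical random image/envelope pairs (NOT the report's vowel data)."""
    X = rng.random((n, N_PIXELS))  # pixel intensities in [0, 1]
    M = rng.normal(0.0, 1.0, (N_PIXELS, N_BANDS)) / np.sqrt(N_PIXELS)
    T = np.tanh(X @ M)             # smooth, made-up image-to-spectrum mapping
    return X, T


rng = np.random.default_rng(0)
params = init_params(rng)
X, T = make_synthetic_data(rng)
losses = [train_step(params, X, T) for _ in range(500)]
print(f"loss: {losses[0]:.3f} -> {losses[-1]:.3f}")
```

Because the output is a continuous envelope rather than a vowel label, a network of this kind passes on graded acoustic information instead of committing to a category early, which is the property the abstract credits for avoiding information loss.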
Document Details
- Document Type
- Technical Report
- Publication Date
- Oct 12, 1988
- Accession Number
- ADA202898
Entities
People
- Terrence J. Sejnowski
Organizations
- Johns Hopkins University