Visual Speech Feature Extraction From Natural Speech for Multi-modal ASR
Abstract
Improving the accuracy of speech recognition by adding visual information is the central approach in multi-modal ASR research. In this work, we address two important issues: lip tracking and visual speech feature extraction. To apply multi-modal ASR to natural speech, the visual front-end algorithm must extract visual speech features that are invariant to affine transformations and lighting conditions. This paper focuses on both a lip tracking algorithm built on a Bayesian framework and a novel pixel-based visual speech feature extraction algorithm based on kurtosis measures of the frequency profile of local image blocks.
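The abstract does not spell out how the kurtosis measure is computed, so the following is only a rough illustration of the general idea, not the authors' algorithm. It assumes a grayscale mouth-region image, non-overlapping square blocks, a 2-D FFT magnitude as the "frequency profile", and the standard fourth-moment kurtosis; the function name `block_kurtosis_features` and the default block size of 8 are hypothetical choices made for this sketch.

```python
import numpy as np


def block_kurtosis_features(image, block=8):
    """Illustrative sketch: one kurtosis value per non-overlapping image block.

    Assumptions (not taken from the report): grayscale input, square blocks,
    FFT magnitude as the frequency profile, fourth-moment kurtosis.
    """
    img = np.asarray(image, dtype=float)
    rows = img.shape[0] // block
    cols = img.shape[1] // block
    feats = np.zeros((rows, cols))
    for r in range(rows):
        for c in range(cols):
            patch = img[r * block:(r + 1) * block, c * block:(c + 1) * block]
            # Subtract the block mean so a constant brightness offset drops out.
            patch = patch - patch.mean()
            # Frequency profile of the block: magnitude of its 2-D FFT.
            spectrum = np.abs(np.fft.fft2(patch)).ravel()
            centered = spectrum - spectrum.mean()
            var = np.mean(centered ** 2)
            if var > 0:
                # Kurtosis = fourth central moment / squared variance.
                feats[r, c] = np.mean(centered ** 4) / var ** 2
            # Perfectly flat blocks keep the default value 0.
    return feats
```

Under these assumptions, rescaling the pixel intensities by a constant gain or shifting them by a constant offset leaves the per-block values unchanged, which hints at why a normalized statistic such as kurtosis is attractive when lighting invariance is required. Affine invariance would depend on the lip tracking stage and is not addressed by this sketch.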
Document Details
- Document Type: Technical Report
- Publication Date: Jun 12, 2002
- Accession Number: ADP014023
Entities
People
- John N. Gowdy
- Sabri Gurbuz
Organizations
- Clemson University