Visual Speech Feature Extraction From Natural Speech for Multi-modal ASR

Abstract

Adding visual information to improve the accuracy of speech recognition is the central approach in multi-modal ASR research. This work addresses two important issues: lip tracking and visual speech feature extraction. For multi-modal ASR to be usable on natural speech, the visual front end must extract visual speech features that are invariant to affine transformations and to lighting conditions. The paper presents a lip tracking algorithm built on a Bayesian framework and a novel pixel-based visual speech feature extraction algorithm that uses kurtosis measures of the frequency profile of local image blocks.
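
The abstract does not give the details of the feature computation, so the following is only a minimal sketch of what a kurtosis measure over the frequency profile of local image blocks could look like. It assumes non-overlapping square blocks, a 2-D FFT magnitude as the "frequency profile", and the standard (non-excess) kurtosis; the function name block_kurtosis_features, the block size, and the choice to drop the DC term are all illustrative assumptions, not the authors' method.

```python
import numpy as np

def block_kurtosis_features(image, block=8):
    """Hypothetical sketch: kurtosis of the FFT magnitude of each local block.

    Assumes a 2-D grayscale image; partial blocks at the edges are ignored.
    """
    img = np.asarray(image, dtype=np.float64)
    rows, cols = img.shape[0] // block, img.shape[1] // block
    feats = np.zeros((rows, cols))
    for r in range(rows):
        for c in range(cols):
            patch = img[r * block:(r + 1) * block, c * block:(c + 1) * block]
            # Frequency profile of the block; index 0 is the DC term, which
            # carries the mean brightness and is dropped here (assumption).
            mag = np.abs(np.fft.fft2(patch)).ravel()[1:]
            centered = mag - mag.mean()
            var = np.mean(centered ** 2)
            # Non-excess kurtosis: fourth central moment over variance squared.
            # Flat blocks (near-zero variance) are assigned 0 by convention.
            feats[r, c] = np.mean(centered ** 4) / var ** 2 if var > 1e-12 else 0.0
    return feats

# Usage: one feature per 8x8 block of a synthetic 16x24 "mouth region".
rng = np.random.default_rng(0)
region = rng.random((16, 24))
features = block_kurtosis_features(region, block=8)
print(features.shape)
```

In this sketch, dropping the DC term removes any additive brightness offset, and kurtosis is unchanged by a positive rescaling of the magnitudes, so the features are unaffected by a uniform gain and offset applied to the image. That is a property of this illustration only; how the paper achieves lighting and affine invariance is not stated in the abstract.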

Document Details

Document Type: Technical Report
Publication Date: Jun 12, 2002
Accession Number: ADP014023

Entities

People

John N. Gowdy
Sabri Gurbuz

Organizations

Clemson University
