Development and Evaluation of Audio-Visual ASR: A Study on Connected Digit Recognition

Abstract

We present our findings from audio-visual speech recognition experiments for connected digit recognition in noisy environments. We derive hybrid (geometric- and appearance-based) visual lip features using a real-time lip tracking algorithm that we proposed previously. Using a small single-speaker corpus modeled after the TIDIGITS database, we build whole-word HMMs using both single-stream and 2-stream modeling strategies. For the 2-stream HMM method, we use stream-dependent weights to adjust the relative contributions of the two feature streams based on the acoustic SNR level. The 2-stream HMM method consistently gave the lowest WER, with an error reduction of 83% at the -3 dB SNR level compared to the acoustic-only baseline. A visual-only ASR WER of 6.85% was also achieved. A real-time system prototype was developed for concept demonstration.
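
The stream-dependent weighting mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not give the report's weight values or how they were chosen, so everything below is an assumption made for illustration: the linear SNR-to-weight mapping, the constraint that the two weights sum to 1, and the function names `audio_weight_for_snr` and `combined_log_likelihood` are all hypothetical. The sketch shows only the general idea of a 2-stream HMM state score, in which the audio and visual log-likelihoods are combined with exponent weights.

```python
def audio_weight_for_snr(snr_db, low_db=-3.0, high_db=20.0):
    """Hypothetical mapping from acoustic SNR (dB) to an audio stream weight.

    Assumption: the weight rises linearly from 0 at low_db to 1 at high_db
    and is clamped outside that range. The report's actual weights are not
    given in the abstract.
    """
    t = (snr_db - low_db) / (high_db - low_db)
    return min(1.0, max(0.0, t))


def combined_log_likelihood(log_b_audio, log_b_visual, audio_weight):
    """Weighted 2-stream state score in the log domain.

    Equivalent to b_audio ** w_a * b_visual ** w_v in the probability domain,
    with the assumed convention w_a + w_v = 1.
    """
    visual_weight = 1.0 - audio_weight
    return audio_weight * log_b_audio + visual_weight * log_b_visual


if __name__ == "__main__":
    # Illustrative per-state log-likelihoods, not values from the report.
    log_b_audio, log_b_visual = -10.0, -2.0
    for snr_db in (-3.0, 8.5, 20.0):
        w_a = audio_weight_for_snr(snr_db)
        score = combined_log_likelihood(log_b_audio, log_b_visual, w_a)
        print(f"SNR {snr_db:5.1f} dB  audio weight {w_a:.2f}  score {score:.2f}")
```

Under this assumed mapping the visual stream dominates the score when the audio is noisy, and the audio stream takes over as the SNR improves, which is the qualitative behavior the abstract describes.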

Document Details

Document Type: Technical Report
Publication Date: Jun 12, 2002
Accession Number: ADP014020

Entities

People

Michael T. Chan
