Multi-Modal Sensory Fusion with Application to Audio-Visual Speech Recognition

Abstract

In this work we consider the bimodal fusion problem in audio-visual speech recognition. A novel sensory fusion architecture based on coupled hidden Markov models (CHMMs) is presented. CHMMs are directed graphical models of stochastic processes and are a special type of dynamic Bayesian network. The proposed fusion architecture allows us to address the statistical modeling and the fusion of audio-visual speech in a unified framework. Furthermore, the architecture is capable of capturing the asynchronous and temporal inter-modal dependencies between the two information channels. We describe a model transformation strategy to facilitate inference and learning in CHMMs. Results from audio-visual speech recognition experiments confirmed the superior capability of the proposed fusion architecture.
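
As an illustration of what a CHMM model transformation can look like, the sketch below collapses a two-chain CHMM into an ordinary HMM over joint (audio, visual) states, so that standard HMM inference could be applied. This is the textbook product-state construction and is only an assumption here; the abstract does not say which transformation the report uses. The function name, the array layout, and all probabilities are hypothetical.

```python
from itertools import product


def chmm_to_product_hmm(trans_a, trans_v):
    """Collapse a two-chain CHMM into one HMM over joint (audio, visual) states.

    trans_a[a][v][a2] = P(audio state a2 at t | audio a, visual v at t-1)
    trans_v[a][v][v2] = P(visual state v2 at t | audio a, visual v at t-1)

    Returns (states, joint), where joint[i][j] = P(states[j] | states[i]).
    Assumption: given the previous joint state, the two chains move
    independently, which is the usual CHMM factorization.
    """
    n_a = len(trans_a)      # number of audio states
    n_v = len(trans_a[0])   # number of visual states
    states = list(product(range(n_a), range(n_v)))
    joint = [
        [trans_a[a][v][a2] * trans_v[a][v][v2] for (a2, v2) in states]
        for (a, v) in states
    ]
    return states, joint


# Toy example with 2 audio and 2 visual states (made-up numbers).
TRANS_A = [
    [[0.9, 0.1], [0.7, 0.3]],
    [[0.2, 0.8], [0.1, 0.9]],
]
TRANS_V = [
    [[0.8, 0.2], [0.3, 0.7]],
    [[0.6, 0.4], [0.1, 0.9]],
]

STATES, JOINT = chmm_to_product_hmm(TRANS_A, TRANS_V)
```

Each row of JOINT is a probability distribution over the four joint states. The cost of this construction is that the state space grows as the product of the per-chain state counts, which is one reason more economical transformations are of interest.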

Document Details

Document Type: Technical Report
Publication Date: Jun 12, 2002
Accession Number: ADP014018

Entities

People

Stephen M. Chu
Thomas Huang

Organizations

University of Illinois Urbana–Champaign
