Lip Tracking for Audio-Visual Speech Recognition

Abstract

Human speech is conveyed through both acoustic and visual channels and is therefore inherently multi-modal. Further, the two channels are largely complementary: the acoustic signal typically carries information about the manner of articulation, while the visual signal carries information about the place of articulation. This complementarity has motivated researchers to develop audio-visual speech recognition systems, which have been shown to be robust to acoustic noise. A fundamental requirement of automatic audio-visual speech recognition is real-time lip tracking; however, this requirement has been largely ignored by the lipreading community. This work presents a new approach for tracking unadorned lips in real time (50 fields/sec). The tracking framework combines comprehensive shape and motion models learnt from continuous speech sequences with focused image feature detection methods. Statistical models of the grey-level appearance of the mouth are shown to enable identification of the lip boundary in poorly contrasted grey-level images. Together, these modeling approaches permit robust, real-time tracking of unadorned lips. Isolated-word recognition experiments using dynamic time warping and Hidden Markov Model-based recognizers demonstrate that real-time, contour-based lip tracking can provide robust recognition of degraded speech. In noisy acoustic conditions, recognizers incorporating visual shape parameters outperform acoustic-only recognizers, reducing error rates by up to 44%.
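
The abstract mentions isolated-word recognition with dynamic time warping (DTW). As a rough illustration only, here is a minimal sketch of textbook DTW template matching over per-frame feature vectors, assuming each frame simply concatenates acoustic features with visual lip-shape parameters. The helper names (`fuse`, `dtw_distance`, `recognize`), the Euclidean frame distance, and plain feature concatenation are hypothetical choices for illustration; they are not taken from the report, whose actual features, local distance, and path constraints are not described in this abstract.

```python
import math
from typing import Dict, List, Sequence

Frame = Sequence[float]


def fuse(acoustic: Sequence[Frame], visual: Sequence[Frame]) -> List[List[float]]:
    """Hypothetical early fusion: concatenate acoustic and visual features frame by frame."""
    return [list(a) + list(v) for a, v in zip(acoustic, visual)]


def euclidean(a: Frame, b: Frame) -> float:
    """Local distance between two feature frames (assumed Euclidean)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def dtw_distance(query: Sequence[Frame], template: Sequence[Frame]) -> float:
    """Classic DTW: minimal accumulated frame distance over all monotonic alignments."""
    n, m = len(query), len(template)
    inf = float("inf")
    # cost[i][j] = best accumulated cost aligning query[:i] with template[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = euclidean(query[i - 1], template[j - 1])
            # Allowed steps: advance the query, advance the template, or both
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]


def recognize(query: Sequence[Frame], templates: Dict[str, Sequence[Frame]]) -> str:
    """Return the vocabulary word whose stored template is closest under DTW."""
    return min(templates, key=lambda word: dtw_distance(query, templates[word]))


if __name__ == "__main__":
    templates = {"yes": [[0.0], [1.0], [2.0]], "no": [[2.0], [1.0], [0.0]]}
    # A time-stretched version of the "yes" template still aligns with zero cost
    print(recognize([[0.0], [0.0], [1.0], [2.0]], templates))
```

Running the script prints `yes`: the repeated first frame of the query is absorbed by the warping path, which is the property that makes DTW suitable for comparing utterances spoken at different rates.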


Document Details

Document Type
Technical Report
Publication Date
Sep 30, 1997
Accession Number
ADA329886

Entities

People

  • Robert A. Kaucic Jr

Organizations

  • Air Force Institute of Technology

Tags

Communities of Interest

  • Energy and Power Technologies
  • Materials and Manufacturing Processes
  • Sensors

DTIC Thesaurus Topics

  • Acoustic Channels
  • Air Force
  • Automated Speech Recognition
  • Computational Science
  • Computer Vision
  • Databases
  • Detectors
  • Human Factors Engineering
  • Image Processing
  • Information Science
  • Markov Models
  • Mathematical Filters
  • Neural Networks
  • Probability
  • Probability Distributions
  • Recognition
  • Two Dimensional

Fields of Study

  • Computer science

Readers

  • Computer Vision
  • Speech Processing/Speech Recognition

Technology Areas

  • AI & ML