Trainable Videorealistic Speech Animation

Abstract

We describe how to create with machine learning techniques a generative, videorealistic, speech animation module. A human subject is rst recorded using a videocamera as he/she utters a predetermined speech corpus. After processing the corpus automatically, a visual speech module is learned from the data that is capable of synthesizing the human subject's mouth uttering entirely novel utterances that were not recorded in the original video. The synthesized utterance is re-composited onto a background sequence which contains natural head and eye movement. The nal output is videorealistic in the sense that it looks like a video camera recording of the subject. At run time, the input to the system can be either real audio sequences or synthetic audio produced by a text-to-speech system, as long as they have been phonetically aligned. The two key contributions of this paper are 1) a variant of the multidimensional morphable model (MMM) to synthesize new, previously unseen mouth con gurations from a small set of mouth image prototypes; and 2) a trajectory synthesis technique based on regularization, which is automatically trained from the recorded video corpus, and which is capable of synthesizing trajectories in MMM space corresponding to any desired utterance.

Open PDF

Document Details

Document Type
Technical Report
Publication Date
Jan 01, 2006
Accession Number
ADA454663

Entities

People

  • Gadi Geiger
  • Tomaso Poggio
  • Tony Ezzat

Organizations

  • Massachusetts Institute of Technology

Tags

Communities of Interest

  • Energy and Power Technologies
  • Materials and Manufacturing Processes

DTIC Thesaurus Topics

  • Algorithms
  • Artificial Intelligence
  • Computer Graphics
  • Computer Vision
  • Computers
  • Dimensionality Reduction
  • Eye Movements
  • Generative Models
  • Geometry
  • Graphics
  • Hidden Markov Models
  • Image Processing
  • Jet Propulsion
  • Neural Networks
  • Pattern Recognition
  • Three Dimensional
  • Two Dimensional

Fields of Study

  • Computer science

Readers

  • Adaptive Control and Estimation with Uncertainty in Dynamic Systems.
  • Agent-Based Social Robotics and Mobile-Assisted Learning in Virtual Environments.
  • Speech Processing/Speech Recognition.

Technology Areas

  • AI & ML
  • AI & ML - Neural Networks
  • Space
  • Space - Spacecraft Maneuvers