A Probabilistic Multimodal Approach for Predicting Listener Backchannels

Abstract

During face-to-face interactions, listeners use backchannel feedback such as head nods to signal to the speaker that the communication is working and that they should continue speaking. Predicting these backchannel opportunities is an important milestone for building engaging and natural virtual humans. In this paper we show how sequential probabilistic models (e.g., Hidden Markov Models or Conditional Random Fields) can automatically learn from a database of human-to-human interactions to predict listener backchannels using the speaker's multimodal output features (e.g., prosody, spoken words and eye gaze). The main challenges addressed in this paper are automatic selection of the relevant features and optimal feature representation for probabilistic models. For prediction of visual backchannel cues (i.e., head nods), our prediction model shows a statistically significant improvement over a previously published approach based on hand-crafted rules.

Document Details

Document Type
Technical Report
Publication Date
Jan 01, 2010
Accession Number
AD1157722

Entities

People

  • Iwan De Kok
  • Jonathan Gratch
  • Louis-Philippe Morency

Organizations

  • University of Southern California

Tags

Communities of Interest

  • Autonomy

DTIC Thesaurus Topics

  • Algorithms
  • Computer Languages
  • Data Sets
  • Feature Selection
  • Hidden Markov Models
  • Machine Learning
  • Markov Models
  • Multiagent Systems
  • Natural Language Processing
  • Natural Languages
  • Pattern Recognition
  • Personality
  • Probabilistic Models
  • Probability
  • Psychology
  • Recognition
  • Social Psychology

Fields of Study

  • Computer science

Readers

  • Agent-Based Social Robotics and Mobile-Assisted Learning in Virtual Environments
  • Artificial Intelligence
  • Speech Processing/Speech Recognition

Technology Areas

  • AI & ML