Acoustic and Visual Cues of Turn-Taking Dynamics in Dyadic Interactions

Abstract

We present an empirical study of multimodal cues to turn-taking dynamics in social interaction. First, we identify pauses, gaps, and overlapped speech segments in a dyadic conversation dataset. Second, we define two measurements: Mean Equalized Energy (MEE) on the audio channel and Animation Level (AL) on the video channel. We then test the hypothesis that the speaker with higher MEE or AL is more likely to take the floor after silence or overlapped speech. The results suggest that both vocal energy and visual movement energy offer useful cues for inferring an interlocutor's intention to take the floor.
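
The first step described in the abstract, locating pauses, gaps, and overlaps, can be illustrated with a minimal sketch. It assumes the common convention that a pause is silence between two turns of the same speaker, a gap is silence between turns of different speakers, and an overlap is a stretch where both speakers talk at once; the report may define these differently. The `Segment` layout and the `label_events` function are hypothetical names introduced only for this illustration, and the sketch does not reproduce the report's MEE or AL measurements.

```python
from typing import List, Tuple

# Hypothetical input format: (speaker label, start time in s, end time in s).
Segment = Tuple[str, float, float]
Event = Tuple[str, float, float]


def label_events(segments: List[Segment]) -> List[Event]:
    """Label pauses, gaps, and overlaps between speech segments.

    Assumed convention (not taken from the report):
      pause   = silence followed by the same speaker
      gap     = silence followed by the other speaker
      overlap = interval where two different speakers talk at once
    """
    if not segments:
        return []
    ordered = sorted(segments, key=lambda s: s[1])
    events: List[Event] = []
    # Track the speaker whose speech extends furthest in time so far.
    prev_spk, _, prev_end = ordered[0]
    for spk, start, end in ordered[1:]:
        if start > prev_end:
            kind = "pause" if spk == prev_spk else "gap"
            events.append((kind, prev_end, start))
        elif spk != prev_spk and start < prev_end:
            events.append(("overlap", start, min(end, prev_end)))
        if end > prev_end:
            prev_spk, prev_end = spk, end
    return events


if __name__ == "__main__":
    demo = [("A", 0.0, 2.0), ("A", 2.5, 4.0), ("B", 4.6, 6.0), ("A", 5.5, 7.0)]
    for event in label_events(demo):
        print(event)
```

Each labeled event marks a point where the floor could change hands, which is where a per-speaker measurement such as MEE or AL would be compared.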

Document Details

Document Type
Technical Report
Publication Date
Aug 28, 2011
Accession Number
AD1157766

Entities

People

  • Athanasios Katsamanis
  • Bo Xiao
  • Brian Baucom
  • Panayiotis G. Georgiou
  • Shrikanth Narayanan
  • Viktor Rozgic

Organizations

  • University of Southern California

Tags

Communities of Interest

  • Energy and Power Technologies

DTIC Thesaurus Topics

  • Computer Vision
  • Dialogue Systems
  • Energy Levels
  • Engineering
  • Extraction
  • Feature Extraction
  • Language
  • Linguistics
  • Microphones
  • Motion Capture
  • Pattern Recognition
  • Physical Properties
  • Signal Processing
  • Speech
  • Transitions
  • Universities
  • Video

Readers

  • Computer Vision
  • Speech Processing/Speech Recognition
  • Systems Analysis and Design

Technology Areas

  • AI & ML