Modeling the Perception of Multitalker Speech

Abstract

Listeners' ability to understand a target speaker in the presence of one or more simultaneous competing speakers is subject to two types of masking: energetic and informational. Energetic masking occurs when the target and interfering signals overlap in time and frequency, rendering portions of the target inaudible. Informational masking occurs when the listener is unable to segregate the target from the interference even though both are audible. We present a model of multitalker speech perception that accounts for both types of masking. Human perception in the presence of energetic masking is modeled using a speech recognizer that treats the masked time-frequency units of the target as missing data. The effects of informational masking on the recognizer are modeled using the output of a speech segregation system. In a systematic evaluation, the performance of the proposed model is in broad agreement with perceptual results.
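
The idea of treating masked time-frequency units as missing data can be illustrated with a minimal sketch. It is not the report's implementation: the 0 dB local signal-to-noise criterion, the diagonal Gaussian score, and the names binary_mask and marginal_log_likelihood are all assumptions made for illustration. The first function marks a unit as reliable when the target energy exceeds the interference energy; the second scores a frame using only the reliable units, so that masked units do not contribute.

```python
import math


def binary_mask(target_energy, interference_energy, threshold_db=0.0):
    """Return a 0/1 mask over time-frequency units.

    A unit is marked reliable (1) when its local SNR in dB exceeds
    threshold_db. The 0 dB default is an assumption, not a value
    taken from the report.
    """
    mask = []
    for t_row, i_row in zip(target_energy, interference_energy):
        row = []
        for t, i in zip(t_row, i_row):
            if t <= 0.0:
                row.append(0)          # no target energy: unit is masked
            elif i <= 0.0:
                row.append(1)          # no interference: unit is reliable
            else:
                snr_db = 10.0 * math.log10(t / i)
                row.append(1 if snr_db > threshold_db else 0)
        mask.append(row)
    return mask


def marginal_log_likelihood(frame, mask_row, means, variances):
    """Log-likelihood of one frame under a diagonal Gaussian,
    summing over reliable units only (masked units are skipped)."""
    total = 0.0
    for x, m, mu, var in zip(frame, mask_row, means, variances):
        if m:
            total += -0.5 * (math.log(2.0 * math.pi * var) + (x - mu) ** 2 / var)
    return total


# Hypothetical energies: rows are time frames, columns are frequency channels.
target = [[4.0, 1.0], [0.5, 9.0]]
interference = [[1.0, 2.0], [2.0, 1.0]]
mask = binary_mask(target, interference)

# Score the first frame against a hypothetical unit-variance, zero-mean model.
score = marginal_log_likelihood([0.0, 5.0], mask[0], [0.0, 0.0], [1.0, 1.0])
```

In this toy case the second channel of the first frame is dominated by the interference, so its value has no effect on the score. A segregation system that estimates the mask imperfectly would change which units are counted as reliable.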

Document Details

Document Type
Technical Report
Publication Date
Sep 01, 2005
Accession Number
ADA637036

Entities

People

  • DeLiang Wang
  • Soundararajan Srinivasan

Organizations

  • Ohio State University

Tags

Communities of Interest

  • Biomedical
  • Energy and Power Technologies

DTIC Thesaurus Topics

  • Automated Speech Recognition
  • Biomedical Engineering
  • Cognitive Science
  • Cognitive Systems Engineering
  • Computer Science
  • Degradation
  • Engineering
  • Frequency
  • Identification
  • Information Operations
  • Perception
  • Recognition
  • Training
  • Universities
  • Word Recognition

Readers

  • Computational Fluid Dynamics (CFD)
  • Speech Processing/Speech Recognition