Model-based Sequential Organization in Cochannel Speech

Abstract

A human listener has the ability to follow a speaker's voice while others are speaking simultaneously; in particular, the listener can organize the time-frequency energy of the same speaker across time into a single stream. In this paper, we focus on sequential organization in cochannel speech, that is, mixtures of two voices. We extract minimally corrupted segments, or usable speech, from cochannel speech using a robust multipitch tracking algorithm. The extracted usable speech is shown to capture speaker characteristics and to improve speaker identification performance across various target-to-interferer ratios. To utilize speaker characteristics for sequential organization, we extend the traditional speaker identification framework to cochannel speech and derive a joint objective for sequential grouping and speaker identification, which leads to a search for the optimum hypothesis. We then propose a hypothesis pruning algorithm based on speaker models to make the search computationally feasible. Evaluation results show that the proposed system approaches the ceiling speaker identification performance obtained with prior pitch information and yields significant improvement over alternative approaches to sequential organization.
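The joint search described in the abstract can be illustrated with a small sketch. This is not the report's algorithm: the abstract does not state the objective or the pruning rule, so the sketch assumes a summed log-likelihood objective, a simple beam-style pruning step, and one-dimensional Gaussian "speaker models" as stand-ins for real ones. The names `log_gauss`, `segment_score`, `best_hypothesis`, and the `beam` parameter are all hypothetical.

```python
import itertools
import math


def log_gauss(x, mean, var):
    """Log density of a 1-D Gaussian (assumed stand-in for a speaker model)."""
    return -0.5 * (math.log(2 * math.pi * var) + (x - mean) ** 2 / var)


def segment_score(segment, model):
    """Log-likelihood of one usable-speech segment under one speaker model."""
    mean, var = model
    return sum(log_gauss(x, mean, var) for x in segment)


def best_hypothesis(segments, models, beam=4):
    """Jointly pick a speaker pair and a per-segment speaker assignment.

    A hypothesis is (score, pair, labels): `pair` names two candidate
    speakers and labels[i] in {0, 1} says which of them owns segment i.
    After each segment only the `beam` best-scoring hypotheses are kept,
    which is the (assumed) pruning step that keeps the search tractable.
    """
    beams = [(0.0, pair, ())
             for pair in itertools.combinations(sorted(models), 2)]
    for seg in segments:
        expanded = []
        for score, pair, labels in beams:
            for k in (0, 1):
                new_score = score + segment_score(seg, models[pair[k]])
                expanded.append((new_score, pair, labels + (k,)))
        # Prune: keep only the highest-scoring partial hypotheses.
        expanded.sort(key=lambda h: h[0], reverse=True)
        beams = expanded[:beam]
    return beams[0]
```

Calling `best_hypothesis(segments, models)` with `models` as a dict mapping speaker names to `(mean, variance)` tuples and `segments` as a list of lists of scalar features returns the best surviving `(score, pair, labels)` triple. Without pruning, the number of hypotheses doubles with every segment for every speaker pair, which is why some form of pruning is needed.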

Document Details

Document Type
Technical Report
Publication Date
May 01, 2004
Accession Number
AD1001181

Entities

People

  • DeLiang Wang
  • Yang Shao

Organizations

  • Ohio State University

Tags

Communities of Interest

  • Energy and Power Technologies
  • Human Systems

DTIC Thesaurus Topics

  • Algorithms
  • Artificial Intelligence
  • Automated Speech Recognition
  • Cognitive Science
  • Computational Complexity
  • Computational Science
  • Computer Science
  • Frequency
  • Hidden Markov Models
  • Identification
  • Identification Systems
  • Markov Models
  • Models
  • New York
  • Power Spectra
  • Probability
  • Recognition

Fields of Study

  • Engineering

Readers

  • Applied Combinatorial Optimization and Logic Circuit Design
  • Computer Vision
  • Speech Processing/Speech Recognition