Towards Co-Channel Speaker Separation by 2-D Demodulation of Spectrograms

Abstract

This paper explores a two-dimensional (2-D) processing approach for co-channel speaker separation of voiced speech. We analyze localized time-frequency regions of a narrowband spectrogram using 2-D Fourier transforms and propose a 2-D amplitude modulation model, based on pitch information, for single- and multi-speaker content in each region. Our model maps harmonically related speech content to concentrated entities in a transformed 2-D space, thereby motivating 2-D demodulation of the spectrogram for analysis/synthesis and speaker separation. Using a priori pitch estimates of the individual speakers, we show through a quantitative evaluation 1) the utility of the model for representing the speech content of a single speaker and 2) its feasibility for speaker separation. For the separation task, we also illustrate the benefits of the model's representation of pitch dynamics relative to a sinusoidal-based separation system.
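The mapping from harmonic content to a concentrated point in the transformed 2-D space can be illustrated with a small NumPy sketch. It is not the report's implementation: the synthetic constant-pitch signal, the window and hop sizes, the patch boundaries, and the helper names `narrowband_spectrogram` and `patch_pitch_estimate` are all assumptions chosen for illustration. The sketch takes the 2-D Fourier transform of one localized spectrogram patch and reads the pitch back from the location of the dominant peak.

```python
import numpy as np

def narrowband_spectrogram(x, win_len, hop, nfft):
    """Magnitude STFT with a long (narrowband) Hamming window.
    Returns an array of shape (nfft // 2 + 1, n_frames)."""
    win = np.hamming(win_len)
    n_frames = 1 + (len(x) - win_len) // hop
    frames = np.stack(
        [x[i * hop : i * hop + win_len] * win for i in range(n_frames)], axis=1
    )
    return np.abs(np.fft.rfft(frames, n=nfft, axis=0))

def patch_pitch_estimate(patch, hz_per_bin, nfft2=512, min_k=16):
    """2-D Fourier transform of a local time-frequency patch (hypothetical helper).
    Harmonic lines in the patch act like a 2-D sinusoidal carrier, so their
    energy concentrates at one point; its row index k gives the harmonic spacing."""
    patch = patch - patch.mean()                      # suppress the DC term
    w2d = np.outer(np.hamming(patch.shape[0]), np.hamming(patch.shape[1]))
    G = np.abs(np.fft.fft2(patch * w2d, s=(nfft2, nfft2)))
    # Search rows min_k .. nfft2/2 to skip residual DC leakage (assumed pitch range).
    half = G[min_k : nfft2 // 2, :]
    k, l = np.unravel_index(np.argmax(half), half.shape)
    k += min_k
    spacing_bins = nfft2 / k                          # harmonic spacing in STFT bins
    return hz_per_bin * spacing_bins, (int(k), int(l))

# Synthetic "voiced" signal: equal-amplitude harmonics of a constant 200 Hz pitch.
fs, f0_true, nfft = 8000, 200.0, 512
t = np.arange(int(0.2 * fs)) / fs
x = sum(np.cos(2 * np.pi * h * f0_true * t) for h in range(1, 20))

spec = narrowband_spectrogram(x, win_len=256, hop=8, nfft=nfft)  # 32 ms window, 1 ms hop
patch = spec[32:160, 40:80]                           # about 500-2500 Hz, 40 frames
f0_est, peak = patch_pitch_estimate(patch, hz_per_bin=fs / nfft)
print(f"estimated pitch: {f0_est:.1f} Hz, peak index (freq axis, time axis): {peak}")
```

With a constant pitch the harmonic lines are horizontal, so the peak sits on or near the zero-frequency column of the time axis. A rising or falling pitch tilts the lines within the patch and moves the peak off that axis, which is one way to picture how a 2-D representation can carry pitch dynamics.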

Document Details

Document Type
Technical Report
Publication Date
Oct 21, 2009
Accession Number
ADA519581

Entities

People

  • Thomas F. Quatieri
  • Tianyu T. Wang

Organizations

  • Massachusetts Institute of Technology

Tags

DTIC Thesaurus Topics

  • Abstracts
  • Algorithms
  • Amplitude
  • Amplitude Modulation
  • Demodulation
  • Department Of Defense
  • Dynamics
  • Filters
  • Filtration
  • Frequency
  • Modulation
  • Observation
  • Signal Processing
  • Test And Evaluation
  • Two Dimensional
  • United States Government
  • Waveforms

Readers

  • Image Processing and Computer Vision
  • Speech Processing/Speech Recognition

Technology Areas

  • Space