Towards Co-Channel Speaker Separation by 2-D Demodulation of Spectrograms

Abstract

This paper explores a two-dimensional (2-D) processing approach for co-channel speaker separation of voiced speech. We analyze localized time-frequency regions of a narrowband spectrogram using 2-D Fourier transforms and propose a 2-D amplitude modulation model, based on pitch information, for single- and multi-speaker content in each region. Our model maps harmonically related speech content to concentrated entities in a transformed 2-D space, thereby motivating 2-D demodulation of the spectrogram for analysis/synthesis and speaker separation. Using a priori pitch estimates of the individual speakers, we show through a quantitative evaluation 1) the utility of the model for representing the speech content of a single speaker and 2) its feasibility for speaker separation. For the separation task, we also illustrate the benefits of the model's representation of pitch dynamics relative to a sinusoidal-based separation system.
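The central observation above can be illustrated with a minimal sketch: a localized narrowband-spectrogram patch of harmonic content concentrates in its 2-D Fourier transform at a location set by the pitch. The sketch makes several assumptions that are not from the report:

- a single stationary synthetic "voiced" signal;
- an 8 kHz sampling rate;
- a 512-point Hamming window with a 64-sample hop;
- a hand-picked 64-bin by 32-frame patch;
- hypothetical helper names (`harmonic_signal`, `narrowband_spectrogram`, `patch_transform`, `pitch_from_patch`).

It is not the authors' implementation. It omits their 2-D amplitude modulation model, the demodulation step and time-varying pitch.

```python
import numpy as np

FS = 8000   # sampling rate in Hz (assumed for illustration)
NFFT = 512  # long window -> narrowband spectrogram, bin spacing FS/NFFT = 15.625 Hz
HOP = 64    # frame advance in samples (assumed)


def harmonic_signal(f0, dur=0.5, fs=FS):
    """Stationary synthetic 'voiced' signal: equal-amplitude harmonics of f0 below Nyquist."""
    t = np.arange(int(dur * fs)) / fs
    n_harm = int((fs / 2 - 1) // f0)
    return sum(np.cos(2 * np.pi * k * f0 * t) for k in range(1, n_harm + 1))


def narrowband_spectrogram(x, nfft=NFFT, hop=HOP):
    """|STFT| with a Hamming window; rows are frequency bins, columns are frames."""
    win = np.hamming(nfft)
    n_frames = 1 + (len(x) - nfft) // hop
    frames = np.stack([x[i * hop:i * hop + nfft] * win for i in range(n_frames)], axis=1)
    return np.abs(np.fft.rfft(frames, axis=0))


def patch_transform(spec, f_start, f_len, t_start, t_len):
    """Magnitude of the 2-D DFT of a localized time-frequency patch, with its mean removed."""
    patch = spec[f_start:f_start + f_len, t_start:t_start + t_len]
    return np.abs(np.fft.fft2(patch - patch.mean()))


def pitch_from_patch(G, f_len, nfft=NFFT, fs=FS):
    """Map the dominant 2-D peak back to a pitch estimate.

    A peak at frequency-axis index k means k harmonic periods across f_len bins,
    i.e. a harmonic spacing of f_len / k bins = (f_len / k) * fs / nfft Hz.
    """
    half = G[1:f_len // 2 + 1, :]  # skip index 0 and the mirrored (conjugate) half
    k, _ = np.unravel_index(np.argmax(half), half.shape)
    k += 1                         # restore the offset from skipping index 0
    return (f_len / k) * fs / nfft


if __name__ == "__main__":
    for f0 in (250.0, 200.0):
        S = narrowband_spectrogram(harmonic_signal(f0))
        # 64 bins (roughly 500-1500 Hz) by 32 frames: one localized region
        G = patch_transform(S, f_start=32, f_len=64, t_start=0, t_len=32)
        # should recover approximately the synthesized pitch
        print(f0, pitch_from_patch(G, f_len=64))
```

Along frequency, the harmonics form a grating with a spacing of f0. With these settings, the patch's 2-D transform therefore peaks at frequency-axis index k = f_len·Δf/f0, which gives 4 for 250 Hz and 5 for 200 Hz. Because the pitch is constant here, the peak lies at temporal index 0. Under this idealization, a second speaker with a different pitch would occupy a different location in the transformed space. This gives the intuition for separating speakers there.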


Document Details

Document Type: Technical Report
Publication Date: Oct 21, 2009
Accession Number: ADA519581

Entities

People

Thomas F. Quatieri
Tianyu T. Wang

Organizations

Massachusetts Institute of Technology
