Speech Segregation Based on Sound Localization

Abstract

At a cocktail party, we can selectively attend to a single voice and filter out all other acoustic interference. Simulating this perceptual ability remains a great challenge. This paper describes a novel machine learning approach to speech segregation, in which a target speech signal is separated from interfering sounds using spatial location cues: interaural time differences (ITD) and interaural intensity differences (IID). The auditory masking effect motivates the notion of an ideal time-frequency binary mask, which selects the target if it is stronger than the interference in a local time-frequency (T-F) unit. We observe that, within a narrow frequency band, changes in the strength of the target source relative to the interference produce systematic shifts in ITD and IID. For a given spatial configuration, this interaction produces characteristic clustering in the binaural feature space. Consequently, we perform pattern classification to estimate ideal binary masks. A systematic evaluation shows that the resulting system produces masks very close to the ideal binary masks and gives a significant improvement in performance over an existing approach, as quantified by the change in signal-to-noise ratio before and after segregation.
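The ideal binary mask described in the abstract can be illustrated with a minimal sketch. This is not the report's implementation: the function names, the toy energy values, and the simplification that target and interference energies add independently within each T-F unit are all assumptions made for illustration. The sketch marks a T-F unit with 1 when the target energy exceeds the interference energy, then compares the signal-to-noise ratio before and after masking.

```python
import numpy as np

def ideal_binary_mask(target_energy, interference_energy):
    """Return 1 for T-F units where the target is stronger than the interference, else 0."""
    target_energy = np.asarray(target_energy, dtype=float)
    interference_energy = np.asarray(interference_energy, dtype=float)
    return (target_energy > interference_energy).astype(int)

def snr_db(target_energy, interference_energy):
    """Signal-to-noise ratio in dB from total target and interference energy."""
    return 10.0 * np.log10(np.sum(target_energy) / np.sum(interference_energy))

# Hypothetical energies (assumed values): rows are frequency channels,
# columns are time frames.
target = np.array([[4.0, 1.0, 9.0],
                   [0.5, 6.0, 2.0]])
interference = np.array([[1.0, 3.0, 2.0],
                         [2.0, 1.0, 2.0]])

mask = ideal_binary_mask(target, interference)

# Simplifying assumption: masking scales target and interference energy
# in each unit independently (cross terms are ignored).
snr_before = snr_db(target, interference)
snr_after = snr_db(target * mask, interference * mask)

print(mask)
print(f"SNR before: {snr_before:.2f} dB, after: {snr_after:.2f} dB")
```

In the report the mask is not computed from known target and interference energies as above; it is estimated by classifying ITD and IID features, and the ideal mask serves as the reference that the estimate is compared against.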

Document Details

Document Type: Technical Report
Publication Date: Jan 01, 2002
Accession Number: AD1001139

Entities

People

DeLiang Wang
Guy J. Brown
Nicoleta Roman

Organizations

Ohio State University
