Speech Segregation based on Binary Classification

Abstract

This AFOSR project aimed to develop a classification-based approach to the speech segregation problem. This supervised approach stands in sharp contrast to traditional speech segregation approaches. The project produced four major accomplishments. First, a supervised approach based on neural networks was developed to perform pitch tracking in very noisy conditions. Second, different training targets were examined for supervised speech segregation, leading to the adoption of the ideal ratio mask (IRM). A subsequent listening evaluation showed that IRM estimation increases speech intelligibility in noise for human listeners. Third, an algorithm was proposed to recognize speakers in cochannel (two-talker) conditions. This algorithm uses deep neural networks for cochannel speaker identification and achieves state-of-the-art results in both anechoic and reverberant conditions. Fourth, a spectral mapping method was developed to improve robustness to room reverberation. This supervised method learns a mapping from the magnitude spectrogram of reverberant speech to that of anechoic speech, as well as from the spectrogram of reverberant-noisy speech to that of anechoic-clean speech.
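The abstract names the ideal ratio mask as the adopted training target but does not define it. As a rough illustration only, the sketch below uses a definition common in the speech separation literature: in each time-frequency unit, the mask is the ratio of speech energy to the sum of speech and noise energy, raised to an exponent (0.5 is a typical choice). It also shows an ideal binary mask, the kind of target suggested by the report's title. The function names, the `beta` exponent, the `lc_db` threshold, and the toy energy values are all assumptions made for this example and are not taken from the report.

```python
import numpy as np

def ideal_ratio_mask(speech_power, noise_power, beta=0.5, eps=1e-12):
    """Soft mask in [0, 1] per time-frequency unit (assumed definition).

    speech_power, noise_power: arrays of premixed speech and noise energy,
    one value per time-frequency unit. beta=0.5 is an assumed default.
    """
    speech_power = np.asarray(speech_power, dtype=float)
    noise_power = np.asarray(noise_power, dtype=float)
    # eps guards against division by zero in silent units.
    return (speech_power / (speech_power + noise_power + eps)) ** beta

def ideal_binary_mask(speech_power, noise_power, lc_db=0.0, eps=1e-12):
    """Hard mask: 1 where the local SNR exceeds lc_db (assumed threshold), else 0."""
    speech_power = np.asarray(speech_power, dtype=float)
    noise_power = np.asarray(noise_power, dtype=float)
    snr_db = 10.0 * np.log10((speech_power + eps) / (noise_power + eps))
    return (snr_db > lc_db).astype(float)

# Toy 2x2 "spectrogram" energies, made up for illustration.
speech = np.array([[4.0, 1.0], [0.0, 9.0]])
noise = np.array([[4.0, 3.0], [1.0, 0.0]])
irm = ideal_ratio_mask(speech, noise)
ibm = ideal_binary_mask(speech, noise)
```

In a supervised system of the kind the abstract describes, a mask like this would be computed from premixed training signals and serve as the learning target; at test time the estimated mask would be applied to the noisy spectrogram.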

Document Details

Document Type: Technical Report
Publication Date: Jul 15, 2016
Accession Number: AD1012270

Entities

People

DeLiang Wang

Organizations

Ohio State University
