Speech Segregation based on Binary Classification

Abstract

This AFOSR project aimed to develop a classification-based approach to the speech segregation problem. This supervised approach contrasts sharply with traditional speech segregation approaches. The project made four major accomplishments. First, a supervised approach based on neural networks was developed to perform pitch tracking in very noisy conditions. Second, different training targets for supervised speech segregation were examined, leading to the adoption of the ideal ratio mask (IRM). A subsequent listening evaluation showed that IRM estimation increases speech intelligibility in noise for human listeners. Third, an algorithm was proposed to recognize speakers in cochannel (two-talker) conditions; it uses deep neural networks for cochannel speaker identification and achieves state-of-the-art results in both anechoic and reverberant conditions. Fourth, a spectral mapping method was developed to improve robustness to room reverberation. This supervised method learns a mapping from the magnitude spectrogram of reverberant speech to that of anechoic speech, and from the spectrogram of reverberant-noisy speech to that of anechoic-clean speech.
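
To illustrate the IRM training target mentioned above, here is a minimal sketch that computes the mask from known speech and noise magnitude spectrograms. It assumes the commonly used square-root form IRM(t, f) = (S² / (S² + N²))^β with β = 0.5; it is not taken from the report. The function names `ideal_ratio_mask` and `apply_mask` are hypothetical. In the supervised setting, a network is trained to estimate this mask from features of the noisy mixture alone, because the clean speech and noise are unavailable at test time.

```python
from typing import List

Matrix = List[List[float]]  # rows = time frames, columns = frequency bins


def ideal_ratio_mask(speech_mag: Matrix, noise_mag: Matrix,
                     beta: float = 0.5, eps: float = 1e-12) -> Matrix:
    """Hypothetical helper: IRM(t, f) = (S^2 / (S^2 + N^2)) ** beta.

    Assumes speech and noise magnitudes are given per time-frequency unit;
    eps avoids division by zero where both energies are zero.
    """
    mask: Matrix = []
    for s_row, n_row in zip(speech_mag, noise_mag):
        row = []
        for s, n in zip(s_row, n_row):
            s_energy = s * s
            n_energy = n * n
            row.append((s_energy / (s_energy + n_energy + eps)) ** beta)
        mask.append(row)
    return mask


def apply_mask(mixture_mag: Matrix, mask: Matrix) -> Matrix:
    """Hypothetical helper: scale each mixture T-F unit by its mask value."""
    return [[m * y for m, y in zip(m_row, y_row)]
            for m_row, y_row in zip(mask, mixture_mag)]


if __name__ == "__main__":
    speech = [[3.0, 0.0], [4.0, 1.0]]
    noise = [[4.0, 1.0], [3.0, 0.0]]
    irm = ideal_ratio_mask(speech, noise)
    print(irm)  # approximately [[0.6, 0.0], [0.8, 1.0]]
```

Units dominated by speech get mask values near 1 and units dominated by noise get values near 0. Unlike a binary mask, the ratio mask keeps a graded weight for mixed units.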

Document Details

Document Type
Technical Report
Publication Date
Jul 15, 2016
Accession Number
AD1012270

Entities

People

  • DeLiang Wang

Organizations

  • Ohio State University

Tags

Communities of Interest

  • Energy and Power Technologies

DTIC Thesaurus Topics

  • Air Force Research Laboratories
  • Algorithms
  • Artificial Intelligence Software
  • Automated Speech Recognition
  • Computer Languages
  • Computer Science
  • Dimensionality Reduction
  • Identification
  • Intelligibility
  • Language
  • Machine Learning
  • Neural Networks
  • Perception
  • Recognition
  • Signal Processing
  • Supervised Machine Learning
  • Training

Fields of Study

  • Computer science

Readers

  • Defense Acquisition Program Management
  • Speech Processing/Speech Recognition

Technology Areas

  • AI & ML
  • AI & ML - Machine Learning Algorithms
  • AI & ML - Neural Networks