Deep Ensemble Learning for Monaural Speech Separation

Abstract

Monaural speech separation is a fundamental problem in robust speech processing. Recently, deep neural network (DNN) based speech separation methods, which predict either clean speech or an ideal time-frequency mask, have demonstrated remarkable performance improvements. However, a single DNN with a given window length does not sufficiently leverage contextual information, and the differences between the two optimization objectives are not well understood. In this paper, we propose to stack ensembles of DNNs, named multi-resolution stacking, to address monaural speech separation. Each DNN in a module of the stack takes as its input the concatenation of the original acoustic features and an expansion of the soft output of the lower module, and predicts the ideal ratio mask of the target speaker. The DNNs in the same module explore different contexts by employing different window lengths. We have conducted extensive experiments with three speech corpora. The results demonstrate the effectiveness of the proposed method. We have also compared the two optimization objectives systematically and found that predicting the ideal time-frequency mask is more efficient in utilizing clean training speech, while predicting clean speech is less sensitive to SNR variations.
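
The input construction described in the abstract can be illustrated with a minimal Python sketch. It is not taken from the report: the function names `expand_context` and `stacked_input`, the edge padding by frame repetition, and the choice to expand only the lower module's output (leaving the original features per-frame) are all assumptions made for illustration. The sketch shows how two DNNs in the same module could receive inputs of different dimensionality simply by using different window lengths.

```python
def expand_context(frames, half_window):
    """Concatenate each frame with its neighbours within +/- half_window frames.

    Assumption: edge frames are padded by repeating the first or last frame.
    """
    n = len(frames)
    expanded = []
    for t in range(n):
        row = []
        for offset in range(-half_window, half_window + 1):
            idx = min(max(t + offset, 0), n - 1)  # clamp the index at the edges
            row.extend(frames[idx])
        expanded.append(row)
    return expanded


def stacked_input(features, lower_output, half_window):
    """Build per-frame inputs for one DNN in an upper module (hypothetical helper).

    Each input row is the original acoustic features of the frame followed by
    the context-expanded soft output of the lower module.
    """
    expanded = expand_context(lower_output, half_window)
    return [list(f) + e for f, e in zip(features, expanded)]


# Toy data: 3 frames, 2 acoustic features per frame, 1 soft mask value per frame.
features = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
lower_output = [[0.1], [0.5], [0.9]]

# Two DNNs in the same module, each with its own (assumed) window length.
for half_window in (1, 2):
    rows = stacked_input(features, lower_output, half_window)
    print(half_window, len(rows[0]))
```

In this reading, a wider window gives a DNN more of the lower module's predictions around each frame, which is one way to realize the "different contexts" the abstract mentions; the report itself should be consulted for the actual feature and window configuration.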

Document Details

  • Document Type: Technical Report
  • Publication Date: Feb 01, 2015
  • Accession Number: AD1001134

Entities

People

  • DeLiang Wang
  • Xiao-lei Zhang

Organizations

  • Ohio State University

Tags

Communities of Interest

  • Autonomy
  • Energy and Power Technologies

DTIC Thesaurus Topics

  • Algorithms
  • Artificial Intelligence Software
  • Automata Theory
  • Computer Science
  • Computers
  • Data Sets
  • Electronic Mail
  • Engineering
  • Information Science
  • Learning Machines
  • Machine Learning
  • Network Science
  • Neural Networks
  • Supervised Machine Learning
  • Test Sets
  • Time Domain
  • Training

Fields of Study

  • Computer science

Readers

  • Computational Modeling and Simulation
  • Nanofabrication and Microfabrication
  • Speech Processing/Speech Recognition

Technology Areas

  • AI & ML
  • AI & ML - Neural Networks