Boosting Contextual Information for Deep Neural Network Based Voice Activity Detection

Abstract

Voice activity detection (VAD) is an important topic in audio signal processing. Contextual information is important for improving the performance of VAD at low signal-to-noise ratios. Here we explore contextual information by machine learning methods at three levels. At the top level, we employ an ensemble learning framework, named multi-resolution stacking (MRS), which is a stack of ensemble classifiers. Each classifier in a building block takes as input the concatenation of the predictions of its lower building blocks and the expansion of the raw acoustic feature by a given window (called a resolution). At the middle level, we describe a base classifier in MRS, named boosted deep neural network (bDNN). bDNN first generates multiple base predictions from different contexts of a single frame using only one DNN and then aggregates the base predictions into a better prediction of the frame; this differs from computationally expensive boosting methods, which train ensembles of classifiers to obtain multiple base predictions. At the bottom level, we employ the multi-resolution cochleagram feature, which incorporates contextual information by concatenating cochleagram features at multiple spectrotemporal resolutions. Experimental results show that the MRS-based VAD outperforms other VADs by a considerable margin. Moreover, when trained on a large number of noise types and a wide range of signal-to-noise ratios, the MRS-based VAD demonstrates surprisingly good generalization performance on unseen test scenarios, approaching the performance with noise-dependent training.
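The bDNN idea of getting several base predictions for one frame from a single network can be illustrated with a minimal sketch. The abstract does not specify the window layout or the aggregation rule, so the following are assumptions rather than the report's exact method. A single model, represented here by a hypothetical callable `model` standing in for the trained DNN, maps a contiguous window of 2W+1 frames centered on frame n to one speech probability per frame in the window. Frames past an utterance boundary are filled by repeating the edge frame. Every frame therefore receives one base prediction from each window that covers it, and these predictions are aggregated by a plain average.

```python
from typing import Callable, List, Sequence

Frame = List[float]
# Hypothetical stand-in for the trained DNN: a window of 2w+1 frames in,
# one speech probability per frame of that window out.
Model = Callable[[List[Frame]], Sequence[float]]


def expand_context(features: Sequence[Frame], n: int, w: int) -> List[Frame]:
    """Return frames n-w .. n+w, repeating the edge frame past either boundary (assumed padding)."""
    last = len(features) - 1
    return [list(features[min(max(k, 0), last)]) for k in range(n - w, n + w + 1)]


def bdnn_aggregate(features: Sequence[Frame], model: Model, w: int) -> List[float]:
    """Average all base predictions that the single model produces for each frame."""
    num_frames = len(features)
    sums = [0.0] * num_frames
    counts = [0] * num_frames
    for n in range(num_frames):
        preds = model(expand_context(features, n, w))
        if len(preds) != 2 * w + 1:
            raise ValueError("model must return one prediction per frame in the window")
        # The window centered on n yields a base prediction for every frame n-w .. n+w.
        for offset, p in zip(range(-w, w + 1), preds):
            k = n + offset
            if 0 <= k < num_frames:  # skip predictions for padded frames
                sums[k] += p
                counts[k] += 1
    # Every frame is covered at least by its own centered window, so counts[k] >= 1.
    return [s / c for s, c in zip(sums, counts)]


if __name__ == "__main__":
    # Toy per-frame "model": threshold the first feature (e.g. a frame energy).
    toy_model = lambda window: [1.0 if f[0] > 0.5 else 0.0 for f in window]
    print(bdnn_aggregate([[0.1], [0.9], [0.8], [0.2]], toy_model, w=1))
    # prints [0.0, 1.0, 1.0, 0.0]
```

Only one forward pass per frame is needed, yet each frame ends up with up to 2W+1 predictions made from differently positioned contexts. This is the contrast the abstract draws with boosting methods that train a separate classifier for each base prediction.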

Document Details

Document Type
Technical Report
Publication Date
Feb 01, 2015
Accession Number
AD1001129

Entities

People

  • DeLiang Wang
  • Xiao-lei Zhang

Organizations

  • Ohio State University

Tags

Communities of Interest

  • Autonomy
  • Energy and Power Technologies

DTIC Thesaurus Topics

  • Artificial Intelligence Software
  • Automata Theory
  • Bayesian Networks
  • Computer Science
  • Computers
  • Deep Belief Networks
  • Detection
  • Dimensionality Reduction
  • Electronic Mail
  • False Alarms
  • Information Science
  • Machine Learning
  • Network Science
  • Neural Networks
  • Probabilistic Models
  • Signal Processing
  • Supervised Machine Learning

Fields of Study

  • Computer science

Readers

  • Atmospheric Science/Meteorology
  • Neural Network Machine Learning
  • Speech Processing/Speech Recognition

Technology Areas

  • AI & ML
  • AI & ML - Information Retrieval
  • AI & ML - Neural Networks