Transforming Binary Uncertainties for Robust Speech Recognition

Abstract

Recently, several algorithms have been proposed to enhance noisy speech by estimating a binary mask that selects those time-frequency regions of a noisy speech signal that contain more speech energy than noise energy. This binary mask encodes the uncertainty associated with the enhanced speech in the linear spectral domain. The cepstral transformation smears the information from the noise-dominant time-frequency regions across all the cepstral features. We propose a supervised approach that uses regression trees to learn the nonlinear transformation of the uncertainty from the linear spectral domain to the cepstral domain. This uncertainty is used by a decoder that exploits the variance associated with the enhanced cepstral features to improve robust speech recognition. Systematic evaluations on a subset of the Aurora4 task using the estimated uncertainty show substantial improvement over the baseline performance across various noise conditions.
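
Two ideas in the abstract can be illustrated with a small sketch: the binary mask that keeps time-frequency units where speech energy exceeds noise energy, and the way a cepstral (cosine) transform spreads a change in one spectral channel over every cepstral coefficient. This is not the report's implementation. The data are synthetic random values, the mask is computed from known speech and noise energies rather than estimated, and the names `ideal_binary_mask` and `dct_matrix` are hypothetical helpers introduced only for illustration.

```python
import numpy as np

def ideal_binary_mask(speech_power, noise_power):
    """Return 1.0 where speech energy exceeds noise energy in a unit, else 0.0."""
    return (speech_power > noise_power).astype(float)

def dct_matrix(n_ceps, n_channels):
    """Unnormalized DCT-II matrix mapping log-spectral channels to cepstra."""
    k = np.arange(n_ceps)[:, None]
    n = np.arange(n_channels)[None, :]
    return np.cos(np.pi * k * (n + 0.5) / n_channels)

# Synthetic (assumed) energies: rows are frequency channels, columns are frames.
rng = np.random.default_rng(0)
n_channels, n_frames, n_ceps = 8, 5, 4
speech = rng.uniform(0.0, 1.0, size=(n_channels, n_frames))
noise = rng.uniform(0.0, 1.0, size=(n_channels, n_frames))

mask = ideal_binary_mask(speech, noise)

# Perturb one spectral channel in the log domain, standing in for an
# unreliable (noise-dominant) time-frequency region.
log_spec = np.log(speech + noise)
perturbed = log_spec.copy()
perturbed[3, :] += 1.0

# Compare cepstra before and after the single-channel perturbation.
C = dct_matrix(n_ceps, n_channels)
delta_ceps = C @ perturbed - C @ log_spec

# Count the cepstral coefficients of the first frame that changed.
affected = np.count_nonzero(np.abs(delta_ceps[:, 0]) > 1e-12)
print(f"{affected} of {n_ceps} cepstral coefficients changed")
```

Because each cepstral coefficient is a weighted sum over all spectral channels, a per-channel reliability label does not carry over to the cepstral domain one-to-one, which is the motivation the abstract gives for learning the transformation of the uncertainty.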

Document Details

Document Type
Technical Report
Publication Date
Aug 01, 2006
Accession Number
AD1001223

Entities

People

  • DeLiang Wang
  • Soundararajan Srinivasan

Organizations

  • Ohio State University

Tags

Communities of Interest

  • Biomedical
  • Energy and Power Technologies

DTIC Thesaurus Topics

  • Algorithms
  • Automated Speech Recognition
  • Cognitive Science
  • Computer Science
  • Databases
  • Decoding
  • Distortion
  • Engineering
  • Frequency
  • Frequency Domain
  • Hidden Markov Models
  • Information Science
  • Markov Models
  • Probability
  • Recognition
  • Regression Analysis
  • Test Sets

Fields of Study

  • Computer science
  • Engineering

Readers

  • Acoustical Oceanography
  • Acoustics
  • Computer Vision

Technology Areas

  • AI & ML
  • AI & ML - Bayesian Inference