Binary and Ratio Time-frequency Masks for Robust Speech Recognition

Abstract

A time-varying Wiener filter extracts a speech signal from a mixture using the a priori signal-to-noise ratio (SNR) in each local time-frequency unit. We estimate this ratio with a binaural processor and derive from it a ratio time-frequency mask. The mask is used to extract the speech, which is then fed to a conventional speech recognizer operating in the cepstral domain. We compare the performance of this system with that of a missing-data recognizer that operates in the spectral domain using only the time-frequency units dominated by speech. For the missing-data recognizer, the same binaural processor is used to estimate the ideal binary time-frequency mask, which selects a time-frequency unit as speech if the speech is stronger than the interference in that unit. We find that the missing-data recognizer performs better on a small-vocabulary recognition task, but the conventional recognizer performs substantially better when the vocabulary is larger.
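The two masks named in the abstract can be sketched in a few lines. This is a minimal illustration, not the report's implementation: it assumes the speech and noise power in every time-frequency unit are already known (an oracle setting), whereas the report estimates these quantities with a binaural processor that is not shown here. The function names `ratio_mask` and `binary_mask`, the `eps` guard, and the `lc_db` threshold parameter are hypothetical. The ratio mask is the Wiener gain ξ/(1 + ξ), where ξ is the a priori SNR of the unit; the binary mask keeps a unit when speech power exceeds noise power (local SNR above 0 dB).

```python
import numpy as np

EPS = 1e-12  # hypothetical guard against division by zero and log(0)


def ratio_mask(speech_power, noise_power, eps=EPS):
    """Wiener-style ratio mask xi / (1 + xi), with xi = S / N per T-F unit.

    Assumes oracle speech and noise power spectrograms of equal shape.
    """
    xi = speech_power / (noise_power + eps)  # a priori SNR in each unit
    return xi / (1.0 + xi)


def binary_mask(speech_power, noise_power, lc_db=0.0, eps=EPS):
    """Ideal binary mask: 1 where the local SNR exceeds lc_db, else 0.

    With lc_db = 0, a unit is kept only when speech is stronger than noise.
    """
    snr_db = 10.0 * np.log10((speech_power + eps) / (noise_power + eps))
    return (snr_db > lc_db).astype(float)


# Toy 2x2 "spectrogram": speech powers against unit noise power.
S = np.array([[4.0, 1.0], [0.25, 9.0]])
N = np.ones((2, 2))
print(ratio_mask(S, N))   # approximately [[0.8, 0.5], [0.2, 0.9]]
print(binary_mask(S, N))  # [[1., 0.], [0., 1.]]
```

The ratio mask passes every unit with a soft weight, which suits a conventional recognizer that needs a complete resynthesized signal; the binary mask instead marks units as reliable or missing, which is the input a missing-data recognizer expects. In practice, either mask would be multiplied with the mixture's time-frequency representation.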

Document Details

Document Type
Technical Report
Publication Date
May 01, 2004
Accession Number
AD1001184

Entities

People

  • DeLiang Wang
  • Nicoleta Roman
  • Soundararajan Srinivasan

Organizations

  • Ohio State University

Tags

Communities of Interest

  • C4I
  • Energy and Power Technologies
  • Sensors

DTIC Thesaurus Topics

  • Accuracy
  • Acoustics
  • Automated Speech Recognition
  • Command And Control
  • Computational Complexity
  • Databases
  • Decoding
  • Filtration
  • Hidden Markov Models
  • Information Science
  • Language
  • Markov Models
  • Probability
  • Recognition
  • Signal Processing
  • Statistical Analysis
  • Statistics

Fields of Study

  • Engineering

Readers

  • Approximation Theory
  • Phased Array Antenna Design
  • Speech Processing/Speech Recognition

Technology Areas

  • AI & ML
  • AI & ML - Bayesian Inference
  • AI & ML - Machine Translation