Binary and Ratio Time-frequency Masks for Robust Speech Recognition

Abstract

A time-varying Wiener filter extracts a speech signal from a mixture using the a priori signal-to-noise ratio in a local time-frequency unit. We estimate this ratio using a binaural processor and derive a ratio time-frequency mask. This mask is used to extract the speech, which is then fed to a conventional speech recognizer operating in the cepstral domain. We compare the performance of this system with that of a missing-data recognizer, which operates in the spectral domain using the time-frequency units dominated by speech. For use by the missing-data recognizer, the same processor is used to estimate an ideal time-frequency binary mask, which selects the speech if it is stronger than the interference in a local time-frequency unit. We find that the missing-data recognizer performs better on a small-vocabulary recognition task, but the conventional recognizer performs substantially better when the vocabulary size is larger.
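The two masks described in the abstract can be illustrated with a minimal sketch. It assumes the speech and interference energies in each time-frequency unit are already known, whereas the report estimates the ratio with a binaural processor, which is not modeled here. The function names, the `eps` guard, and the toy energy values are hypothetical and chosen only for illustration.

```python
def ratio_mask(speech_energy, noise_energy, eps=1e-12):
    """Wiener-style gain per time-frequency unit.

    With local SNR = S / N, the gain SNR / (1 + SNR) equals S / (S + N).
    eps is an assumed guard against division by zero in empty units.
    """
    return [
        [s / (s + n + eps) for s, n in zip(s_row, n_row)]
        for s_row, n_row in zip(speech_energy, noise_energy)
    ]


def binary_mask(speech_energy, noise_energy):
    """1 where speech is stronger than the interference in a unit, else 0."""
    return [
        [1 if s > n else 0 for s, n in zip(s_row, n_row)]
        for s_row, n_row in zip(speech_energy, noise_energy)
    ]


# Toy energies: rows are time frames, columns are frequency channels.
speech = [[4.0, 1.0], [0.0, 9.0]]
noise = [[1.0, 1.0], [2.0, 1.0]]

soft = ratio_mask(speech, noise)   # values between 0 and 1
hard = binary_mask(speech, noise)  # values in {0, 1}
```

In this sketch the ratio mask attenuates each unit in proportion to its local signal-to-noise ratio, while the binary mask keeps or discards whole units; a unit where speech and interference are equally strong is discarded because the comparison is strict.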

Document Details

Document Type: Technical Report
Publication Date: May 01, 2004
Accession Number: AD1001184

Entities

People

DeLiang Wang
Nicoleta Roman
Soundararajan Srinivasan

Organizations

Ohio State University
