Techniques for Preprocessing Speech Signals for More Effective Audio Interfaces
Abstract
A user receiving instructions from several speech sources may be overwhelmed when those sources speak simultaneously in a noisy, acoustically unfavorable environment. In such a case, the user must separate the individual sources from the mixture in order to make the speech intelligible. If no one source dominates, or if the mixing persists for a sustained period, a human user may become mentally and physically overloaded, resulting in fatigue and a failure to separate the speech sources into intelligible signals. For a speech recognizer, recognition accuracy may degrade to unacceptable levels. In this final report on research into methods for blind enhancement and separation of mixtures of speech signals, we present our results from further development and refinement of Frequency-Domain, Second-Order Statistics-based decorrelation algorithms, in particular the Multi-resolution Frequency-Domain algorithm. These results include modifications to the algorithm for better performance at reduced cost, its implementation, and its performance evaluation under a wide set of noise and acoustic environments. Finally, we report on the development of a publicly available database specifically designed for evaluating speech enhancement and separation algorithms.
Document Details
- Document Type: Technical Report
- Publication Date: Dec 01, 2001
- Accession Number: ADA412195
Entities
People
- Phillip L. Deleon
Organizations
- New Mexico State University