Cepstral Domain Talker Stress Compensation for Robust Speech Recognition
Abstract
Automatic speech recognition algorithms generally rely on the assumption that, for the distance measure used, intraword variabilities are smaller than interword variabilities, so that appropriate separation in the measurement space is possible. As evidenced by degradation of recognition performance, the validity of this assumption decreases from simple tasks to complex tasks, from cooperative talkers to casual talkers, and from laboratory talking environments to practical talking environments. This report presents a study of talker-stress-induced intraword variability and an algorithm that compensates for the systematic changes observed. The study is based on Hidden Markov Models trained on speech tokens spoken in various talking styles. The talking styles include normal speech, fast speech, loud speech, soft speech, and speech produced with noise injected through earphones; the styles are designed to simulate speech produced under real stressful conditions. Cepstral coefficients are used as the parameters in the Hidden Markov Models. The stress compensation algorithm compensates for the variations in the cepstral coefficients in a hypothesis-driven manner. The functional form of the compensation is shown to correspond to the equalization of spectral tilts. Preliminary experiments indicate that a substantial reduction in recognition error rate can be achieved with relatively little increase in computation and storage requirements.
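The idea of hypothesis-driven compensation in the cepstral domain can be illustrated with a toy sketch. This is not the report's algorithm: it assumes, for illustration only, that each word hypothesis carries a fixed additive cepstral offset (which corresponds to a multiplicative reshaping of the spectrum, such as a tilt), and it scores frames against a single template vector with a squared Euclidean distance instead of a Hidden Markov Model. The names `compensate`, `distance`, `recognize`, and all numeric values are hypothetical.

```python
import math

def compensate(cepstra, offset):
    """Subtract a hypothesis-specific cepstral offset from every frame."""
    return [[c - d for c, d in zip(frame, offset)] for frame in cepstra]

def distance(cepstra, template):
    """Mean squared Euclidean distance between frames and one template vector.

    Stand-in for an HMM likelihood; assumed here only to keep the sketch short.
    """
    total = 0.0
    for frame in cepstra:
        total += sum((c - t) ** 2 for c, t in zip(frame, template))
    return total / len(cepstra)

def recognize(cepstra, models):
    """Score each word hypothesis after applying that hypothesis's offset.

    models maps word -> (normal-style template, assumed stress offset).
    Returns the best-scoring word and its distance.
    """
    best_word, best_score = None, math.inf
    for word, (template, offset) in models.items():
        score = distance(compensate(cepstra, offset), template)
        if score < best_score:
            best_word, best_score = word, score
    return best_word, best_score

# Hypothetical two-coefficient models: (template, stress offset).
models = {
    "yes": ([1.0, 0.5], [0.3, -0.1]),
    "no": ([-1.0, 0.2], [0.2, 0.1]),
}

# Frames resembling "yes" spoken under stress: template plus offset plus small noise.
stressed_frames = [[1.3, 0.4], [1.35, 0.38]]

word, score = recognize(stressed_frames, models)
print(word, score)
```

Because the offset is applied per hypothesis before scoring, the extra cost is one vector subtraction per frame and one stored offset vector per model, which is consistent in spirit with the abstract's remark about small computation and storage overhead.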
Document Details
- Document Type: Technical Report
- Publication Date: Nov 10, 1986
- Accession Number: ADA176068
Entities
People
- Yunhui Chen
Organizations
- Massachusetts Institute of Technology