Nonlinear Auditory Modeling as a Basis for Speaker Recognition

Abstract

In this report, we develop a front-end nonlinear auditory model based on recent work of Dau, Puschel, and Kohlrausch (DPK) [Dau, Puschel, and Kohlrausch, 1997]. An important aspect of the model is its robust accentuation of temporal change in a signal at the cochlear level, which forms the basis of a feature set for automatic speaker recognition. Preliminary speaker recognition experiments with the DPK front-end auditory model yield performance close to that of the standard mel-cepstrum. Fusing the scores from the mel-cepstrum and the DPK front-end auditory model, however, gives a useful performance gain relative to the standard mel-cepstrum alone. The dynamics captured by the nonlinear auditory model therefore appear to provide some 'orthogonality' to the more static mel-cepstral representation. This report also presents an initial development of new 'common modulation' features, based on modeling a more central stage of auditory processing, the brain's inferior colliculus, rather than the low-level auditory front end. These higher-level features use the DPK auditory model as a foundation for further analysis of low-level temporal trajectories. This new feature representation is an important research direction: it provides additional feature 'orthogonality' to front-end auditory processing, as shown by improved speaker recognition performance when scores from the low-level and high-level feature sets are fused.
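
The "accentuation of temporal change" can be illustrated with a simplified version of the adaptation-loop stage found in the Dau et al. line of auditory models. The sketch below is an illustration only, not the report's implementation: it assumes a chain of five divisive feedback loops, each dividing its input by a lowpass-filtered copy of its own output, with time constants of 5, 50, 129, 253, and 500 ms and an input floor of 1e-5. These values are assumptions here, and the function name `adaptation_loops` is hypothetical. The input is taken to be a nonnegative envelope from earlier filterbank and rectification stages, which are not modeled, and the sketch omits any overshoot limiting that fuller implementations may apply.

```python
import math


def adaptation_loops(envelope, fs, taus=(0.005, 0.050, 0.129, 0.253, 0.500), floor=1e-5):
    """Simplified chain of divisive adaptation loops (hypothetical illustrative sketch).

    envelope: nonnegative samples, assumed to come from earlier filterbank and
              rectification stages (not modeled here).
    fs:       sample rate in Hz.
    taus:     loop time constants in seconds (assumed values).
    floor:    lower limit applied to the input (assumed value).
    """
    signal = [max(v, floor) for v in envelope]
    for k, tau in enumerate(taus):
        a = math.exp(-1.0 / (fs * tau))        # one-pole lowpass coefficient
        # Start each loop at its steady state for a floor-level input:
        # loop k then outputs floor ** (1 / 2 ** (k + 1)).
        state = floor ** (1.0 / 2 ** (k + 1))
        out = []
        for v in signal:
            o = v / state                      # divide by the slowly varying "gain" state
            state = a * state + (1.0 - a) * o  # state tracks a lowpassed copy of the output
            out.append(o)
        signal = out
    return signal


# Step input: 0.1 s of silence, then 0.5 s of a constant envelope at fs = 10 kHz.
fs = 10_000
x = [0.0] * 1000 + [1.0] * 5000
y = adaptation_loops(x, fs)

peak = max(range(len(y)), key=y.__getitem__)
print("peak index:", peak, "peak value:", y[peak], "final value:", y[-1])
```

For a stationary input v, each loop settles where its output equals its state, that is, at sqrt of its input, so five loops compress a steady input toward v ** (1/32). At a sudden onset, the divisor has not yet caught up, and the step passes through with a large overshoot. This is the sense in which such a stage emphasizes temporal change over static level.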


Document Details

Document Type: Technical Report
Publication Date: May 17, 2002
Accession Number: ADA402327

Entities

People

Thomas F. Quatieri

Organizations

Massachusetts Institute of Technology