Nonlinear Auditory Modeling as a Basis for Speaker Recognition
Abstract
In this report, we develop a front-end nonlinear auditory model based on the recent work of Dau, Puschel, and Kohlrausch (DPK) [Dau, Puschel, and Kohlrausch, 1997]. An important aspect of the model is its robust accentuation of temporal change in a signal at the level of the cochlea, which forms the basis of a feature set for automatic speaker recognition. Preliminary speaker recognition experiments with the DPK front-end auditory model give performance close to that of the standard mel-cepstrum. Fusion of scores from the mel-cepstrum and the DPK front-end auditory model, however, is shown to give a useful performance gain relative to the standard mel-cepstrum alone. The dynamics provided by the nonlinear auditory model therefore appear to provide some 'orthogonality' to the more static mel-cepstral representation. In addition, we provide an initial development of new 'common modulation' features based on modeling a region of auditory processing more central than the low-level auditory front-end, in the brain's inferior colliculus. These higher-level features rely on the DPK auditory model as a foundation for further analysis of low-level temporal trajectories. This new feature representation is an important research direction and provides additional feature 'orthogonality' to front-end auditory processing, as shown by improved speaker recognition performance when scores from the low-level and high-level feature sets are fused.
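The score fusion mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not state which fusion rule the report uses, so the weighted sum of z-normalized scores below is an assumption chosen for illustration. The function names, the default weight, and the example scores are all hypothetical.

```python
from statistics import mean, pstdev


def z_normalize(scores):
    """Shift and scale scores to zero mean and unit variance.

    Normalizing first keeps one system from dominating the sum only
    because its raw scores happen to span a larger range.
    """
    mu = mean(scores)
    sigma = pstdev(scores)
    if sigma == 0:
        # All scores identical: no trial is favored over another.
        return [0.0 for _ in scores]
    return [(s - mu) / sigma for s in scores]


def fuse_scores(scores_a, scores_b, weight_a=0.5):
    """Weighted sum of normalized per-trial scores from two systems.

    Assumption: this linear rule is a common baseline, not necessarily
    the fusion method used in the report.
    """
    if len(scores_a) != len(scores_b):
        raise ValueError("both systems must score the same trials")
    norm_a = z_normalize(scores_a)
    norm_b = z_normalize(scores_b)
    return [weight_a * a + (1.0 - weight_a) * b
            for a, b in zip(norm_a, norm_b)]


# Hypothetical per-trial scores from a mel-cepstrum system and a
# DPK-based system, evaluated on the same four trials.
mel_scores = [1.2, -0.4, 0.9, -1.1]
dpk_scores = [0.8, 0.1, 1.0, -0.9]
fused = fuse_scores(mel_scores, dpk_scores)
```

Fusion of this kind helps only when the two systems make partly different errors, which is the sense of 'orthogonality' used in the abstract.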
Document Details
- Document Type: Technical Report
- Publication Date: May 17, 2002
- Accession Number: ADA402327
Entities
People
- Thomas F. Quatieri
Organizations
- Massachusetts Institute of Technology