Nonlinear Auditory Modeling as a Basis for Speaker Recognition

Abstract

In this report, we develop a front-end nonlinear auditory model based on the recent work of Dau, Puschel, and Kohlrausch (DPK) [Dau, Puschel, and Kohlrausch, 1997]. An important aspect of the model is its robust accentuation of temporal change in a signal at the cochlear level, which forms the basis of a feature set for automatic speaker recognition. Preliminary speaker recognition experiments with the DPK front-end auditory model give performance close to that of the standard mel-cepstrum. Fusion of scores from the mel-cepstrum and the DPK front-end auditory model, however, is shown to give a useful performance gain relative to the standard mel-cepstrum alone. The dynamics provided by the nonlinear auditory model therefore appear to provide some 'orthogonality' to the more static mel-cepstral representation. In addition, we provide initial development of new 'common modulation' features based on modeling auditory processing in the brain's inferior colliculus, a more central region than the low-level auditory front-end. These higher-level features rely on the DPK auditory model as a foundation for further analysis of low-level temporal trajectories. This new feature representation is an important research direction and provides additional feature 'orthogonality' to front-end auditory processing, as exhibited in improved speaker recognition performance with fusion of scores from the low-level and high-level feature sets.
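
The abstract does not state which rule is used to fuse scores from the two front ends. As an illustration only, the sketch below assumes a common choice: each system's per-speaker scores are normalized to zero mean and unit variance, then combined by a weighted sum. The function names, the weight `weight_a`, and the score values are hypothetical and are not taken from the report.

```python
from statistics import mean, pstdev


def z_normalize(scores):
    """Shift and scale scores to zero mean and unit variance."""
    mu = mean(scores)
    sigma = pstdev(scores)
    if sigma == 0.0:
        # All scores identical: no candidate is preferred.
        return [0.0 for _ in scores]
    return [(s - mu) / sigma for s in scores]


def fuse_scores(scores_a, scores_b, weight_a=0.5):
    """Weighted sum of two normalized score lists, one score per candidate speaker.

    Assumption: linear fusion with a single weight; the report may use a different rule.
    """
    if len(scores_a) != len(scores_b):
        raise ValueError("score lists must have the same length")
    if not 0.0 <= weight_a <= 1.0:
        raise ValueError("weight_a must lie in [0, 1]")
    norm_a = z_normalize(scores_a)
    norm_b = z_normalize(scores_b)
    return [weight_a * a + (1.0 - weight_a) * b for a, b in zip(norm_a, norm_b)]


# Hypothetical scores for three candidate speakers (higher is better).
mel_scores = [-41.2, -39.8, -40.1]   # mel-cepstrum system
dpk_scores = [-12.0, -12.5, -10.9]   # DPK auditory-model system

fused = fuse_scores(mel_scores, dpk_scores)
best = max(range(len(fused)), key=fused.__getitem__)
print("fused scores:", fused)
print("best candidate index:", best)
```

Normalizing before the weighted sum keeps one system from dominating simply because its raw scores span a wider range. In practice the weight would be tuned on held-out data.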

Document Details

Document Type: Technical Report
Publication Date: May 17, 2002
Accession Number: ADA402327

Entities

People

  • Thomas F. Quatieri

Organizations

  • Massachusetts Institute of Technology

Tags

Communities of Interest

  • Energy and Power Technologies

DTIC Thesaurus Topics

  • Air Force Research Laboratories
  • Amplitude Modulation
  • Auditory Perception
  • Automated Speech Recognition
  • Bandpass Filters
  • Bandwidth
  • Filters
  • Frequency
  • Identification
  • Low Pass Filters
  • Modulation
  • Neural Pathways
  • Nonlinear Dynamics
  • Perception
  • Production Models
  • Signal Processing
  • Standards

Readers

  • Neural Network Machine Learning
  • Speech Processing/Speech Recognition

Technology Areas

  • AI & ML
  • AI & ML - Bayesian Inference