Auditory Modeling as a Basis for Spectral Modulation Analysis with Application to Speaker Recognition

Abstract

This report explores auditory modeling as a basis for robust automatic speaker verification. Specifically, we have developed feature-extraction front-ends that incorporate (1) time-varying, level-dependent filtering, (2) variations in analysis filter-bank size, and (3) nonlinear adaptation. Our methods are motivated both by a desire to mimic auditory processing more closely than traditional front-ends (e.g., the mel-cepstrum) and by reported gains in automatic speech recognition robustness from similar principles. Traditional mel-cepstral features in automatic speaker recognition are derived from roughly 20 invariant band-pass filter weights, thereby discarding the temporal structure carried by phase. In contrast, cochlear frequency decomposition can be more precisely modeled as the output of roughly 3500 time-varying, level-dependent filters. Auditory signal processing is therefore more finely resolved in frequency than mel-cepstral analysis and also extracts temporal information. Furthermore, loss of level-dependence has been suggested to reduce human speech reception in adverse acoustic environments. We were thus motivated to employ a recently proposed level-dependent compressed gammachirp filter bank in feature extraction, and to vary the number of filters or filter weights to improve frequency resolution. We are also simulating nonlinear adaptation models of inner hair cell function along the basilar membrane that presumably mimic temporal masking effects. The auditory-based front-ends are being evaluated with the Lincoln Laboratory Gaussian mixture model recognizer on the TIMIT database under clean and noisy (additive white Gaussian noise) conditions. Preliminary results with features derived from our auditory models suggest that they provide information complementary to the mel-cepstrum under both clean and noisy conditions, resulting in improved speaker recognition performance.
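As an illustration of the noisy test condition mentioned in the abstract, the following is a minimal sketch of how additive white Gaussian noise can be mixed into a waveform at a chosen signal-to-noise ratio. It is not the report's procedure: the function names, the 10 dB SNR, the 16 kHz sample rate, and the sine tone standing in for a speech utterance are all assumptions made for this example.

```python
import math
import random


def add_white_noise(signal, snr_db, seed=0):
    """Return `signal` plus zero-mean Gaussian white noise at `snr_db` (dB).

    Hypothetical helper for illustration; not taken from the report.
    """
    rng = random.Random(seed)
    # Average power of the clean signal.
    signal_power = sum(x * x for x in signal) / len(signal)
    # Noise power that yields the requested SNR: SNR_dB = 10*log10(Ps/Pn).
    noise_power = signal_power / (10.0 ** (snr_db / 10.0))
    sigma = math.sqrt(noise_power)
    return [x + rng.gauss(0.0, sigma) for x in signal]


def measured_snr_db(clean, noisy):
    """Estimate the SNR (dB) actually realized in `noisy` relative to `clean`."""
    signal_power = sum(c * c for c in clean) / len(clean)
    noise_power = sum((n - c) ** 2 for c, n in zip(clean, noisy)) / len(clean)
    return 10.0 * math.log10(signal_power / noise_power)


# Assumed stand-in for a speech waveform: one second of a 440 Hz tone.
sample_rate = 16000  # assumed sample rate for this sketch
clean = [math.sin(2.0 * math.pi * 440.0 * t / sample_rate)
         for t in range(sample_rate)]
noisy = add_white_noise(clean, snr_db=10.0)
```

Scaling the noise from the measured signal power, rather than using a fixed variance, keeps the SNR comparable across utterances of different loudness; `measured_snr_db(clean, noisy)` can be used to check the realized value.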

Document Details

Document Type
Technical Report
Publication Date
Jan 31, 2007
Accession Number
ADA462760

Entities

People

  • Thomas F. Quatieri
  • Tianyu T. Wang

Organizations

  • Massachusetts Institute of Technology

Tags

Communities of Interest

  • Autonomy
  • Energy and Power Technologies
  • Materials and Manufacturing Processes

DTIC Thesaurus Topics

  • Acoustic Signals
  • Auditory Nerve
  • Auditory Signals
  • Automated Speech Recognition
  • Automatic
  • Dimensionality Reduction
  • Feature Extraction
  • Filters
  • Filtration
  • Frequency
  • Machine Learning
  • Membranes
  • Pattern Recognition
  • Recognition
  • Signal Processing
  • Supervised Machine Learning
  • White Noise

Fields of Study

  • Engineering

Readers

  • Auditory Neuroscience/Auditory Physiology
  • Speech Processing/Speech Recognition
  • Systems Analysis and Design

Technology Areas

  • AI & ML
  • AI & ML - Machine Learning Algorithms