Cepstral and Auditory Model Features for Speaker Recognition

Abstract

The TIMIT and KING databases, as well as a ten-day AFIT speaker corpus, were used to compare proven spectral processing techniques with an auditory neural representation for speaker identification. The feature sets compared were Linear Predictive Coding (LPC) cepstral coefficients and auditory nerve firing rates from the Payton model. This auditory model represents the mechanisms of the human middle and inner auditory periphery as well as neural transduction. Two clustering algorithms were used to generate speaker-specific codebooks: the Linde-Buzo-Gray (LBG) algorithm, which is statistically based, and a Kohonen self-organizing feature map (SOFM), which is a neural approach. The LBG algorithm consistently provided optimal codebook designs, with correspondingly better classification rates. The resulting vector quantization (VQ) distortion-based classification indicates that the auditory model gives slightly reduced recognition on clean, studio-quality recordings (LPC 100%, Payton 90%), yet performs similarly to the LPC cepstral representation both in degraded environments (both 95%) and on test data recorded over multiple sessions (both over 98%). A variety of normalization techniques, preprocessing procedures, and classifier fusion methods were examined on this biologically motivated feature set.

Keywords: Speaker identification, Auditory models, Vector quantization, Neural networks, User verification.
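The codebook-and-distortion scheme described in the abstract can be illustrated with a minimal sketch. It is not the report's implementation: the function names (`lbg`, `identify`, `make_speaker`), the squared Euclidean distortion, the additive split perturbation, and the synthetic 2-D Gaussian vectors standing in for cepstral or firing-rate features are all assumptions made for illustration. The sketch trains one LBG codebook per speaker by binary splitting, then assigns a test utterance to the speaker whose codebook yields the lowest average VQ distortion.

```python
import random

def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def nearest(vec, codebook):
    """Index of the codeword closest to vec."""
    return min(range(len(codebook)), key=lambda i: sq_dist(vec, codebook[i]))

def mean_vector(vectors):
    """Component-wise mean of a non-empty list of vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]

def avg_distortion(vectors, codebook):
    """Average squared distance from each vector to its nearest codeword."""
    return sum(min(sq_dist(v, c) for c in codebook) for v in vectors) / len(vectors)

def lbg(vectors, size, eps=0.01, tol=1e-6, max_iter=100):
    """LBG codebook design by binary splitting (size assumed a power of two)."""
    codebook = [mean_vector(vectors)]
    while len(codebook) < size:
        # Split each codeword into two slightly perturbed copies.
        codebook = [[x + s * eps for x in c] for c in codebook for s in (1, -1)]
        prev = float("inf")
        for _ in range(max_iter):
            # Assign every training vector to its nearest codeword.
            cells = [[] for _ in codebook]
            for v in vectors:
                cells[nearest(v, codebook)].append(v)
            # Move each codeword to the centroid of its cell;
            # an empty cell keeps its previous codeword.
            codebook = [mean_vector(cell) if cell else c
                        for cell, c in zip(cells, codebook)]
            d = avg_distortion(vectors, codebook)
            if prev - d <= tol * max(d, 1e-12):
                break
            prev = d
    return codebook

def identify(test_vectors, codebooks):
    """Return the speaker whose codebook gives the lowest average distortion."""
    return min(codebooks, key=lambda spk: avg_distortion(test_vectors, codebooks[spk]))

def make_speaker(rng, center, n):
    """Hypothetical stand-in for feature extraction: 2-D Gaussian vectors."""
    return [[rng.gauss(center[0], 1.0), rng.gauss(center[1], 1.0)] for _ in range(n)]

# Synthetic demonstration with two well-separated "speakers".
rng = random.Random(0)
codebooks = {
    "A": lbg(make_speaker(rng, (0.0, 0.0), 200), 4),
    "B": lbg(make_speaker(rng, (5.0, 5.0), 200), 4),
}
test_a = make_speaker(rng, (0.0, 0.0), 50)
print(identify(test_a, codebooks))
```

In a real system the synthetic vectors would be replaced by per-frame feature vectors, and the codebook size and distortion measure would be chosen to suit the feature set.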

Document Details

Document Type: Technical Report
Publication Date: Dec 01, 1992
Accession Number: ADA259076

Entities

People

  • John M. Colombi

Organizations

  • Air Force Institute of Technology

Tags

Communities of Interest

  • Energy and Power Technologies

DTIC Thesaurus Topics

  • Authentication
  • Computational Science
  • Databases
  • Differential Equations
  • Ear
  • Electrical Engineering
  • Feature Extraction
  • Hidden Markov Models
  • Information Processing
  • Information Science
  • Machine Learning
  • Neural Networks
  • Pattern Recognition
  • Probability
  • Processing Equipment
  • Signal Processing
  • Supervised Machine Learning

Readers

  • Neural Network Machine Learning
  • Speech Processing/Speech Recognition

Technology Areas

  • AI & ML
  • AI & ML - Neural Networks