Cepstral and Auditory Model Features for Speaker Recognition

Abstract

The TIMIT and KING databases, as well as a ten-day AFIT speaker corpus, are used to compare proven spectral processing techniques with an auditory neural representation for speaker identification. The feature sets compared are Linear Predictive Coding (LPC) cepstral coefficients and auditory nerve firing rates from the Payton model. This auditory model represents the mechanisms found in the human middle and inner auditory periphery as well as neural transduction. Two clustering algorithms are used to generate speaker-specific codebooks: the statistically based Linde-Buzo-Gray (LBG) algorithm and a neural approach, the Kohonen self-organizing feature map (SOFM). The LBG algorithm consistently provided optimal codebook designs, with correspondingly better classification rates. Classification based on Vector Quantization (VQ) distortion indicates that the auditory model gives slightly reduced recognition on clean, studio-quality recordings (LPC 100%, Payton 90%), yet performs similarly to the LPC cepstral representation both in degraded environments (both 95%) and on test data recorded over multiple sessions (both over 98%). A variety of normalization techniques, preprocessing procedures and classifier fusion methods were examined on this biologically motivated feature set.

Keywords: Speaker identification, Auditory models, Vector quantization, Neural networks, User verification.
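The codebook-and-distortion scheme summarized in the abstract can be illustrated with a minimal sketch. This is not the report's implementation: the function names, the squared Euclidean distortion measure, the additive splitting step, and the two-dimensional toy data (standing in for LPC cepstral or auditory firing-rate vectors) are all assumptions made for illustration. Each speaker gets a codebook grown by LBG-style splitting, and a test utterance is assigned to the speaker whose codebook yields the lowest average distortion.

```python
import random

def sq_dist(a, b):
    """Squared Euclidean distance (assumed distortion measure)."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def mean_vector(vectors):
    """Component-wise mean of a non-empty list of vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]

def nearest(vec, codebook):
    """Index of the codeword closest to vec."""
    return min(range(len(codebook)), key=lambda i: sq_dist(vec, codebook[i]))

def lbg_codebook(vectors, size, eps=0.01, iters=20):
    """LBG-style design: start from the global mean, split every codeword
    into a perturbed pair, then refine with nearest-neighbor/centroid passes.
    The codebook doubles each round, so size should be a power of two."""
    codebook = [mean_vector(vectors)]
    while len(codebook) < size:
        codebook = [[c + s for c in cw] for cw in codebook for s in (eps, -eps)]
        for _ in range(iters):
            cells = [[] for _ in codebook]
            for v in vectors:
                cells[nearest(v, codebook)].append(v)
            # Keep the old codeword if its cell received no vectors.
            codebook = [mean_vector(cell) if cell else cw
                        for cell, cw in zip(cells, codebook)]
    return codebook

def avg_distortion(vectors, codebook):
    """Mean distortion of the vectors against their nearest codewords."""
    return sum(min(sq_dist(v, cw) for cw in codebook) for v in vectors) / len(vectors)

def identify(test_vectors, codebooks):
    """Return the speaker whose codebook gives the lowest average distortion."""
    return min(codebooks, key=lambda name: avg_distortion(test_vectors, codebooks[name]))

def toy_frames(center, n, rng):
    """Hypothetical stand-in for feature frames: Gaussian points around center."""
    return [[rng.gauss(c, 1.0) for c in center] for _ in range(n)]
```

In use, one codebook would be trained per enrolled speaker, for example `codebooks = {"A": lbg_codebook(frames_a, 4), "B": lbg_codebook(frames_b, 4)}`, and `identify(test_frames, codebooks)` would then return the best-matching speaker label.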

Document Details

Document Type: Technical Report
Publication Date: Dec 01, 1992
Accession Number: ADA259076

Entities

People

John M. Colombi

Organizations

Air Force Institute of Technology
