Voice-Based Speaker Recognition Combining Acoustic and Stylistic Features

Abstract

We present a survey of the state of the art in voice-based speaker identification research. We describe the general framework of a text-independent speaker verification system and, as an example, SRI's voice-based speaker recognition system. This system was ranked among the best-performing systems in the NIST text-independent speaker recognition evaluations in 2004 and 2005. It consists of six subsystems and a neural network combiner. The subsystems fall into two groups: acoustics-based (low-level) and stylistic (high-level). Acoustic subsystems extract short-term spectral features that implicitly capture the anatomy of the vocal apparatus, such as the shape of the vocal tract and its variations. These features are known to be sensitive to microphone and channel variations, and various techniques are used to compensate for these variations. High-level subsystems, on the other hand, capture the stylistic aspects of a person's voice, such as the speaking rate for particular words, rhythmic and intonation patterns, and idiosyncratic word usage. These features represent behavioral aspects of the person's identity and are shown to be complementary to spectral acoustic features. By combining all information sources, we achieve an equal error rate of around 3% on the NIST speaker recognition evaluation for 2 minutes of enrollment data and 2 minutes of test data.
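
For readers unfamiliar with the equal error rate (EER) quoted in the abstract, the following is a minimal sketch of how such a figure can be computed from verification scores. It is not the evaluation code used by the authors or by NIST. The function name `equal_error_rate`, the toy scores, and the convention that a higher score means "same speaker" are all assumptions made for illustration.

```python
def equal_error_rate(target_scores, impostor_scores):
    """Return (eer, threshold) at the point where the false-accept rate
    and the false-reject rate are closest to each other.

    Assumption: a trial is accepted when its score is >= threshold.
    """
    # Candidate thresholds are the observed scores themselves.
    thresholds = sorted(set(target_scores) | set(impostor_scores))
    best = None
    for t in thresholds:
        # False-accept rate: fraction of impostor trials that are accepted.
        far = sum(s >= t for s in impostor_scores) / len(impostor_scores)
        # False-reject rate: fraction of true-speaker trials that are rejected.
        frr = sum(s < t for s in target_scores) / len(target_scores)
        gap = abs(far - frr)
        if best is None or gap < best[0]:
            # Report the midpoint of the two rates at the closest crossing.
            best = (gap, (far + frr) / 2, t)
    return best[1], best[2]


# Hypothetical toy scores, not data from the report.
targets = [0.9, 0.8, 0.7, 0.6]
impostors = [0.1, 0.2, 0.3, 0.65]
eer, threshold = equal_error_rate(targets, impostors)
print(eer, threshold)  # 0.25 0.65
```

In this toy example one impostor trial in four is accepted and one target trial in four is rejected at the chosen threshold, giving an EER of 25%. An EER of around 3% means both error rates are roughly 3% at the crossing point.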

Document Details

Document Type
Technical Report
Publication Date
Jan 01, 2008
Accession Number
AD1002507

Entities

People

  • Andreas Stolcke
  • Elizabeth Shriberg
  • Luciana Ferrer
  • Sachin S. Kajarekar

Organizations

  • SRI International

Tags

Communities of Interest

  • Autonomy
  • Energy and Power Technologies
  • Human Systems

DTIC Thesaurus Topics

  • Acoustic Properties
  • Automated Speech Recognition
  • Computer Science
  • Electrical Engineering
  • Feature Extraction
  • Hidden Markov Models
  • Identification
  • Information Science
  • Kernel Functions
  • Language
  • Machine Learning
  • Network Science
  • Neural Networks
  • Recognition
  • Standards
  • Statistics
  • Supervised Machine Learning

Readers

  • Speech Processing/Speech Recognition
  • Systems Analysis and Design

Technology Areas

  • AI & ML