Supervised and Unsupervised Speaker Adaptation in the NIST 2005 Speaker Recognition Evaluation

Abstract

Since 2004, the annual NIST Speaker Recognition Evaluation (SRE) has included an optional unsupervised speaker adaptation track in which test files are processed sequentially and the target model may be updated. In this paper, various model adaptation techniques are implemented using a supervised (ideal) adaptation scheme. Once the best-performing model adaptation method is found, unsupervised adaptation experiments are run using a threshold to determine when to update the target model. Three NIST training conditions, 10sec4w, 1conv4w, and 8conv4w, all with the 1conv4w test condition, are used for experiments with the NIST 2005 SRE. With supervised adaptation, minDCF values are reduced relative to the baseline from 0.0708 to 0.0277 for 10sec4w, from 0.0385 to 0.0199 for 1conv4w, and from 0.0264 to 0.0176 for 8conv4w. With unsupervised adaptation, minDCF values are reduced to 0.0590, 0.0302, and 0.0210 for the respective training conditions.
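The difference between the supervised (ideal) and threshold-based unsupervised schemes described in the abstract can be illustrated with a minimal sketch. Everything named here is an assumption made for illustration only: `RunningMeanModel` is a toy scalar stand-in for a speaker model, and `process_sequentially`, `supervised_rule`, and `threshold_rule` are hypothetical helpers. The report's actual models, scores, and adaptation method are not reproduced.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple


@dataclass
class RunningMeanModel:
    """Toy stand-in for a target speaker model (assumption, not the report's model)."""
    mean: float
    count: int

    def score(self, x: float) -> float:
        # Higher means more target-like: negative distance to the model mean.
        return -abs(x - self.mean)

    def adapt(self, x: float) -> None:
        # Fold the new observation into the running mean.
        self.count += 1
        self.mean += (x - self.mean) / self.count


# A rule decides, from the score and the true label, whether to adapt.
AdaptRule = Callable[[float, bool], bool]


def supervised_rule(score: float, is_target: bool) -> bool:
    # Ideal scheme: adapt only on true target trials (requires the answer key).
    return is_target


def threshold_rule(threshold: float) -> AdaptRule:
    # Unsupervised scheme: adapt whenever the score reaches the threshold,
    # ignoring the true label.
    def rule(score: float, is_target: bool) -> bool:
        return score >= threshold
    return rule


def process_sequentially(
    model: RunningMeanModel,
    trials: Sequence[Tuple[float, bool]],
    should_adapt: AdaptRule,
) -> List[float]:
    """Score each trial in order, then optionally update the model."""
    scores = []
    for x, is_target in trials:
        s = model.score(x)  # scored with the model as it stands before this trial
        scores.append(s)
        if should_adapt(s, is_target):
            model.adapt(x)
    return scores


if __name__ == "__main__":
    trials = [(1.0, True), (5.0, False), (1.2, True)]
    ideal = RunningMeanModel(mean=0.0, count=1)
    print(process_sequentially(ideal, trials, supervised_rule), ideal)
    gated = RunningMeanModel(mean=0.0, count=1)
    print(process_sequentially(gated, trials, threshold_rule(-2.0)), gated)
```

In this toy setup, a threshold set too high blocks every update and the model never improves, while one set too low would let impostor trials contaminate the model. The supervised rule avoids both failure modes only because it has access to the true labels.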

Document Details

Document Type
Technical Report
Publication Date
Feb 01, 2006
Accession Number
ADA460150

Entities

People

  • Eric G. Hansen
  • Raymond E. Slyh
  • Timothy R. Anderson

Organizations

  • Air Force Research Laboratory

Tags

Communities of Interest

  • Human Systems

DTIC Thesaurus Topics

  • Air Force
  • Air Force Research Laboratories
  • Databases
  • Detection
  • Detectors
  • False Alarms
  • Hidden Markov Models
  • Language
  • Markov Models
  • Military Research
  • Models
  • Probability
  • Recognition
  • Test And Evaluation
  • Training
  • Warning Systems

Fields of Study

  • Computer science

Readers

  • Artificial Intelligence
  • Computational Modeling and Simulation
  • Exercise and Sports Science

Technology Areas

  • AI & ML
  • AI & ML - Machine Translation
  • AI & ML - Neural Networks