Supervised and Unsupervised Speaker Adaptation in the NIST 2005 Speaker Recognition Evaluation

Abstract

Since 2004, the annual NIST Speaker Recognition Evaluation (SRE) has included an optional unsupervised speaker adaptation track in which test files are processed sequentially and the target model may be updated as they are processed. In this paper, various model adaptation techniques are implemented using a supervised (ideal) adaptation scheme. Once the best-performing model adaptation method is found, unsupervised adaptation experiments are run using a threshold to determine when to update the target model. Experiments with the NIST 2005 SRE use three NIST training conditions, 10sec4w, 1conv4w, and 8conv4w, all with the 1conv4w test condition. Compared to the baseline, supervised adaptation reduces minDCF values from 0.0708 to 0.0277 for 10sec4w, from 0.0385 to 0.0199 for 1conv4w, and from 0.0264 to 0.0176 for 8conv4w. Unsupervised adaptation reduces minDCF values to 0.0590, 0.0302, and 0.0210 for the respective training conditions.
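
The difference between the supervised (ideal) and the threshold-based unsupervised schemes described in the abstract can be illustrated with a minimal Python sketch. It is not the report's implementation: the toy model (a running mean vector), the scoring function (negative squared distance), and the names TargetModel, score, adapt, and run_trials are all assumptions made for illustration only. The one idea taken from the abstract is that test segments are processed sequentially and the target model is updated either when the true label says so (supervised) or when the score clears a threshold (unsupervised).

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence, Tuple


@dataclass
class TargetModel:
    """Hypothetical stand-in for a speaker model: a running mean vector."""
    mean: List[float]
    count: int  # number of segments folded into the mean so far


def score(model: TargetModel, features: Sequence[float]) -> float:
    """Toy similarity score (assumption): negative squared Euclidean distance."""
    return -sum((m - x) ** 2 for m, x in zip(model.mean, features))


def adapt(model: TargetModel, features: Sequence[float]) -> TargetModel:
    """Fold one test segment into the model with a running-mean update."""
    n = model.count + 1
    new_mean = [m + (x - m) / n for m, x in zip(model.mean, features)]
    return TargetModel(new_mean, n)


def run_trials(
    model: TargetModel,
    segments: Sequence[Sequence[float]],
    threshold: float,
    labels: Optional[Sequence[bool]] = None,
) -> Tuple[List[float], TargetModel]:
    """Process test segments in order, scoring each before any update.

    With labels given, adaptation is supervised: the model is updated
    exactly on true target segments. Without labels, adaptation is
    unsupervised: the model is updated when the score meets the threshold.
    """
    scores = []
    for i, seg in enumerate(segments):
        s = score(model, seg)
        scores.append(s)
        update = labels[i] if labels is not None else s >= threshold
        if update:
            model = adapt(model, seg)
    return scores, model


if __name__ == "__main__":
    start = TargetModel(mean=[0.0, 0.0], count=1)
    segments = [[0.2, 0.0], [5.0, 5.0], [0.4, 0.0]]
    scores, final = run_trials(start, segments, threshold=-1.0)
    print(scores, final)
```

In this sketch a threshold set too low lets impostor segments corrupt the model, while one set too high discards useful target data, which is one plausible reading of why the unsupervised minDCF values reported above fall between the baseline and the supervised results.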

Document Details

Document Type: Technical Report
Publication Date: Feb 01, 2006
Accession Number: ADA460150

Entities

People

Eric G. Hansen
Raymond E. Slyh
Timothy R. Anderson

Organizations

Air Force Research Laboratory
