The 2005 AFRL/HEC One-Speaker Detection Systems

Abstract

This paper describes the one-speaker detection systems submitted by AFRL/HEC for several of the training and testing conditions in the 2005 NIST Speaker Recognition Evaluation. For each condition, the overall system score was the weighted combination of scores from several component systems. The component systems were based on: mel-frequency cepstral coefficients (MFCCs) and Gaussian mixture models (GMMs); MFCCs and phoneme-specific GMMs (PS-GMMs); linear-prediction-based cepstral coefficients (LPCCs) from closed-phase analysis; formant center frequencies, formant bandwidths, and fundamental frequency (FMBWFO); and word language modeling (WLM). The score combination was done using single-layer perceptrons, with the grouping of the component systems depending on the lengths of the training and testing files. For some of the testing and/or training conditions involving ten-second speech files, system performance improved with the inclusion of the FMBWFO and LPCC systems, while the MFCC/PS-GMM system provided additional benefits in the one-conversation testing conditions involving larger amounts of training data.
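The score combination described in the abstract can be illustrated with a minimal sketch of a single-layer perceptron that fuses per-trial scores from several component systems into one weighted score. Everything specific here is an assumption made for illustration and is not taken from the report: the sigmoid output, the gradient-descent training loop, the learning rate and epoch count, the function names `fuse` and `train_fusion`, and the toy score values.

```python
import math


def fuse(weights, bias, scores):
    """Return the weighted combination of component-system scores for one trial."""
    return bias + sum(w * s for w, s in zip(weights, scores))


def train_fusion(trials, labels, epochs=500, lr=0.1):
    """Fit a single-layer perceptron with a sigmoid output (assumed form).

    trials: list of score vectors, one entry per component system.
    labels: 1 for a target-speaker trial, 0 for a non-target trial.
    """
    n_systems = len(trials[0])
    weights, bias = [0.0] * n_systems, 0.0
    for _ in range(epochs):
        for scores, label in zip(trials, labels):
            # Sigmoid of the fused score gives a target probability.
            prob = 1.0 / (1.0 + math.exp(-fuse(weights, bias, scores)))
            err = prob - label
            # Stochastic gradient step on the cross-entropy loss.
            weights = [w - lr * err * s for w, s in zip(weights, scores)]
            bias -= lr * err
    return weights, bias


# Hypothetical toy scores: each row is one trial, each column one component system.
trials = [[2.0, 1.5], [1.0, 2.5], [-1.5, -0.5], [-2.0, -1.0]]
labels = [1, 1, 0, 0]
weights, bias = train_fusion(trials, labels)
```

After training, `fuse(weights, bias, scores)` gives the combined score for a new trial. Under this sketch, a separate set of weights would be trained for each grouping of component systems, since the abstract says the grouping depends on the lengths of the training and testing files.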

Document Details

Document Type
Technical Report
Publication Date
Feb 01, 2006
Accession Number
ADA445157

Entities

People

  • Brian M. Ore
  • Eric G. Hansen
  • Raymond E. Slyh

Organizations

  • General Dynamics

Tags

Communities of Interest

  • C4I
  • Human Systems

DTIC Thesaurus Topics

  • Air Force
  • Air Force Research Laboratories
  • Bandwidth
  • Coefficients
  • Detection
  • False Alarms
  • Frequency
  • Hidden Markov Models
  • Inclusions
  • Information Systems
  • Language
  • Neural Networks
  • Probability
  • Recognition
  • Signal Processing
  • Test And Evaluation
  • Training

Fields of Study

  • Computer science
  • Engineering

Readers

  • Speech Processing/Speech Recognition

Technology Areas

  • AI & ML