A Novel Scheme for Speaker Recognition Using a Phonetically-Aware Deep Neural Network

Abstract

We propose a novel framework for speaker recognition in which the extraction of sufficient statistics for the state-of-the-art i-vector model is driven by a deep neural network (DNN) trained for automatic speech recognition (ASR). Specifically, the DNN replaces the standard Gaussian mixture model (GMM) in producing frame alignments. Using an ASR-DNN in the speaker recognition pipeline is attractive because it integrates information about the speech content directly into the statistics while leaving the standard backends unchanged. Evaluated on the telephone conditions of the 2012 NIST speaker recognition evaluation (SRE), the proposed framework achieves a 30% relative improvement in equal error rate (EER) over a state-of-the-art system. The framework thus offers an efficient way to leverage transcribed data for speaker recognition and opens up a wide range of research directions.
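To make the "DNN replaces the GMM to produce frame alignments" idea concrete, the sketch below computes the zeroth- and first-order statistics that an i-vector extractor consumes from an arbitrary matrix of per-frame class posteriors. It is a minimal illustration under stated assumptions, not the authors' implementation: we assume the ASR-DNN's senone posteriors are already available as a (frames × classes) matrix (the network itself is not shown), that the statistics are accumulated over separate speaker-recognition features, and that per-class means would be estimated from training data using the same alignments. The helper names `softmax`, `sufficient_stats` and `center_stats`, and the random toy inputs, are hypothetical.

```python
import numpy as np


def softmax(logits: np.ndarray) -> np.ndarray:
    """Row-wise softmax: turns per-frame logits into posteriors summing to 1."""
    shifted = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    e = np.exp(shifted)
    return e / e.sum(axis=1, keepdims=True)


def sufficient_stats(posteriors: np.ndarray, features: np.ndarray):
    """Zeroth- and first-order statistics for an i-vector extractor.

    posteriors: (T, C) frame alignments gamma_t(c). With a GMM these are
        component responsibilities; in the DNN scheme (assumed here) they
        are the ASR-DNN's per-frame senone posteriors.
    features:   (T, D) speaker-recognition features x_t.
    Returns N with shape (C,) and F with shape (C, D).
    """
    if posteriors.shape[0] != features.shape[0]:
        raise ValueError("posteriors and features must cover the same frames")
    n = posteriors.sum(axis=0)   # N_c = sum_t gamma_t(c)
    f = posteriors.T @ features  # F_c = sum_t gamma_t(c) * x_t
    return n, f


def center_stats(n: np.ndarray, f: np.ndarray, means: np.ndarray) -> np.ndarray:
    """Center first-order stats on per-class means: F_c - N_c * mu_c."""
    return f - n[:, None] * means


# Toy usage: random numbers stand in for real DNN outputs and features.
rng = np.random.default_rng(0)
T, C, D = 200, 8, 5
feats = rng.standard_normal((T, D))
post = softmax(rng.standard_normal((T, C)))  # hypothetical senone posteriors
N, F = sufficient_stats(post, feats)
print(N.shape, F.shape)  # prints "(8,) (8, 5)"
```

Because only the source of `posteriors` changes, the downstream i-vector extractor and backends can stay the same, which is the property the abstract highlights. Note that `N` sums to the number of frames `T`, since each frame's posteriors sum to one.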

Document Details

Document Type
Technical Report
Publication Date
May 01, 2014
Accession Number
ADA613971

Entities

People

  • Luciana Ferrer
  • Mitchell McLaren
  • Nicolas Scheffer
  • Yun Lei

Organizations

  • SRI International

Tags

Communities of Interest

  • C4I
  • Human Systems

DTIC Thesaurus Topics

  • Artificial Intelligence Computing
  • Artificial Intelligence Software
  • Automated Speech Recognition
  • Computer Languages
  • Convolutional Neural Networks
  • Data Science
  • Hidden Markov Models
  • Information Science
  • Machine Learning
  • Neural Networks
  • Probability
  • Probability Distributions
  • Standards
  • Statistics
  • Supervised Machine Learning

Fields of Study

  • Computer science

Readers

  • Database Systems and Applications
  • Distributed Systems and Data Platform Development
  • Speech Processing/Speech Recognition

Technology Areas

  • AI & ML
  • AI & ML - Neural Networks