Speaker Recognition Using Real vs. Synthetic Parallel Data for DNN Channel Compensation

Abstract

Recent work has shown large performance gains using denoising DNNs for speech processing tasks under challenging acoustic conditions. However, training these DNNs requires large amounts of parallel multichannel speech data, which can be impractical or expensive to collect. The effective use of synthetic parallel data as an alternative has been demonstrated for several speech technologies, including automatic speech recognition and speaker recognition (SR). This paper demonstrates that denoising DNNs trained with real Mixer 2 multichannel data perform only slightly better than DNNs trained with synthetic multichannel data for microphone SR on Mixer 6. Large reductions in pooled error rates of 50% EER and 30% min DCF are achieved using DNNs trained on real Mixer 2 data. Nearly the same performance gains are achieved using synthetic data generated with a limited number of room impulse responses (RIRs) and noise sources derived from Mixer 2. Using RIRs from three publicly available sources used in the Kaldi ASpIRE recipe yields somewhat lower pooled gains of 34% EER and 25% min DCF. These results confirm the effective use of synthetic parallel data for DNN channel compensation even when the RIRs used for synthesizing the data are not particularly well-matched to the task.
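
The abstract does not give the synthesis procedure, so the following is only a minimal sketch of one common way to build a synthetic parallel channel from clean speech: convolve the signal with an RIR, then add noise scaled to a target signal-to-noise ratio. The function name `synthesize_channel`, the `snr_db` parameter, the 16 kHz rate, and the random stand-in signals are all illustrative assumptions, not details taken from the report.

```python
import numpy as np

def synthesize_channel(clean, rir, noise, snr_db):
    """Return a reverberant, noisy copy of `clean` with the same length.

    Hypothetical helper for illustration; not the report's actual pipeline.
    """
    # Reverberate: convolve with the room impulse response and trim so the
    # output stays sample-aligned with the clean input (a "parallel" pair).
    reverberant = np.convolve(clean, rir)[: len(clean)]

    # Tile the noise if it is shorter than the speech, then trim to length.
    reps = int(np.ceil(len(clean) / len(noise)))
    noise = np.tile(noise, reps)[: len(clean)]

    # Scale the noise so that speech power / noise power matches snr_db.
    speech_power = np.mean(reverberant ** 2)
    noise_power = np.mean(noise ** 2)
    gain = np.sqrt(speech_power / (noise_power * 10 ** (snr_db / 10)))
    return reverberant + gain * noise

# Toy inputs (assumptions): random signals standing in for real recordings.
rng = np.random.default_rng(0)
clean = rng.standard_normal(16000)   # stand-in for 1 s of speech at 16 kHz
rir = np.exp(-np.arange(800) / 100.0) * rng.standard_normal(800)  # decaying toy RIR
noise = rng.standard_normal(4000)    # shorter than the speech, so it is tiled
noisy = synthesize_channel(clean, rir, noise, snr_db=10.0)
```

Each `(noisy, clean)` pair produced this way could serve as an input and target for training a denoising network. In practice the RIRs and noise would come from measured or published collections rather than random draws.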

Document Details

Document Type
Technical Report
Publication Date
Sep 08, 2016
Accession Number
AD1033607

Entities

People

  • Douglas A. Reynolds
  • Frederick S. Richardson
  • Jennifer T. Melot
  • Michael S. Brandstein

Organizations

  • Massachusetts Institute of Technology

Tags

Communities of Interest

  • Materials and Manufacturing Processes

DTIC Thesaurus Topics

  • Automated Speech Recognition
  • Compensation
  • Data Sets
  • Databases
  • Department Of Defense
  • Gaussian Distributions
  • Information Science
  • Microphones
  • Neural Networks
  • Normal Distribution
  • Order Statistics
  • Recognition
  • Statistics
  • Test And Evaluation
  • Test Sets
  • Training
  • United States Government

Readers

  • Neural Network Machine Learning
  • Speech Processing/Speech Recognition

Technology Areas

  • AI & ML