Performance Evaluation of Glottal Inverse Filtering Algorithms Using a Physiologically Based Articulatory Speech Synthesizer

Abstract

Glottal inverse filtering aims to estimate the glottal airflow signal from a speech signal for applications such as speaker recognition and clinical assessment. Nonetheless, evaluation of inverse filtering performance has been challenging due to the practical difficulty of measuring the true glottal signals while speech signals are recorded. Apart from this, it is suspected that the performance of many methods degrades in conditions that are of great interest, such as breathy voice, high pitch, soft/loud voice, and running speech. This paper presents a comprehensive, objective, and comparative evaluation of state-of-the-art inverse filtering algorithms that takes advantage of speech and glottal signals generated by a physiologically relevant speech synthesizer. The synthesizer provides a realistic simulation of the voice production process, and thus an adequate test bed for revealing the temporal and spectral performance characteristics of each algorithm. Included in the synthetic data are continuous running speech utterances and sustained vowels, which are produced with multiple voice qualities (pressed, slightly pressed, modal, slightly breathy, and breathy) and subglottal pressure levels to simulate the natural variations in real speech. In evaluating the accuracy of a glottal flow estimate, multiple error measures are used, including an error in the estimated signal that measures overall waveform deviation, as well as an error in each of several clinically relevant features extracted from the glottal flow estimate. For two vowel-specific data subsets that were isolated for two open vowels and analyzed with three closed phase approaches, the resulting waveform errors had mean and standard deviation values below 20% and 10%, respectively, of the true glottal source amplitude. These approaches also showed remarkable stability across different voice qualities and subglottal pressure levels. Results of data subset analysis suggest that analysis of close rounded vowels
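
The abstract reports waveform errors as a fraction of the true glottal source amplitude, summarized by a mean and a standard deviation. The record does not give the exact definition, so the following is only a minimal sketch under stated assumptions: the error for one signal is taken to be the mean absolute sample deviation divided by the peak-to-peak amplitude of the true flow, and the two signals are assumed to be time-aligned and equally sampled. The function names `waveform_error` and `summarize_errors` are hypothetical and are not taken from the report.

```python
from statistics import mean, pstdev


def waveform_error(true_flow, estimated_flow):
    """Mean absolute deviation between two glottal flow signals, expressed
    as a fraction of the true signal's peak-to-peak amplitude.

    Assumption: this normalization is illustrative only; the report may
    define its waveform error differently.
    """
    if len(true_flow) != len(estimated_flow):
        raise ValueError("signals must have the same number of samples")
    amplitude = max(true_flow) - min(true_flow)
    if amplitude == 0:
        raise ValueError("true signal has zero amplitude")
    deviation = mean(abs(e - t) for t, e in zip(true_flow, estimated_flow))
    return deviation / amplitude


def summarize_errors(errors):
    """Mean and (population) standard deviation of per-signal errors."""
    return mean(errors), pstdev(errors)


if __name__ == "__main__":
    # Toy signals, invented for illustration; not data from the report.
    true_flow = [0.0, 1.0, 2.0, 1.0, 0.0]
    estimates = [
        [0.0, 1.0, 1.5, 1.0, 0.0],
        [0.0, 0.8, 2.0, 1.2, 0.0],
    ]
    errors = [waveform_error(true_flow, est) for est in estimates]
    avg, spread = summarize_errors(errors)
    print(f"mean error: {avg:.1%}, standard deviation: {spread:.1%}")
```

Under this sketch, a statement such as "mean below 20%" would correspond to `avg < 0.20` over the set of analyzed signals.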

Document Details

Document Type
Technical Report
Publication Date
Jan 05, 2017
Accession Number
AD1031245

Entities

People

  • Daryush D. Mehta
  • Jon Gudnason
  • Matias Zanartu
  • Thomas F. Quatieri
  • Yu-ren Chien

Organizations

  • Massachusetts Institute of Technology

Tags

Communities of Interest

  • Energy and Power Technologies

DTIC Thesaurus Topics

  • Accuracy
  • Acoustic Waves
  • Algorithms
  • Data Sets
  • Detection
  • Detectors
  • Electronic Mail
  • Engineering
  • Filters
  • Filtration
  • Frequency
  • Frequency Bands
  • Frequency Response
  • Measurement
  • Shape
  • Simulations
  • Waveforms

Fields of Study

  • Engineering

Readers

  • Computational Modeling and Simulation
  • Speech Processing/Speech Recognition

Technology Areas

  • AI & ML
  • AI & ML - Bayesian Inference
  • AI & ML - Machine Learning Algorithms