Towards Interpretive Models for 2-D Processing of Speech

Abstract

Two-dimensional (2-D) processing of speech has recently been explored as an alternative representational framework that explicitly analyzes temporal, spectral, and joint spectrotemporal energy fluctuations, or "modulations," present in time-frequency distributions (e.g., the spectrogram or auditory spectrogram). This paper considers 2-D Fourier analysis of local time-frequency regions of wideband spectrograms, a representation referred to as the wideband Grating Compression Transform (WGCT). Building on previous work in modeling narrowband-based GCT representations, we develop frequency-dependent models of speech signals in the WGCT context that are related to speech production characteristics. We evaluate the models through simulations and error analysis. A comparison demonstrates the effectiveness of the models and reveals important distinctions, including "dual" behavior, between the wideband and narrowband models. Our results motivate a novel taxonomy of speech signal behavior for use as an interpretive framework (i.e., one tied to speech production characteristics) for 2-D processing of speech using the GCT and potentially other 2-D approaches and time-frequency distributions. We demonstrate the model's ability to represent real speech content by using demodulation techniques for analysis/synthesis of wideband spectrograms and for co-channel speaker separation using prior pitch information.
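The abstract describes the WGCT as 2-D Fourier analysis of local regions of a wideband spectrogram. The Python/NumPy sketch below illustrates that idea only under assumptions of our own: the 5 ms Hamming analysis window, one-sample hop, removal of the patch mean, 2-D Hamming taper, and the helper names `wideband_spectrogram`, `local_2d_spectrum`, and `dominant_modulation` are hypothetical choices for illustration, not the report's parameters or implementation. For a synthetic impulse train, a window shorter than the pitch period resolves individual pitch pulses in time, so the dominant 2-D peak falls on the temporal-modulation axis at the pitch rate.

```python
import numpy as np

# Hypothetical helper names and parameters; a sketch, not taken from the report.


def wideband_spectrogram(x, fs, win_ms=5.0, hop=1, nfft=512):
    """Magnitude STFT with a short Hamming window (rows = time, columns = frequency).

    A window shorter than the pitch period makes voicing appear as vertical
    striations (periodicity along time) rather than as resolved harmonics.
    """
    win_len = int(round(win_ms * 1e-3 * fs))
    win = np.hamming(win_len)
    n_frames = 1 + (len(x) - win_len) // hop
    frames = np.stack([x[i * hop:i * hop + win_len] * win for i in range(n_frames)])
    return np.abs(np.fft.rfft(frames, n=nfft, axis=1))


def local_2d_spectrum(spec, t_start, f_start, n_t, n_f):
    """2-D Fourier magnitude of one local time-frequency patch (GCT-style sketch)."""
    patch = spec[t_start:t_start + n_t, f_start:f_start + n_f]
    patch = patch - patch.mean()  # assumed step: remove the mean so DC does not dominate
    win2d = np.outer(np.hamming(n_t), np.hamming(n_f))  # taper both axes
    # fftshift puts zero modulation at index (n_t // 2, n_f // 2)
    return np.abs(np.fft.fftshift(np.fft.fft2(patch * win2d)))


def dominant_modulation(G, frame_rate_hz, bin_spacing_hz):
    """Largest peak in the non-negative temporal-modulation half-plane.

    Returns (temporal modulation in Hz, spectral modulation in cycles per Hz).
    The other half-plane is redundant: |G| is point-symmetric for a real patch.
    """
    n_t, n_f = G.shape
    half = G[n_t // 2:, :]  # rows from zero temporal modulation upward
    kt, kf = np.unravel_index(np.argmax(half), half.shape)
    temporal_hz = kt * frame_rate_hz / n_t
    spectral_cyc_per_hz = (kf - n_f // 2) / (n_f * bin_spacing_hz)
    return float(temporal_hz), float(spectral_cyc_per_hz)


if __name__ == "__main__":
    fs = 8000
    x = np.zeros(800)
    x[::64] = 1.0  # impulse train with a 64-sample period, i.e. 125 Hz pitch
    S = wideband_spectrogram(x, fs)
    # 256 frames at a one-sample hop span exactly four pitch periods
    G = local_2d_spectrum(S, t_start=100, f_start=40, n_t=256, n_f=64)
    print(dominant_modulation(G, frame_rate_hz=fs, bin_spacing_hz=fs / 512))  # prints (125.0, 0.0)
```

With a long (narrowband) window the same signal would instead show resolved harmonics spaced 125 Hz apart, so the energy would move toward the spectral-modulation axis near 1/125 cycles per Hz; this offers some intuition for why wideband and narrowband models of the same signal can behave differently.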


Document Details

Document Type
Technical Report
Publication Date
Apr 26, 2011
Accession Number
ADA570532

Entities

People

  • Thomas F. Quatieri
  • Tianyu T. Wang

Organizations

  • Massachusetts Institute of Technology

Tags

Communities of Interest

  • Energy and Power Technologies

DTIC Thesaurus Topics

  • Air Force
  • Automated Speech Recognition
  • Bandwidth
  • Computer Science
  • Electrical Engineering
  • Filtration
  • Fourier Analysis
  • Frequency
  • Frequency Bands
  • Language
  • Modulation
  • Narrowband
  • Power Spectra
  • Signal Processing
  • Simulations
  • Speech
  • Two Dimensional

Fields of Study

  • Engineering

Readers

  • Artificial Intelligence
  • Image Processing and Computer Vision
  • Speech Processing/Speech Recognition