Towards Interpretive Models for 2-D Processing of Speech
Abstract
Two-dimensional (2-D) processing of speech has recently been explored as an alternative representational framework that explicitly analyzes temporal, spectral, and joint spectrotemporal energy fluctuations, or "modulations", present in time-frequency distributions (e.g., the spectrogram or auditory spectrogram). This paper considers 2-D Fourier analysis of local time-frequency regions of wideband spectrograms, a representation referred to as the (wideband) Grating Compression Transform (WGCT). Building on previous work in modeling narrowband-based GCT representations, we develop frequency-dependent models of speech signals in the WGCT context that relate to speech production characteristics. The models are evaluated through simulations and error analysis. Comparisons demonstrate the models' effectiveness and reveal important distinctions, including "dual" behavior, between the wideband and narrowband models. Our results motivate a novel taxonomy of speech signal behavior that can serve as an interpretive framework (i.e., one tied to speech production characteristics) for 2-D processing of speech using the GCT and, potentially, other 2-D approaches and time-frequency distributions. We demonstrate the model's ability to represent real speech content by using demodulation techniques for analysis/synthesis of wideband spectrograms and for co-channel speaker separation using prior pitch information.
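To make the core operation concrete, the sketch below computes a log-magnitude wideband spectrogram and then takes the 2-D Fourier transform of one windowed local patch. That is the "2-D Fourier analysis of local time-frequency regions" the abstract describes. This is a minimal illustration under stated assumptions, not the report's implementation. The helper names `log_spectrogram` and `local_gct`, the Hamming windows, the log compression, the mean removal, and all patch and window sizes are hypothetical choices. The abstract does not specify them. The test input is an idealized 125 Hz impulse train standing in for voiced excitation. Because the 4 ms analysis window is shorter than the 8 ms pitch period, pitch shows up as periodic vertical striations along the time axis.

```python
import numpy as np


def log_spectrogram(x, win_len, hop, n_fft):
    """Log-magnitude STFT: rows are time frames, columns are frequency bins."""
    window = np.hamming(win_len)
    n_frames = 1 + (len(x) - win_len) // hop
    frames = np.stack([x[i * hop:i * hop + win_len] * window for i in range(n_frames)])
    mag = np.abs(np.fft.rfft(frames, n=n_fft, axis=1))
    return np.log(mag + 1e-8)  # small floor avoids log(0) in frames with no energy


def local_gct(spec, t0, f0, patch_t, patch_f):
    """Hypothetical helper: 2-D DFT magnitude of one windowed time-frequency patch.

    The result is indexed by (temporal modulation, spectral modulation), with
    zero modulation moved to the centre of each axis by fftshift.
    """
    patch = spec[t0:t0 + patch_t, f0:f0 + patch_f]
    patch = patch - patch.mean()  # remove the DC term so modulation peaks stand out
    taper = np.outer(np.hamming(patch_t), np.hamming(patch_f))  # separable 2-D window
    return np.abs(np.fft.fftshift(np.fft.fft2(patch * taper)))


# Usage: a 125 Hz impulse train at fs = 8 kHz (assumed toy input, not real speech).
fs = 8000
x = np.zeros(fs)
x[::64] = 1.0                                              # one pulse every 8 ms
spec = log_spectrogram(x, win_len=32, hop=8, n_fft=256)   # 4 ms window, 1 ms hop
gct = local_gct(spec, t0=0, f0=32, patch_t=64, patch_f=64)
row, col = np.unravel_index(np.argmax(gct), gct.shape)
print(abs(row - 32), col - 32)  # offsets from zero modulation; prints: 8 0
```

The peak sits 8 bins from zero on the 64-frame temporal-modulation axis. That corresponds to 1/8 cycle per 1 ms frame, or 125 Hz, which is the pitch of the input. It sits at zero spectral modulation because each striation in this idealized wideband spectrogram is flat across frequency. Real speech, with formant structure and time-varying pitch, would not give such clean peaks.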
Document Details
- Document Type: Technical Report
- Publication Date: Apr 26, 2011
- Accession Number: ADA570532
Entities
People
- Thomas F. Quatieri
- Tianyu T. Wang
Organizations
- Massachusetts Institute of Technology