2-D Processing of Speech with Application to Pitch and Formant Estimation
Abstract
- The grating compression transform (GCT) maps harmonically related signal components to a concentrated entity in a spatial 2-D frequency plane.
- The GCT forms the basis of a pitch estimator that uses the radial distance to the largest peak of the GCT.
- The resulting pitch estimator appears robust under noise conditions and amenable to extension to two-speaker pitch estimation.
- The GCT forms the basis of a formant estimator that exploits separability of speech source and vocal tract information via changing pitch.
- Although the spectrogram provides a useful starting point for the GCT, alternate transforms can provide improved performance. The fan-chirp transform is one possibility.
- Possible GCT directions:
  - Alternate time-frequency distributions
  - Pitch estimation: extended evaluation on a larger corpus and use of voiced/unvoiced speech; two-speaker pitch estimation
  - Formant estimation in noise
  - GCT as a model of auditory cortical processing (Shamma, Ezzat, and Poggio)
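The pitch estimator described in the abstract can be illustrated with a minimal sketch. It is not the report's implementation. It assumes a synthetic, constant-pitch harmonic signal, a Hamming-windowed magnitude spectrogram, and a GCT computed as the 2-D Fourier transform magnitude of a tapered spectrogram patch. The function names (`magnitude_spectrogram`, `gct_pitch`), the patch size, the zero-padding, and the 60 to 400 Hz search range are all hypothetical choices made for illustration. The mapping from peak location to pitch, `bin_hz / (radius * cos(theta))`, is likewise an assumption of this sketch; for a stationary pitch the harmonic lines are horizontal, so the peak lies on the frequency axis of the GCT plane and the angle is zero.

```python
import numpy as np


def magnitude_spectrogram(x, n_fft, win_len, hop):
    """STFT magnitude: rows are frequency bins, columns are time frames."""
    window = np.hamming(win_len)
    n_frames = 1 + (len(x) - win_len) // hop
    frames = np.stack(
        [x[i * hop:i * hop + win_len] * window for i in range(n_frames)], axis=1
    )
    return np.abs(np.fft.rfft(frames, n=n_fft, axis=0))


def gct_pitch(patch, bin_hz, f0_min=60.0, f0_max=400.0, pad=(1024, 64)):
    """Estimate pitch (Hz) from the largest peak of a patch's 2-D spectrum.

    Assumption of this sketch: harmonic lines spaced f0 Hz apart give a peak
    whose frequency-axis component is bin_hz / f0 cycles per bin.
    """
    rows, cols = patch.shape
    # Remove the mean and taper the patch to limit leakage from DC and edges.
    taper = np.outer(np.hamming(rows), np.hamming(cols))
    gct = np.abs(np.fft.fft2((patch - patch.mean()) * taper, s=pad))

    # Spatial frequencies: cycles per frequency bin (rows), per frame (cols).
    fr = np.fft.fftfreq(pad[0])[:, None]
    fc = np.fft.fftfreq(pad[1])[None, :]
    radius = np.hypot(fr, fc)

    # Search one half-plane, within radii matching a plausible pitch range.
    mask = (fr > 0) & (radius >= bin_hz / f0_max) & (radius <= bin_hz / f0_min)
    r, c = np.unravel_index(np.argmax(np.where(mask, gct, 0.0)), gct.shape)

    # Radial distance and angle (measured from the frequency axis) of the peak.
    theta = np.arctan2(fc[0, c], fr[r, 0])
    return bin_hz / (radius[r, c] * np.cos(theta))


# Synthetic test signal: equal-amplitude harmonics of a 200 Hz fundamental.
fs = 8000
n_fft = 1024
true_f0 = 200.0
t = np.arange(int(0.5 * fs)) / fs
x = sum(np.cos(2 * np.pi * k * true_f0 * t) for k in range(1, 20))

spec = magnitude_spectrogram(x, n_fft=n_fft, win_len=400, hop=40)
patch = spec[:256, :32]  # roughly 0 to 2 kHz, first 32 frames
estimate = gct_pitch(patch, bin_hz=fs / n_fft)
print(f"estimated pitch: {estimate:.1f} Hz")
```

Zero-padding the 2-D transform refines the peak location beyond what the patch size alone would allow. A patch taken from real speech with changing pitch would place the peak off the frequency axis, which is where the angle term becomes relevant.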
Document Details
- Document Type: Technical Report
- Publication Date: Nov 10, 2007
- Accession Number: ADA522033
Entities
People
- Thomas F. Quatieri
- Tianyu T. Wang
Organizations
- Massachusetts Institute of Technology