2-D Processing of Speech with Application to Pitch and Formant Estimation

Abstract

  • The grating compression transform (GCT) maps harmonically related signal components to a concentrated entity in a spatial 2-D frequency plane.
  • The GCT forms the basis of a pitch estimator that uses the radial distance to the largest peak of the GCT.
  • The resulting pitch estimator appears robust under noise conditions and amenable to extension to two-speaker pitch estimation.
  • The GCT also forms the basis of a formant estimator that exploits the separability of speech source and vocal tract information via changing pitch.
  • Although the spectrogram provides a useful starting point for the GCT, alternate transforms can provide improved performance.
  • The fan-chirp transform is one possibility.
  • Possible GCT directions:
      • Alternate time-frequency distributions
      • Pitch estimation: extended evaluation to a larger corpus and use of voiced/unvoiced speech; two-speaker pitch estimation
      • Formant estimation in noise
      • GCT as a model of auditory cortical processing (Shamma, Ezzat, and Poggio)
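
The pitch estimator described in the abstract can be illustrated with a minimal sketch. It is not the report's implementation: it assumes the GCT can be approximated by the magnitude of a 2-D Fourier transform of a tapered, mean-removed spectrogram patch, and that the harmonic spacing along the frequency axis equals 2π divided by the radial distance of the largest peak projected onto that axis. The function names `synthetic_grating` and `gct_pitch_from_patch`, the Hamming taper, the zero-padding size, and the near-DC exclusion rule are all hypothetical choices made for illustration. The test patch is an idealized sinusoidal grating standing in for the harmonic lines of a narrowband spectrogram, not real speech.

```python
import numpy as np


def synthetic_grating(f0_hz, bin_hz, n_rows=256, n_cols=32):
    """Hypothetical test patch: harmonic lines f0_hz apart, constant in time.

    Rows are frequency bins (bin_hz apart); columns are time frames.
    """
    rows = np.arange(n_rows)[:, None]      # frequency-bin index
    spacing_bins = f0_hz / bin_hz          # harmonic spacing in bins
    column = 1.0 + np.cos(2.0 * np.pi * rows / spacing_bins)
    return np.repeat(column, n_cols, axis=1)


def gct_pitch_from_patch(patch, bin_hz, pad=1024):
    """Estimate harmonic spacing (Hz) from a spectrogram patch.

    Assumptions (not taken from the report): `pad` is at least as large
    as both patch dimensions, and at least two harmonic periods fit
    inside the patch along the frequency axis.
    """
    patch = np.asarray(patch, dtype=float)
    n_rows, n_cols = patch.shape

    # Remove the mean so the DC term does not dominate the 2-D spectrum.
    patch = patch - patch.mean()

    # Separable 2-D Hamming taper to reduce leakage from the patch edges.
    taper = np.outer(np.hamming(n_rows), np.hamming(n_cols))

    # Zero-padded 2-D Fourier magnitude, used here as a stand-in for the GCT.
    gct = np.abs(np.fft.fft2(patch * taper, s=(pad, pad)))

    # Spatial frequencies in radians per pixel along each axis.
    w = 2.0 * np.pi * np.fft.fftfreq(pad)
    w_f, w_t = np.meshgrid(w, w, indexing="ij")  # w_f: along frequency axis

    # Keep one half-plane and skip the region near DC (taper main lobe).
    mask = w_f > 2.0 * np.pi * 2.0 / n_rows
    peak = np.unravel_index(np.argmax(np.where(mask, gct, 0.0)), gct.shape)

    # Radial distance and angle (measured from the frequency axis) of the peak.
    radial_distance = np.hypot(w_f[peak], w_t[peak])
    angle = np.arctan2(w_t[peak], w_f[peak])

    # Harmonic spacing along the frequency axis, in bins, then in Hz.
    spacing_bins = 2.0 * np.pi / (radial_distance * np.cos(angle))
    return bin_hz * spacing_bins


# Usage: a patch with 200 Hz harmonic spacing, with bins 8000/2048 Hz apart.
bin_hz = 8000.0 / 2048
patch = synthetic_grating(200.0, bin_hz)
print(gct_pitch_from_patch(patch, bin_hz))
```

For a stationary pitch the peak lies on the frequency axis, so the angle is zero and the estimate depends on the radial distance alone. The cosine projection is included only so that a tilted grating, corresponding to changing pitch, still yields the spacing measured along the frequency axis.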

Document Details

Document Type
Technical Report
Publication Date
Nov 10, 2007
Accession Number
ADA522033

Entities

People

  • Thomas F. Quatieri
  • Tianyu T. Wang

Organizations

  • Massachusetts Institute of Technology

Tags

Communities of Interest

  • Air Platforms

DTIC Thesaurus Topics

  • Department Of Defense
  • Estimators
  • Feature Extraction
  • Frequency
  • Frequency Modulation
  • Gaussian Noise
  • Geometry
  • Image Processing
  • Modulation
  • Narrowband
  • New York
  • Noise
  • Signal Processing
  • Sine Waves
  • Two Dimensional
  • United States
  • United States Government

Fields of Study

  • Engineering

Readers

  • Archaeological Resource Survey
  • Image Processing and Computer Vision
  • Statistical Inference