FACP Speech Recognition/Transmission System.
Abstract
This report describes a phoneme vocoder capable of transmitting compressed speech data over bandlimited communication channels at rates lower than 200 bits per second. Using linear prediction analysis for parameter extraction, and sophisticated segmentation and labeling techniques, the vocoder analyzer codes the incoming speech signal into a sequence of discrete sound units, or phonemes. At the receiving end of the channel, the phoneme sequence is input to a digital speech synthesizer. An area function dyad synthesis procedure is described which is based on an area function model representing the vocal tract as a set of 14 cross-sectional areas. Area functions representing phoneme steady states (nuclei) and transitions between any ordered pair of phonemes (dyads) are stored in a dyad table. Given an input phoneme string, the synthesizer selects the corresponding sequence of nuclei and transitions and interpolates between each of the 14 cross-sectional areas, producing a model of the shape of the vocal tract changing in time. The filtering process of this vocal tract model is identical to the optimum inverse filter of linear prediction analysis, allowing direct conversion to linear predictive coding (LPC) synthesis. A terminal analog synthesizer is also described. Diagnostic Rhyme Tests (DRT) of vocoder performance for two male speakers yielded scores of 70.6% using area function dyad synthesis and 83.5% for terminal analog synthesis. (Author)
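The interpolation and LPC conversion steps mentioned in the abstract can be illustrated with a minimal sketch. It is not the report's implementation: the function names, the linear interpolation scheme, the sign convention for the reflection coefficients, and the use of only the interior junctions between adjacent tube sections (13 junctions for 14 areas, with no glottis or lip boundary term) are all assumptions made here for illustration.

```python
# Hypothetical sketch: area-function interpolation and conversion to LPC form.
# All names and conventions below are assumptions, not taken from the report.

def interpolate_areas(start, end, n_frames):
    """Linearly interpolate each cross-sectional area from `start` to `end`.

    Returns n_frames area functions, the first equal to `start` and the
    last equal to `end`.
    """
    if len(start) != len(end):
        raise ValueError("area functions must have the same number of sections")
    if n_frames < 2:
        raise ValueError("need at least two frames")
    frames = []
    for k in range(n_frames):
        t = k / (n_frames - 1)
        frames.append([(1 - t) * a + t * b for a, b in zip(start, end)])
    return frames


def reflection_coefficients(areas):
    """Reflection coefficient at each junction between adjacent tube sections.

    Sign convention (assumed): k = (A_next - A) / (A_next + A).
    """
    return [(nxt - cur) / (nxt + cur) for cur, nxt in zip(areas, areas[1:])]


def lpc_from_reflection(ks):
    """Step-up recursion: reflection coefficients -> inverse filter A(z).

    Returns [1, a_1, ..., a_p], where p = len(ks).
    """
    a = [1.0]
    for k in ks:
        padded = a + [0.0]
        last = len(padded) - 1
        a = [padded[i] + k * padded[last - i] for i in range(len(padded))]
    return a


if __name__ == "__main__":
    # Two made-up 14-section area functions (arbitrary units).
    nucleus_a = [1.0] * 14
    nucleus_b = [1.0 + 0.1 * i for i in range(14)]
    for areas in interpolate_areas(nucleus_a, nucleus_b, 5):
        coeffs = lpc_from_reflection(reflection_coefficients(areas))
        print([round(c, 3) for c in coeffs])
```

Each interpolated frame yields one set of predictor coefficients, so a time-varying area function maps directly to a time-varying LPC synthesis filter. A uniform tube (all areas equal) gives zero reflection coefficients and the trivial filter A(z) = 1.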
Document Details
- Document Type: Technical Report
- Publication Date: Aug 01, 1978
- Accession Number: ADA060115
Entities
People
- B. T. Oshika
Organizations
- System Development Corporation