Speech Coding and Phoneme Classification Using a Back-Propagation Neural Network
Abstract
Speech is a natural, unspecialized method of communication that is perhaps the ultimate machine interface. Previous attempts to provide such an interface, however, have been limited to pre-defined vocabularies of an artificial syntax. This paper presents a method for speaker-dependent speech identification that uses a back-propagation neural network to determine the phonemes present within a voice signal. The vocal tract changes slowly in time and can be modeled using the pitch and formant frequencies of the voice. These frequencies relate the position of the vocal tract to specific pronunciations and are obtained by using a homomorphic filtering process that separates the vocal tract's impulse response from the excitation source. The frequency representation of this response is concatenated with the excitation containing the pitch frequency and applied to the input layer of the neural network. From this information, the network selects combinations of features that identify the phonemes which are present. This network was trained on a set of speaker dependent phonemes, and now phonetically classifies new speech input. This classification scheme could be used to translate linguistic messages into machine code with a very high data rate. This benefit would allow for real-time interaction with machines with no specialized training. Applications could be as simple as providing quick voice to text processing or as diverse as implementing a control system with response time tied to specified voice patterns.
Document Details
- Document Type
- Technical Report
- Publication Date
- May 07, 1997
- Accession Number
- ADA418472
Entities
People
- Brett A. St. George
Organizations
- United States Naval Academy