Computer Identification of Phonemes in Continuous Speech.

Abstract

The purpose of this investigation was to identify phoneme segments as they appear in continuous speech. The input device was an audio tape recorder, whose analog speech signal was digitized and passed through a fast Fourier transform. The amplitudes of the transformed signal were combined logarithmically and printed as a 16-channel digitized spectrogram. Sixty-one prototypes were selected to represent the phonemes of the English language. These prototypes were stored and used in a running cross-correlation with the unknown speech signal. The amplitude values produced by the correlation were used to predict phoneme locations, and the values were then compared to identify the correct phoneme. The phoneme prototypes were selected from Speaker A's speech signal, and tests were conducted on utterances from both Speaker A and Speaker B. For Speaker A, location was rated at 81 percent and identification at 45 percent. For Speaker B, location was 70 percent and identification 40 percent. Spatial filtering techniques, uniform-length prototypes, and various normalization procedures were then investigated, with the result of improving location for Speaker B. (Author)
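The running cross-correlation step described in the abstract can be illustrated with a minimal sketch. This is not the report's program: the zero-mean normalization, the 0.8 threshold, the peak-picking rule, and all function names are assumptions made for illustration, and the toy data uses 4 channels and 2 prototypes rather than the 16 channels and 61 prototypes of the study.

```python
import math

def normalized_correlation(window, prototype):
    """Zero-mean normalized correlation of two equal-sized blocks of frames."""
    a = [v for frame in window for v in frame]
    b = [v for frame in prototype for v in frame]
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    da = [v - mean_a for v in a]
    db = [v - mean_b for v in b]
    denom = math.sqrt(sum(x * x for x in da) * sum(y * y for y in db))
    if denom == 0.0:
        return 0.0  # a flat block (e.g. silence) carries no pattern to match
    return sum(x * y for x, y in zip(da, db)) / denom

def running_correlation(spectrogram, prototype):
    """Slide the prototype along the spectrogram, one frame at a time."""
    n = len(prototype)
    return [normalized_correlation(spectrogram[t:t + n], prototype)
            for t in range(len(spectrogram) - n + 1)]

def locate_and_identify(spectrogram, prototypes, threshold=0.8):
    """Return (frame, label, score) at each local peak above the threshold.

    The threshold and the peak rule are assumed, not taken from the report.
    """
    scores = {label: running_correlation(spectrogram, proto)
              for label, proto in prototypes.items()}
    # Compare prototypes at each frame offset and keep the best match.
    best = []
    for t in range(len(spectrogram)):
        candidates = [(s[t], label) for label, s in scores.items() if t < len(s)]
        best.append(max(candidates) if candidates else (0.0, None))
    hits = []
    for t, (score, label) in enumerate(best):
        left = best[t - 1][0] if t > 0 else -1.0
        right = best[t + 1][0] if t + 1 < len(best) else -1.0
        if score >= threshold and score >= left and score > right:
            hits.append((t, label, score))
    return hits

# Toy data: each frame is one spectrogram column (4 channels here).
silence = [0.0, 0.0, 0.0, 0.0]
prototypes = {
    "a": [[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0]],
    "s": [[0.0, 0.0, 1.0, 0.0], [0.0, 0.0, 0.0, 1.0]],
}
spectrogram = ([silence, silence] + prototypes["a"] + [silence]
               + prototypes["s"] + [silence])

hits = locate_and_identify(spectrogram, prototypes)
for frame, label, score in hits:
    print(frame, label, f"{score:.2f}")
```

On this toy input the sketch reports "a" at frame 2 and "s" at frame 5. Prototypes of different lengths are handled by comparing only those prototypes that still fit at a given offset, which is one simple choice among several.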

Document Details

Document Type: Technical Report
Publication Date: Dec 01, 1976
Accession Number: ADA034274

Entities

People

  • William R. Hensley

Organizations

  • Air Force Institute of Technology

Tags

Communities of Interest

  • Energy and Power Technologies
  • Materials and Manufacturing Processes

DTIC Thesaurus Topics

  • Artificial Intelligence
  • Audio Tapes
  • Automated Speech Recognition
  • Computer Programs
  • Computers
  • Electrical Engineering
  • English Language
  • Filtration
  • Identification
  • Language
  • Pattern Recognition
  • Plastic Explosives
  • Recording Systems
  • Signal Processing
  • Spatial Filtering
  • Tape Recorders
  • Tape Recording

Readers

  • Computational Modeling and Simulation
  • Speech Processing/Speech Recognition