Isolated Digit Recognition Without Time Alignment

Abstract

This thesis examines methods for isolated digit recognition that do not use time alignment. The resource requirements of isolated word recognizers that use time alignment can become prohibitively large as the vocabulary to be classified grows. Methods are therefore needed that avoid time alignment while achieving recognition rates comparable to those of current time-aligned methods. The goals of this research are to find feature sets for speech recognition that perform well without time alignment, and to identify classifiers that perform well with these features. Using the digits from the TI46 database, baseline speaker-independent recognition rates of 95.2% for the complete speaker set and 98.1% for the male speaker set are established using dynamic time warping (DTW). The work begins with features derived from spectrograms of each digit. These critical band energy features, based on a critical band frequency scale covering the telephone bandwidth (300-3000 Hz), are classified alone and in combination with several other feature sets, using several different classifiers. With this method, each word is represented by a single "short" feature vector. For speaker-independent recognition with a multi-layer perceptron (MLP) classifier, a recognition rate of 92.4% is achieved on the complete speaker set and 97.1% on the male speaker set. For the male speaker set, there is no statistically significant difference between the results using DTW and those using the MLP without time alignment. This shows that some feature sets may provide high recognition rates for isolated word recognition without the need for time alignment.
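
For readers unfamiliar with the time-alignment baseline named in the abstract, the following is a minimal sketch of textbook dynamic time warping between two sequences of feature frames. It is not the thesis's implementation: the abstract does not state the frame features, local distance, or warping constraints used, so the Euclidean frame distance, the unconstrained three-way step pattern, and the names `frame_distance` and `dtw_distance` are assumptions made only for illustration.

```python
import math


def frame_distance(x, y):
    """Euclidean distance between two equal-length feature frames (assumed metric)."""
    return math.sqrt(sum((xi - yi) ** 2 for xi, yi in zip(x, y)))


def dtw_distance(a, b):
    """Accumulated cost of the best monotonic alignment of sequences a and b.

    a and b are lists of frames; each frame is a list of floats.
    """
    n, m = len(a), len(b)
    inf = float("inf")
    # cost[i][j] holds the minimal accumulated cost of aligning a[:i] with b[:j].
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = frame_distance(a[i - 1], b[j - 1])
            # Extend the cheapest of the three allowed predecessor alignments.
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]


# Hypothetical one-dimensional "utterances": the second holds one frame longer.
template = [[0.0], [1.0], [2.0]]
stretched = [[0.0], [1.0], [1.0], [2.0]]
print(dtw_distance(template, stretched))
```

Each comparison in this sketch fills a table whose size is the product of the two sequence lengths. A recognizer that compares an input against stored templates for every vocabulary word therefore does more work as the vocabulary grows, which is the resource concern the abstract raises. A single fixed-length feature vector per word avoids this alignment step.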

Document Details

Document Type
Technical Report
Publication Date
Dec 01, 1994
Accession Number
ADA289344

Entities

People

  • Jeffrey M. Gay

Organizations

  • Air Force Institute of Technology

Tags

DTIC Thesaurus Topics

  • Algorithms
  • Automated Speech Recognition
  • Bandwidth
  • Databases
  • Electrical Engineering
  • Feature Extraction
  • Frequency
  • Frequency Bands
  • Hidden Markov Models
  • Machine Learning
  • Markov Models
  • Neural Networks
  • Recognition
  • Signal Processing
  • Three Dimensional
  • Vocabulary
  • Word Recognition

Fields of Study

  • Computer science

Readers

  • Neural Network Machine Learning
  • Speech Processing/Speech Recognition
  • Systems Analysis and Design

Technology Areas

  • AI & ML