Efficient Algorithms for Speech Recognition

Abstract

Advances in speech technology and computing power have created a surge of interest in the practical application of speech recognition. However, the most accurate speech recognition systems in the research world are still far too slow and expensive to be used in practical, large-vocabulary continuous speech applications. Their main goal has been recognition accuracy, with emphasis on acoustic and language modelling. But practical speech recognition also requires the computation to be carried out in real time within the limited resources (CPU power and memory size) of commonly available computers. Relatively little work has pursued this goal while preserving the accuracy of research systems. In this thesis, we focus on efficient and accurate speech recognition. It is easy to improve recognition speed and reduce memory requirements by trading away accuracy, for example by pruning more aggressively and by using simpler acoustic and language models. It is much harder to both improve recognition speed and reduce main memory size while preserving accuracy. This thesis presents several techniques for improving the overall performance of the CMU Sphinx-II system. Sphinx-II employs semi-continuous hidden Markov models for acoustics and trigram language models, and is one of the premier research systems of its kind. The techniques in this thesis are validated on several widely used benchmark test sets using two vocabulary sizes of about 20K and 58K words. The main contributions of this thesis are an 8-fold speedup and a 4-fold memory size reduction over the baseline Sphinx-II system. The improvement in speed is obtained from the following techniques: lexical tree search, a phonetic fast match heuristic, and a global best path search of the word lattice.
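
As a rough illustration of why a lexical tree can reduce search effort, the sketch below builds a pronunciation prefix tree over a toy lexicon and compares its node count with the number of phone positions in a flat, word-by-word lexicon. It is not taken from Sphinx-II: the words, the phone symbols, and the names LexNode, build_lexical_tree, and count_nodes are all assumptions made for this example.

```python
from dataclasses import dataclass, field


@dataclass
class LexNode:
    """One node of a pronunciation prefix tree (hypothetical structure)."""
    children: dict = field(default_factory=dict)  # phone symbol -> LexNode
    words: list = field(default_factory=list)     # words whose pronunciation ends here


def build_lexical_tree(lexicon):
    """Insert every pronunciation so that words with a common prefix share nodes."""
    root = LexNode()
    for word, phones in lexicon.items():
        node = root
        for phone in phones:
            node = node.children.setdefault(phone, LexNode())
        node.words.append(word)
    return root


def count_nodes(node):
    """Count all nodes below `node` (the root itself is not counted)."""
    return sum(1 + count_nodes(child) for child in node.children.values())


# Toy lexicon; the pronunciations are illustrative assumptions, not real dictionary entries.
LEXICON = {
    "start":    ["S", "T", "AA", "R", "T"],
    "starting": ["S", "T", "AA", "R", "T", "IX", "NG"],
    "started":  ["S", "T", "AA", "R", "T", "IX", "DD"],
    "startup":  ["S", "T", "AA", "R", "T", "AX", "PD"],
}

tree = build_lexical_tree(LEXICON)
flat_phones = sum(len(phones) for phones in LEXICON.values())
tree_nodes = count_nodes(tree)

print(f"flat lexicon phone positions: {flat_phones}")
print(f"lexical tree nodes:           {tree_nodes}")
```

In this toy case the four words share their first five phones, so the tree needs far fewer nodes than the flat lexicon has phone positions. A real decoder would attach acoustic models to the nodes and handle language model scores, which this sketch leaves out entirely.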

Document Details

Document Type
Technical Report
Publication Date
May 15, 1996
Accession Number
ADA310308

Entities

People

  • Mosur K. Ravishankar

Organizations

  • Carnegie Mellon University

Tags

Communities of Interest

  • C4I

DTIC Thesaurus Topics

  • Accuracy
  • Algorithms
  • Automated Speech Recognition
  • Computer Programming
  • Computers
  • Decoding
  • Grammars
  • Hidden Markov Models
  • Language
  • Markov Models
  • Multithreading
  • Neural Networks
  • Operating Systems
  • Probability
  • Reliability
  • Signal Processing
  • Test Sets

Fields of Study

  • Computer science
  • Engineering

Readers

  • Applied Combinatorial Optimization and Logic Circuit Design
  • Computational Modeling and Simulation
  • Speech Processing/Speech Recognition

Technology Areas

  • AI & ML
  • AI & ML - Machine Learning Algorithms
  • AI & ML - Machine Translation
  • AI & ML - Neural Networks