Subphonetic Acoustic Modeling for Speaker-Independent Continuous Speech Recognition

Abstract

To model the acoustics of a large vocabulary well while staying within a reasonable memory capacity, most speech recognition systems use phonetic models to share parameters across different words in the vocabulary. This dissertation investigates the merits of modeling at the subphonetic level. We demonstrate that sharing parameters at the subphonetic level provides more accurate acoustic models than sharing at the phonetic level. The concept of subphonetic parameter sharing can be applied to any class of parametric models. Since the first-order hidden Markov model (HMM) has been overwhelmingly successful in speech recognition, this dissertation bases all its studies and experiments on HMMs. The subphonetic unit we investigate is the state of phonetic HMMs. We develop a system in which similar Markov states of phonetic models share the same Markov parameters. The shared parameter (i.e., the output distribution) associated with a cluster of similar states is called a senone because of its state dependency. Phonetic models that share senones are called shared-distribution models (SDMs). Experiments show that SDMs offer more accurate acoustic models than the generalized-triphone model presented by Lee. Senones are then applied to provide accurate models for triphones not seen in the system's training data. Two approaches for modeling unseen triphones are studied: purely decision-tree-based senones, and a hybrid approach using the concept of Markov state quantization. Both approaches significantly reduce the error rate compared with the previously accepted approach of substituting monophone models. Of the two, the purely decision-tree-based senone approach is preferred for its simplicity.
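The core senone idea (clustering similar Markov states so that each cluster shares one output distribution) can be illustrated with a small sketch. It assumes discrete (codebook) output distributions, a training count per state, and a greedy bottom-up merge criterion based on the increase in count-weighted entropy, a common choice for clustering discrete distributions. The function names, state labels, and numbers are hypothetical, and this is not presented as the dissertation's exact clustering procedure.

```python
import math
from itertools import combinations


def entropy(p):
    """Shannon entropy (in bits) of a discrete output distribution."""
    return -sum(x * math.log2(x) for x in p if x > 0)


def merge(dist_a, count_a, dist_b, count_b):
    """Count-weighted average of two discrete output distributions."""
    total = count_a + count_b
    merged = [(count_a * a + count_b * b) / total for a, b in zip(dist_a, dist_b)]
    return merged, total


def merge_cost(dist_a, count_a, dist_b, count_b):
    """Increase in count-weighted entropy caused by merging two clusters (assumed criterion)."""
    merged, total = merge(dist_a, count_a, dist_b, count_b)
    return total * entropy(merged) - count_a * entropy(dist_a) - count_b * entropy(dist_b)


def cluster_states(states, num_senones):
    """Greedily merge the cheapest pair of clusters until num_senones remain.

    states: dict mapping a state name to (distribution, training count).
    Returns (state name -> senone id, list of shared senone distributions).
    """
    clusters = [([name], dist, count) for name, (dist, count) in states.items()]
    while len(clusters) > num_senones:
        # combinations() yields index pairs with i < j.
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda ij: merge_cost(clusters[ij[0]][1], clusters[ij[0]][2],
                                      clusters[ij[1]][1], clusters[ij[1]][2]),
        )
        names_i, dist_i, count_i = clusters[i]
        names_j, dist_j, count_j = clusters[j]
        dist, count = merge(dist_i, count_i, dist_j, count_j)
        clusters[i] = (names_i + names_j, dist, count)
        del clusters[j]  # safe because j > i
    state_to_senone = {name: k for k, (names, _, _) in enumerate(clusters) for name in names}
    senones = [dist for _, dist, _ in clusters]
    return state_to_senone, senones


# Hypothetical first states of three /ae/ triphones over a 3-symbol codebook.
states = {
    "ae(b,t).s1": ([0.7, 0.2, 0.1], 100),
    "ae(p,t).s1": ([0.6, 0.3, 0.1], 80),
    "ae(m,n).s1": ([0.1, 0.2, 0.7], 50),
}
mapping, senones = cluster_states(states, num_senones=2)
print(mapping)
```

Here the two states with similar distributions end up tied to the same senone, while the dissimilar state keeps its own. A real system would typically restrict merging to states in the same position of the same base phone and would stop on a threshold or target senone count; the fixed target used here is only for illustration.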

Document Details

Document Type
Technical Report
Publication Date
Dec 17, 1993
Accession Number
ADA275217

Entities

People

  • Mei-yuh Hwang

Organizations

  • Carnegie Mellon University

Tags

Communities of Interest

  • C4I
  • Ground and Sea Platforms
  • Human Systems

DTIC Thesaurus Topics

  • Automated Speech Recognition
  • Computational Complexity
  • Computational Science
  • Computer Science
  • Computers
  • Databases
  • Hidden Markov Models
  • Information Science
  • Markov Models
  • Natural Language Processing
  • Natural Languages
  • Neural Networks
  • Pattern Recognition
  • Probability
  • Probability Distributions
  • Signal Processing
  • Software In The Loop

Readers

  • Computational Modeling and Simulation
  • Speech Processing/Speech Recognition

Technology Areas

  • AI & ML
  • AI & ML - Bayesian Inference