Word and Subword Modelling in a Segment-Based HMM Word Spotter Using a Data Analytic Approach

Abstract

In this work we focus on methods for representing acoustic-phonetic knowledge in a speech recognizer and for analyzing the system's behavior in detail. The testbed for developing these methods is a segment-based hidden Markov model (HMM) recognizer. In this system, measurements are made on variable-duration segments. Ideally, each segment is associated with a single phonetic unit, which we refer to as a phone. The scheme has several potential advantages over the typical HMM recognizer, which is based on fixed-duration frames: a greater ability to model statistical dependence among spectral measurements, a more convenient framework for representing acoustic-phonetic knowledge, and a potential reduction in computation, since the mean segment rate in our implementation is 1/5 of a typical frame rate. The HMM framework is used to model the segmenter's deviations from the ideal behavior of one segment per phone. We employ an HMM topology that allows a phone to be associated with more than one segment. Biphone HMMs model instances in which a segment is associated with more than one phone. We compared the effectiveness of various segment measurement sets on a phonetic recognition task. The measurements consisted of short-time spectral representations measured at particular positions relative to segment boundaries. The key result was that adding spectra measured outside the segment to those measured inside led to a significant improvement in performance. For the task of recognizing 39 phone labels, the best system attained a phonetic accuracy (% correct - % insertions) of 59% (95% confidence interval of 53-65%) on a set of nine male speakers from the VOYAGER corpus, a result in the range of those previously reported for recognizers of comparable complexity.
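
The accuracy measure quoted in the abstract, % correct minus % insertions, can be illustrated with a minimal sketch. The function name, the argument names, and the choice of the number of reference phone labels as the denominator for both percentages are assumptions made for illustration; the abstract does not state how the report normalizes either term. The counts in the example are hypothetical and are not taken from the report.

```python
def phonetic_accuracy(n_correct, n_insertions, n_reference):
    """Return accuracy in percent, defined as % correct minus % insertions.

    Assumption: both percentages are taken relative to the number of
    reference phone labels (n_reference).
    """
    if n_reference <= 0:
        raise ValueError("n_reference must be positive")
    pct_correct = 100.0 * n_correct / n_reference
    pct_insertions = 100.0 * n_insertions / n_reference
    return pct_correct - pct_insertions


# Hypothetical counts chosen only to show the arithmetic.
example_accuracy = phonetic_accuracy(n_correct=80, n_insertions=15, n_reference=100)
print(f"accuracy = {example_accuracy:.1f}%")
```

Subtracting insertions penalizes a recognizer that raises its % correct simply by emitting extra phone labels.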

Document Details

Document Type
Technical Report
Publication Date
Sep 01, 1992
Accession Number
ADA256820

Entities

People

  • Jeffrey N. Marcus

Organizations

  • Massachusetts Institute of Technology

Tags

Communities of Interest

  • Energy and Power Technologies
  • Materials and Manufacturing Processes

DTIC Thesaurus Topics

  • Accuracy
  • Computational Science
  • Computers
  • Data Mining
  • Data Science
  • Databases
  • Dimensionality Reduction
  • Electrical Engineering
  • Factor Analysis
  • Information Processing
  • Information Retrieval
  • Information Science
  • Knowledge Management
  • Markov Models
  • Natural Languages
  • Network Science
  • Neural Networks

Readers

  • Speech Processing/Speech Recognition
  • Systems Analysis and Design