An Overview of Technology for Spoken Interaction with Machines (Une Introduction à la Communication Vocale avec les Machines)

Abstract

This report provides a non-mathematical introduction to speech input and output technology. It is divided into three parts. The first presents necessary background information on speech: its nature, its production and perception, and the methods of analysis and coding used in speech I/O. A central message is that our subjective impression of speech is misleading and causes us to underestimate the complexity of speech communication. The second part is concerned with speech output and discusses the trade-offs that must be made between the quality and flexibility of the speech generated and the complexity and storage requirements of the speech output system. The final and longest part of the report deals with speech recognition. Arguments are presented in favor of statistical rather than rule-based approaches to speech recognition. The categories of recognizer currently available and the algorithms they use are briefly described, with the general conclusion that the performance obtained depends critically on the training process: on the type and quantity of the training material and on the amount of information derived from it. Three more detailed sections cover spectral representations and distance measures, the particular set of representations classed as auditory models, and techniques for handling noise and distortions. The last section discusses the difficulties of specifying recognizer performance and recommends that all performance measurements be treated with circumspection.

Document Details

Document Type: Technical Report
Publication Date: Feb 01, 1988
Accession Number: ADA194153

Entities

People

  • M. J. Hunt

Organizations

  • National Research Council Canada

Tags

Communities of Interest

  • Energy and Power Technologies

DTIC Thesaurus Topics

  • Accuracy
  • Algorithms
  • Automated Speech Recognition
  • Coding
  • Computational Science
  • Computer Programming
  • Databases
  • Decoding
  • Expert Systems
  • Intelligibility
  • Language
  • Larynx
  • Neural Networks
  • Pattern Recognition
  • Probability
  • Recognition
  • Speech Analysis

Readers

  • Computational Linguistics
  • Systems Analysis and Design
  • Theoretical Analysis

Technology Areas

  • AI & ML