Beyond Class A: A Proposal for Automatic Evaluation of Discourse

Abstract

The DARPA Spoken Language community has just completed the first trial evaluation of spontaneous query/response pairs in the Air Travel (ATIS) domain. Our goal has been to find a methodology for evaluating correct responses to user queries. To this end, we agreed, for the first trial evaluation, to constrain the problem in several ways:

  • Database Application: Constrain the application to a database query application, to ease the burden of a) constructing the back-end, and b) determining correct responses;
  • Canonical Answer: Constrain answer comparison to a minimal "canonical answer" that imposes the fewest constraints on the form of system response displayed to a user at each site;
  • Typed Input: Constrain the evaluation to typed input only;
  • Class A: Constrain the test set to single unambiguous intelligible utterances taken without context that have well-defined database answers ("class A" sentences).

These were reasonable constraints to impose on the first trial evaluation. However, it is clear that we need to loosen these constraints to obtain a more realistic evaluation of spoken language systems. The purpose of this paper is to suggest how we can move beyond evaluation of class A sentences to an evaluation of connected dialogue, including out-of-domain queries.

Document Details

Document Type
Technical Report
Publication Date
Jan 01, 1990
Accession Number
ADA458704

Entities

People

  • Deborah A. Dahl
  • Donald P. McKay
  • Lewis M. Norton
  • Lynette Hirschman
  • Marcia C. Linebarger

Tags

DTIC Thesaurus Topics

  • Abstracts
  • Automatic
  • Classification
  • Databases
  • Defense Systems
  • Expert Systems
  • Information Operations
  • Information Systems
  • Language
  • Military Research
  • Standards
  • Test And Evaluation
  • Test Sets
  • Training

Fields of Study

  • Computer science

Readers

  • Regression Analysis
  • Speech Processing/Speech Recognition
  • Systems Analysis and Design