Beyond Class A: A Proposal for Automatic Evaluation of Discourse

Abstract

The DARPA Spoken Language community has just completed the first trial evaluation of spontaneous query/response pairs in the Air Travel (ATIS) domain. Our goal has been to find a methodology for evaluating correct responses to user queries. To this end, we agreed, for the first trial evaluation, to constrain the problem in several ways:

Database Application: Constrain the application to a database query application, to ease the burden of a) constructing the back-end, and b) determining correct responses;
Canonical Answer: Constrain answer comparison to a minimal "canonical answer" that imposes the fewest constraints on the form of system response displayed to a user at each site;
Typed Input: Constrain the evaluation to typed input only;
Class A: Constrain the test set to single unambiguous intelligible utterances taken without context that have well-defined database answers ("class A" sentences).

These were reasonable constraints to impose on the first trial evaluation. However, it is clear that we need to loosen these constraints to obtain a more realistic evaluation of spoken language systems. The purpose of this paper is to suggest how we can move beyond evaluation of class A sentences to an evaluation of connected dialogue, including out-of-domain queries.

Document Details

Document Type: Technical Report
Publication Date: Jan 01, 1990
Accession Number: ADA458704

Entities

People

Deborah A. Dahl
Donald P. McKay
Lewis M. Norton
Lynette Hirschman
Marcia C. Linebarger
