Beyond Class A: A Proposal for Automatic Evaluation of Discourse
Abstract
The DARPA Spoken Language community has just completed the first trial evaluation of spontaneous query/response pairs in the Air Travel Information System (ATIS) domain. Our goal has been to find a methodology for evaluating the correctness of system responses to user queries. To this end, we agreed to constrain the first trial evaluation in four ways:
- Database Application: constrain the application to database queries, to ease the burden of (a) constructing the back end and (b) determining correct responses;
- Canonical Answer: constrain answer comparison to a minimal "canonical answer" that imposes the fewest constraints on the form of the system response displayed to the user at each site;
- Typed Input: constrain the evaluation to typed input only;
- Class A: constrain the test set to single, unambiguous, intelligible utterances that can be interpreted without context and have well-defined database answers ("class A" sentences).
These were reasonable constraints to impose on the first trial evaluation. However, we clearly need to loosen them to obtain a more realistic evaluation of spoken language systems. The purpose of this paper is to suggest how we can move beyond the evaluation of class A sentences to an evaluation of connected dialogue, including out-of-domain queries.
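The abstract does not say how a system's answer is checked against the canonical answer. As a purely illustrative sketch, assume answers are tables of rows and that, to impose few constraints on response form, a system may reorder rows and columns or return extra columns. A tolerant comparator under those assumptions could look like this; `matches_canonical` is a hypothetical name, and this is not the comparator actually used in the ATIS evaluation.

```python
from itertools import permutations


def matches_canonical(canonical, system):
    """Hypothetical comparator; not the ATIS evaluation's actual procedure.

    canonical, system: lists of row tuples (assumed rectangular).
    The system answer matches if some choice of distinct system columns,
    one per canonical column, reproduces exactly the canonical rows.
    Row order, column order, duplicate rows, and extra system columns
    are all ignored.
    """
    canon_rows = {tuple(row) for row in canonical}
    if not canon_rows:
        # An empty canonical answer is matched only by an empty response.
        return len(system) == 0

    n_canon = len(next(iter(canon_rows)))
    n_sys = len(system[0]) if system else 0
    if n_sys < n_canon:
        return False

    # Try every ordered assignment of system columns to canonical columns.
    for cols in permutations(range(n_sys), n_canon):
        projected = {tuple(row[c] for c in cols) for row in system}
        if projected == canon_rows:
            return True
    return False
```

For example, a canonical answer of `[("AA", 123), ("UA", 456)]` is matched by a system table `[("UA", 456, "0800"), ("AA", 123, "0915")]`, which adds a departure-time column and lists the rows in a different order. A table missing one of the flights is not matched. The brute-force search over column assignments is only practical for the narrow tables of a sketch like this.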
Document Details
- Document Type: Technical Report
- Publication Date: Jan 01, 1990
- Accession Number: ADA458704
Entities
People
- Deborah A. Dahl
- Donald P. McKay
- Lewis M. Norton
- Lynette Hirschman
- Marcia C. Linebarger