A Semi-automated Evaluation Metric for Dialogue Model Coherence
Abstract
We propose a new metric, Voted Appropriateness, which can be used to automatically evaluate dialogue policy decisions once some wizard data has been collected. We show that this metric outperforms a previously proposed metric, Weak Agreement. We also present a taxonomy of dialogue model evaluation schemas and situate our new metric within this taxonomy.
Document Details
- Document Type: Technical Report
- Publication Date: Apr 01, 2016
- Accession Number: AD1157726
Entities
People
- David R Traum
- Sudeep Gandhe
Organizations
- University of Southern California