Toward Low-Cost Automated Evaluation Metrics for Internet of Things Dialogues
Abstract
We analyze a corpus of system-user dialogues in the Internet of Things domain. Our corpus is automatically, semi-automatically, and manually annotated with a variety of features at both the utterance level and the full-dialogue level. The corpus also includes human ratings of dialogue quality collected via crowdsourcing. We calculate correlations between features and human ratings to identify which features are highly associated with human perceptions of dialogue quality in this domain. We also perform linear regression and derive a variety of dialogue quality evaluation functions. These evaluation functions are then applied to a held-out portion of our corpus, where they are shown to be highly predictive of human ratings and to outperform standard reward-based evaluation functions.
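The pipeline described in the abstract (correlate a feature with human ratings, fit a linear evaluation function, then check it on held-out dialogues) can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration: the single feature (a count of system misunderstandings per dialogue), the toy ratings, and the helper names `pearson`, `fit_line`, and `evaluate` are hypothetical and are not taken from the report, which uses a larger feature set and real crowdsourced ratings.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def fit_line(xs, ys):
    """Ordinary least squares with one feature; returns (slope, intercept)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, my - slope * mx


# Hypothetical toy data: (misunderstandings in the dialogue, mean human rating).
train = [(0, 6.5), (1, 5.8), (2, 5.1), (3, 4.0), (4, 3.6)]
held_out = [(0, 6.8), (2, 4.9), (5, 2.7)]

train_x, train_y = zip(*train)

# Step 1: how strongly is the feature associated with human ratings?
r_train = pearson(train_x, train_y)

# Step 2: derive a linear evaluation function from the training portion.
slope, intercept = fit_line(train_x, train_y)


def evaluate(feature_value):
    """Predicted dialogue-quality rating for one dialogue."""
    return slope * feature_value + intercept


# Step 3: apply the function to held-out dialogues and compare with humans.
predicted = [evaluate(x) for x, _ in held_out]
r_held_out = pearson(predicted, [y for _, y in held_out])

print(f"train correlation: {r_train:.3f}")
print(f"evaluation function: rating = {slope:.2f} * feature + {intercept:.2f}")
print(f"held-out correlation (predicted vs. human): {r_held_out:.3f}")
```

With several features, the same idea would typically use multiple linear regression in place of the one-feature fit shown here; the held-out comparison step stays the same.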
Document Details
- Document Type: Technical Report
- Publication Date: Jan 01, 2018
- Accession Number: AD1159898
Entities
People
- Carla Gordon
- David R Traum
- Heesik Jeon
- Hyungtak Choi
- Jill Boberg
- Kallirroi Georgila
Organizations
- Samsung Electronics
- University of Southern California