Extending Generation and Evaluation and Metrics (GEM) to Grounded Natural Language Generation (NLG) Systems and Evaluating their Descriptive Texts Derived from Image Sequences
Abstract
We present here, for consideration in a future Generation and Evaluation and Metrics (GEM) challenge, a graduated, task-based approach to evaluating grounded natural language generation (NLG) systems that generate descriptive texts derived from sequences of input images. We start by characterizing grounded NLG tasks that generate descriptive texts at increasing levels of complexity, then step through examples of these levels with image sequences and facet targets (input) and their derivative descriptive texts (output) from our human-authored data set. For evaluating whether a grounded NLG system is "good enough" for users' needs, we first ask if the user can recover the images the system used to derive descriptive texts at the relevant, graduated level of complexity. The texts judged as adequate in this image-selection task are then analyzed for their semantic facet units (SFUs), which form the basis for scoring descriptive texts generated by other grounded NLG systems. The image-selection and SFU scoring together constitute the evaluation we are piloting for grounded, data-to-text NLG systems.
Document Details
- Document Type
- Technical Report
- Publication Date
- Sep 01, 2021
- Accession Number
- AD1149441
Entities
People
- Clare R. Voss
- Stephanie M. Lukin