Extrinsic Evaluation of Automatic Metrics for Summarization

Abstract

This paper describes extrinsic-task evaluation of summarization. We show that it is possible to save time using summaries for relevance assessment without adversely impacting the degree of accuracy that would be possible with full documents. In addition, we demonstrate that the extrinsic task we have selected exhibits a high degree of interannotator agreement, i.e., consistent relevance decisions across subjects. We also conducted a composite experiment that better reflects the actual document selection process and found that using a surrogate improves the processing speed over reading the entire document. Finally, we have found a small yet statistically significant correlation between some of the intrinsic measures and a user's performance in an extrinsic task. The overall conclusion we can draw at this point is that ROUGE-1 does correlate with precision and to a somewhat lesser degree with accuracy, but that it remains to be investigated how stable these correlations are and how differences in ROUGE-1 translate into significant differences in human performance in an extrinsic task.

Document Details

Document Type: Technical Report
Publication Date: Jul 20, 2004
Accession Number: ADA448065

Entities

People

Bonnie J. Dorr
Christof Monz
David Zajic
Douglas Oard
Richard Schwartz
Stacy President

Organizations

University of Maryland
