Syntactic Simplification for Improving Content Selection in Multi-Document Summarization

Abstract

In this paper, we explore the use of automatic syntactic simplification for improving content selection in multi-document summarization. In particular, we show how simplifying parentheticals by removing relative clauses and appositives results in improved sentence clustering, by forcing clustering based on central rather than background information. We argue that the inclusion of parenthetical information in a summary is a reference-generation task rather than a content-selection one, and implement a baseline reference rewriting module. We perform our evaluations on the test sets from the 2003 and 2004 Document Understanding Conferences and report that simplifying parentheticals results in significant improvement on the automated evaluation metric ROUGE.

Syntactic simplification is an NLP task, the goal of which is to rewrite sentences to reduce their grammatical complexity while preserving their meaning and information content. Text simplification is useful for varied reasons. Chandrasekar et al. (1996) viewed text simplification as a preprocessing tool to improve the performance of their parser. The PSET project (Carroll et al., 1999), on the other hand, focused its research on simplifying newspaper text for aphasics, who have trouble with long sentences and complicated grammatical constructs. We have previously (Siddharthan, 2002; Siddharthan, 2003) developed a shallow and robust syntactic simplification system for news reports that simplifies relative clauses, apposition and conjunction. In this paper, we explore the use of syntactic simplification in multi-document summarization.
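The clustering effect described in the abstract can be illustrated with a toy sketch. This is not the authors' simplification system: the regular expression below is a crude stand-in that only drops comma-delimited clauses introduced by "who", "which" or "whose" (it does not handle appositives), the word-overlap (Jaccard) score is an assumed similarity measure, and the function names and example sentences are hypothetical.

```python
import re

# Assumption: a non-restrictive relative clause is approximated as a
# comma-delimited span starting with "who", "which" or "whose".
RELATIVE_CLAUSE = re.compile(r",\s*(?:who|which|whose)\b[^,]*,")


def strip_parentheticals(sentence: str) -> str:
    """Remove comma-delimited relative clauses (toy approximation)."""
    return RELATIVE_CLAUSE.sub("", sentence)


def jaccard(a: str, b: str) -> float:
    """Word-overlap similarity between two sentences."""
    words_a = set(re.findall(r"[a-z]+", a.lower()))
    words_b = set(re.findall(r"[a-z]+", b.lower()))
    return len(words_a & words_b) / len(words_a | words_b)


# Hypothetical sentences: s1 and s2 share only background information,
# while s1 and s3 report the same central event.
s1 = "Smith, who was born in Ohio, announced the merger on Monday."
s2 = "Jones, who was born in Ohio, resigned from the board."
s3 = "Smith announced the merger with the rival firm."

c1, c2, c3 = (strip_parentheticals(s) for s in (s1, s2, s3))

print("before:", jaccard(s1, s2), jaccard(s1, s3))
print("after: ", jaccard(c1, c2), jaccard(c1, c3))
```

On these sentences, the shared relative clause makes s1 look more similar to s2 than to s3 before simplification; once the clause is removed, s1 is closest to s3, which reports the same event.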

Document Details

Document Type
Technical Report
Publication Date
Jan 01, 2004
Accession Number
ADA457833

Entities

People

  • Advaith Siddharthan
  • Ani Nenkova
  • Kathleen McKeown

Organizations

  • Columbia University

Tags

Communities of Interest

  • Autonomy

DTIC Thesaurus Topics

  • Applied Computer Science
  • Automated Text Summarization
  • Computational Linguistics
  • Computational Science
  • Computer Languages
  • Computer Science
  • Governments
  • Information Retrieval
  • Information Science
  • Language
  • Linguistics
  • Natural Language Processing
  • Natural Languages
  • Political Science
  • Terrorists

Fields of Study

  • Computer science

Readers

  • Applied Combinatorial Optimization and Logic Circuit Design
  • Computational Linguistics