A Noisy-Channel Model for Document Compression

Abstract

We present a document compression system that uses a hierarchical noisy-channel model of text production. Our compression system first automatically derives the syntactic structure of each sentence and the overall discourse structure of the input text. It then uses a statistical hierarchical model of text production to drop unimportant syntactic and discourse constituents, generating coherent, grammatical document compressions of arbitrary length. The system outperforms both a baseline and a sentence-based compression system that sequentially simplifies every sentence in a text. Our results support the claim that discourse knowledge plays an important role in document summarization.
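To make the noisy-channel idea concrete, here is a minimal toy sketch, not the authors' model. It assumes the standard noisy-channel decision rule for compression: treat the input document d as a "noisy" expansion of a shorter text s and pick the s that maximizes P(s) · P(d | s). The tree labels ("Nucleus", "Satellite"), the rule that nuclei are never dropped, and every probability below are hypothetical placeholders, not values estimated in the report. Enumerating all keep/drop choices is exponential and only suitable for tiny trees; keeping the best candidate per length mirrors the abstract's "compressions of arbitrary length".

```python
import itertools
import math
from dataclasses import dataclass, field

@dataclass
class Node:
    """A constituent in a toy discourse/syntax tree (hypothetical structure)."""
    label: str                                   # e.g. "Nucleus", "Satellite" (assumed labels)
    text: str = ""                               # surface text, used only at leaves
    children: list = field(default_factory=list)

# Hypothetical toy parameters, NOT estimated from any corpus.
WORD_LOGPROB = math.log(0.1)                     # source model: each word costs the same
ADD_LOGPROB = {"Satellite": math.log(0.5)}       # channel: log P(constituent was added)
DEFAULT_ADD_LOGPROB = math.log(0.2)

def candidates(node):
    """Yield (words, dropped_labels) for every compression of `node`.
    Children labelled "Nucleus" are never dropped (toy assumption)."""
    if not node.children:
        yield node.text.split(), []
        return
    options_per_child = []
    for child in node.children:
        options = list(candidates(child))
        if child.label != "Nucleus":
            options.append(([], [child.label]))  # drop the whole subtree
        options_per_child.append(options)
    for combo in itertools.product(*options_per_child):
        words = [w for ws, _ in combo for w in ws]
        dropped = [lab for _, labs in combo for lab in labs]
        yield words, dropped

def log_source(words):
    """log P(s): toy source model over the compressed text s."""
    return WORD_LOGPROB * len(words)

def log_channel(dropped):
    """log P(d | s): toy channel model; each constituent present in the
    document d but absent from s counts as an independent expansion event."""
    return sum(ADD_LOGPROB.get(lab, DEFAULT_ADD_LOGPROB) for lab in dropped)

def best_by_length(root):
    """Return {length: (score, words)}: the highest-scoring compression of
    each length, so a caller can choose any target length."""
    best = {}
    for words, dropped in candidates(root):
        score = log_source(words) + log_channel(dropped)
        n = len(words)
        if n not in best or score > best[n][0]:
            best[n] = (score, words)
    return best

# Usage on a two-constituent toy "document".
doc = Node("Root", children=[
    Node("Nucleus", "the system outperforms a baseline"),
    Node("Satellite", "on held out data"),
])
table = best_by_length(doc)
score, words = max(table.values())               # overall argmax of P(s) * P(d | s)
print(" ".join(words))                           # prints "the system outperforms a baseline"
```

With these made-up numbers the source model favors shorter outputs while the channel model charges for each dropped constituent, so the chosen compression depends on how those two terms trade off; a real system would estimate both from data.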


Document Details

Document Type
Technical Report
Publication Date
Jan 01, 2002
Accession Number
ADA459360

Entities

People

  • Daniel Marcu
  • Hal Daume Iii

Organizations

  • University of Southern California

Tags

DTIC Thesaurus Topics

  • Abstracts
  • Automated Text Summarization
  • Channel Models
  • Compression
  • Computational Linguistics
  • Contrast
  • Elections
  • Information Operations
  • Information Science
  • Linguistics
  • Probability
  • Production
  • Test And Evaluation

Fields of Study

  • Computer science

Readers

  • Business Analytics
  • Computer Vision
  • Systems Analysis and Design