A Text-Extraction Based Summarizer

Abstract

We present an automated method of generating human-readable summaries from a variety of text documents, including newspaper articles, business reports, government documents, and even broadcast news transcripts. Our approach exploits the empirical observation that much written text displays certain regularities of organization and style, which we call the Discourse Macro Structure (DMS). A summary is therefore created to reflect the components of a given DMS. To produce a coherent and readable summary, we select continuous, well-formed passages from the source document and assemble them into a mini-document within a DMS template. In this paper we describe an automated summarizer that can generate both short indicative abstracts, useful for quickly scanning a list of documents, and longer informative digests that can serve as surrogates for the full text. The summarizer can assist the users of an information retrieval system in assessing the quality of the results returned from a search, preparing reports and memos for their customers, and even building more effective search queries.
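
The passage-selection idea in the abstract can be illustrated with a minimal sketch. It is not the report's implementation: the word-frequency scoring, the slot names "main" and "background", and every function name below are assumptions made for illustration, since the abstract does not specify the DMS components or how passages are ranked. The sketch only shows whole paragraphs being picked and placed, in document order, into a fixed template.

```python
import re
from collections import Counter


def split_passages(text):
    """Split a document into whole paragraphs (blank-line separated)."""
    return [p.strip() for p in re.split(r"\n\s*\n", text) if p.strip()]


def tokenize(text):
    """Lowercase alphabetic tokens; a deliberately crude tokenizer."""
    return re.findall(r"[a-z]+", text.lower())


def score_passages(passages):
    """Score each passage by the mean document-wide frequency of its words.

    This scoring rule is an assumption, not the method from the report.
    """
    freq = Counter(w for p in passages for w in tokenize(p))
    scores = []
    for p in passages:
        words = tokenize(p)
        scores.append(sum(freq[w] for w in words) / len(words) if words else 0.0)
    return scores


def summarize(text, slots=("main", "background")):
    """Fill hypothetical template slots with the top-scoring whole passages.

    Passages are kept intact and restored to document order before being
    assigned to slots, so the summary reads as a small continuous document.
    """
    passages = split_passages(text)
    scores = score_passages(passages)
    ranked = sorted(range(len(passages)), key=lambda i: scores[i], reverse=True)
    chosen = sorted(ranked[: len(slots)])  # restore original document order
    return {slot: passages[i] for slot, i in zip(slots, chosen)}
```

Shortening the slot tuple would give something closer to a brief indicative abstract, and lengthening it something closer to a longer digest, though how the report actually distinguishes the two is not stated in the abstract.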

Document Details

Document Type
Technical Report
Publication Date
Oct 01, 1998
Accession Number
ADA631188

Entities

People

  • G. B. Wise
  • Gees C. Stein
  • Tomek Strzalkowski

Tags

Communities of Interest

  • Biomedical
  • Energy and Power Technologies

DTIC Thesaurus Topics

  • Abstracts
  • Automated Text Summarization
  • Birds
  • Cognitive Science
  • Computational Linguistics
  • Extraction
  • Governments
  • Information Processing
  • Information Retrieval
  • Language
  • Linguistics
  • Materials
  • Natural Language Processing
  • Natural Languages
  • New York
  • Newspapers
  • Template Patterns

Fields of Study

  • Computer science

Readers

  • Business Analytics
  • Computational Linguistics
  • Database Systems and Applications

Technology Areas

  • AI & ML
  • AI & ML - Information Retrieval