Natural Language Text Segmentation Techniques Applied to the Automatic Compilation of Printed Subject Indexes and for Online Database Access

Abstract

The nature of the problem and earlier approaches to the automatic compilation of printed subject indexes are reviewed and illustrated. A simple method is described for the detection of semantically self-contained word phrase segments in title-like texts. The method is based on a predetermined list of acceptable types of nominative syntactic patterns which can be recognized using a small domain-independent dictionary. The transformation of the detected word phrases into subject index records is described. The records are used for the compilation of Key Word Phrase subject indexes (KWPSI). The method has been successfully tested for the fully automatic production of KWPSI-type indexes to titles of scientific publications. The use of KWPSI-type display formats for enhanced online access to databases is also discussed.
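
The general idea in the abstract (tag words with a small dictionary, cut the title at boundary words, keep only segments whose tag sequence matches an acceptable pattern, then turn each phrase into index records) can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the report: the `FUNCTION_WORDS` dictionary, the single `ACCEPTABLE` pattern, the rule that unlisted words count as content words, and the `(lead term, phrase)` record layout are all hypothetical.

```python
import re

# Hypothetical small, domain-independent dictionary: only function words are
# listed. Any word not found here is assumed to be a content word ("N").
FUNCTION_WORDS = {
    "of": "OF",
    "the": "ART", "a": "ART", "an": "ART",
    "and": "CONJ", "or": "CONJ",
    "in": "PREP", "on": "PREP", "for": "PREP", "to": "PREP",
    "by": "PREP", "with": "PREP", "from": "PREP",
}

# Hypothetical list of acceptable patterns, reduced to one regular expression
# over tags: a run of content words, optionally continued by "of" + another run.
ACCEPTABLE = re.compile(r"N( N)*( OF N( N)*)*")


def tag(word):
    """Return the dictionary category of a word, defaulting to content word."""
    return FUNCTION_WORDS.get(word.lower(), "N")


def segment(title):
    """Cut a title at prepositions and conjunctions, drop articles, and keep
    only the segments whose tag sequence matches the acceptable pattern."""
    words = re.findall(r"[A-Za-z][A-Za-z-]*", title)
    candidates, current = [], []
    for word in words:
        category = tag(word)
        if category in ("PREP", "CONJ"):
            if current:
                candidates.append(current)
                current = []
        elif category != "ART":
            current.append(word)
    if current:
        candidates.append(current)
    return [
        " ".join(phrase)
        for phrase in candidates
        if ACCEPTABLE.fullmatch(" ".join(tag(w) for w in phrase))
    ]


def index_records(phrases):
    """Build one (lead term, phrase) record per content word, sorted by lead term."""
    records = []
    for phrase in phrases:
        for word in phrase.split():
            if tag(word) == "N":
                records.append((word.lower(), phrase))
    return sorted(records)


if __name__ == "__main__":
    title = ("Automatic Compilation of Printed Subject Indexes "
             "for Online Database Access")
    for lead, phrase in index_records(segment(title)):
        print(f"{lead:12} {phrase}")
```

In this sketch each detected phrase is filed under every content word it contains, so a reader scanning the index sees the lead term together with its full phrase as context. A real pattern list and dictionary would be considerably richer than the single pattern assumed here.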

Document Details

Document Type: Technical Report
Publication Date: Feb 01, 1983
Accession Number: ADP001180

Entities

People

  • George Vladutz

Organizations

  • VINITI

Tags

DTIC Thesaurus Topics

  • Automatic
  • California
  • Computer Languages
  • Computer Vision
  • Databases
  • Detection
  • Dictionaries
  • Formal Languages
  • Image Processing
  • Language
  • Natural Language Processing
  • Natural Languages
  • Production

Readers

  • Computational Linguistics
  • Library and Information Science