Rapid Pre-Indexing by Machine

Abstract

The report describes the development of a new method of subject indexing by machine for documents in the Project INTREX catalog. The purpose of the system is to allow new documents to be placed online quickly in the computer-stored Intrex catalog. The system uses the human-generated subject terms of existing Intrex documents as a basis for generating index terms for new documents. The pre-indexing system operates only on the title and abstract of a document when generating its pre-index. Documents that already had human-generated subject indexes were analyzed by comparing their titles and abstracts with those indexes. These comparisons yielded a large dictionary of word-usage data, which then served as a guide for pre-indexing new documents. Three variations of the automatic pre-indexing method were developed, tested, and evaluated; two of them show promise for operational use in the Intrex system.
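The abstract does not spell out what the word-usage dictionary contains or how the three variations differ, so the following is only a minimal Python sketch under stated assumptions. It models the dictionary as, for each title/abstract word, the fraction of already-indexed documents containing that word that carry each human-assigned subject term. A new document's candidate terms are then scored by summing those fractions over the words in its title and abstract. The stop list, the `threshold` and `top_n` parameters, the function names, and the toy training data are all hypothetical illustrations, not taken from the report.

```python
import re
from collections import Counter, defaultdict

# Hypothetical stop list; the report's actual word-handling rules are not given here.
STOP_WORDS = {"a", "an", "and", "by", "for", "in", "of", "on", "the", "to"}


def tokenize(text):
    """Lowercase the text, split on non-letters, and drop stop words."""
    return {w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOP_WORDS}


def build_dictionary(indexed_docs):
    """Compare each title+abstract with its human-assigned subject terms.

    Returns {word: {term: fraction of documents containing the word
    that were indexed with the term}}.
    """
    doc_freq = Counter()
    co_counts = defaultdict(Counter)
    for text, subject_terms in indexed_docs:
        for word in tokenize(text):  # each word counted once per document
            doc_freq[word] += 1
            for term in subject_terms:
                co_counts[word][term] += 1
    return {
        word: {term: n / doc_freq[word] for term, n in terms.items()}
        for word, terms in co_counts.items()
    }


def pre_index(text, dictionary, threshold=0.5, top_n=5):
    """Score candidate terms for a new document from its title+abstract only."""
    scores = Counter()
    for word in tokenize(text):
        for term, weight in dictionary.get(word, {}).items():
            scores[term] += weight
    # Highest score first; ties broken alphabetically for a stable result.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [term for term, score in ranked if score >= threshold][:top_n]


# Toy "already indexed" documents (invented for illustration).
training = [
    ("Thin film deposition of copper", ["thin films", "copper"]),
    ("Electrical resistivity of copper alloys", ["copper", "electrical properties"]),
    ("Indexing of library catalogs by computer", ["indexing", "libraries"]),
]
dictionary = build_dictionary(training)
print(pre_index("Resistivity of thin copper films", dictionary))
# → ['copper', 'electrical properties', 'thin films']
```

Summing conditional frequencies is just one plausible weighting; the report's three variations may score and select terms quite differently. Note also that the naive tokenizer treats "film" and "films" as unrelated words, so some form of word normalization would likely matter in practice.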


Document Details

Document Type
Technical Report
Publication Date
Jul 01, 1968
Accession Number
AD0735502

Entities

People

  • William R. Kampe II

Organizations

  • Massachusetts Institute of Technology

Tags

Communities of Interest

  • C4I

DTIC Thesaurus Topics

  • Abstracts
  • Computer Science
  • Computers
  • Databases
  • Dictionaries
  • Electrical Engineering
  • Engineering
  • Frequency
  • Index Terms
  • Indexes
  • Information Transfer
  • Language
  • Materials
  • Materials Science
  • RDX
  • Standards
  • Subject Indexing

Fields of Study

  • Computer science

Readers

  • Computational Modeling and Simulation
  • Computer Science/Computer Engineering/Data Science/Digital Signal Processing
  • Library and Information Science