OPTIMIZATION AND STANDARDIZATION OF INFORMATION RETRIEVAL LANGUAGE AND SYSTEMS

Abstract

The report analyzes and evaluates methods of organizing data files, primarily for document retrieval applications. Three principal techniques are examined: the Multi-List System, the list-organized file, and the inverted and document-sequenced file. Statistical analyses were made of term associations based on the 599 most common DDC descriptors. Results indicate the need for a large amount of processing against an extensive data base; since most documents have almost as many groups as index terms, the postulated reduction in lists traversing a given document cannot be realized. Analysis shows that the list-organized file is an amalgamation of the inverted and document-sequenced files, and that maintenance and use of the two separate files is more efficient when requirements cannot be met by the inverted file alone. A technique for optimizing organization of the two files to minimize actual computing and overall elapsed processing times is described. It is viewed as dubious that any particular significance can be attached to a unique index term 'association.' There appears potential value in using relationships implicit in the hierarchic structure of a thesaurus, both for processing search requests and to aid in assigning descriptors by such techniques as 'lowest level indexing.'
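As a rough illustration of the two file organizations the abstract contrasts, the sketch below builds a document-sequenced file (one record per document, carrying its index terms) and an inverted file (one list of document numbers per index term) from the same toy collection, then answers a conjunctive request by intersecting inverted lists and fetching the matching records from the document-sequenced file. It is not taken from the report: the sample documents, the term assignments, and every function name are hypothetical, and no attempt is made to model the report's optimization technique or its tape-based storage.

```python
from collections import defaultdict

# Hypothetical toy collection: document number -> assigned index terms.
DOCUMENTS = {
    101: ["COMPUTERS", "INFORMATION RETRIEVAL", "MAGNETIC TAPE"],
    102: ["COMPUTERS", "STATISTICAL ANALYSIS"],
    103: ["INFORMATION RETRIEVAL", "INDEX TERMS", "COMPUTERS"],
}


def build_document_sequenced_file(documents):
    """Records ordered by document number, each carrying its full term list."""
    return [(doc_id, sorted(terms)) for doc_id, terms in sorted(documents.items())]


def build_inverted_file(documents):
    """One ascending list of document numbers per index term."""
    postings = defaultdict(list)
    for doc_id in sorted(documents):
        for term in set(documents[doc_id]):
            postings[term].append(doc_id)
    return dict(postings)


def search_all_terms(inverted_file, terms):
    """Document numbers indexed under every requested term (Boolean AND)."""
    if not terms:
        return []
    lists = [set(inverted_file.get(term, [])) for term in terms]
    return sorted(set.intersection(*lists))


def fetch_records(document_file, doc_ids):
    """Pull full records from the document-sequenced file for the given hits."""
    wanted = set(doc_ids)
    return [record for record in document_file if record[0] in wanted]


document_file = build_document_sequenced_file(DOCUMENTS)
inverted_file = build_inverted_file(DOCUMENTS)

hits = search_all_terms(inverted_file, ["COMPUTERS", "INFORMATION RETRIEVAL"])
records = fetch_records(document_file, hits)
```

In this sketch the inverted file alone is enough to identify which documents satisfy the request, while the document-sequenced file is consulted only when the full descriptor set of each hit is needed, which loosely mirrors the abstract's case where requirements cannot be met by the inverted file alone.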

Document Details

Document Type: Technical Report
Publication Date: Jan 28, 1966
Accession Number: AD0630797

Entities

People

  • Earl G. Fossum
  • Gilbert Kaskey

Organizations

  • Sperry Corporation

Tags

Communities of Interest

  • C4I
  • Space
  • Weapons Technologies

DTIC Thesaurus Topics

  • Access Time
  • Computer Programming
  • Computer Programs
  • Computers
  • Data Science
  • Data Storage Systems
  • Databases
  • Index Terms
  • Information Retrieval
  • Information Science
  • Language
  • Magnetic Tape
  • Mass Storage
  • Plastic Explosives
  • Statistical Analysis
  • Statistics
  • Time Intervals

Fields of Study

  • Computer science

Readers

  • Computational Linguistics
  • Library and Information Science
  • Systems Analysis and Design

Technology Areas

  • AI & ML
  • AI & ML - Information Retrieval