OPTIMIZATION AND STANDARDIZATION OF INFORMATION RETRIEVAL LANGUAGE AND SYSTEMS
Abstract
The report analyzes and evaluates methods of organizing data files, primarily for document retrieval applications. Three principal techniques are examined: the Multi-List System, the list-organized file, and the inverted and document-sequenced files. Statistical analyses of term associations were made based on the 599 most common DDC descriptors. The results indicate the need for a large amount of processing against an extensive data base; since most documents have almost as many groups as index terms, the postulated reduction in the number of lists traversing a given document cannot be realized. Analysis shows that the list-organized file is an amalgamation of the inverted and document-sequenced files, and that maintaining and using the two separate files is more efficient when requirements cannot be met by the inverted file alone. A technique is described for optimizing the organization of the two files to minimize both actual computing time and overall elapsed processing time. It is doubtful that any particular significance can be attached to a unique index-term 'association.' There appears to be potential value in using the relationships implicit in the hierarchic structure of a thesaurus, both for processing search requests and for aiding the assignment of descriptors by such techniques as 'lowest level indexing.'
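The two-file organization the abstract favors can be illustrated with a minimal sketch. It assumes a toy collection in which each document record lists its index terms (the document-sequenced file), inverts those records into term-to-document lists (the inverted file), answers a conjunctive (AND) request from the inverted file alone, and consults the document-sequenced file only to retrieve complete term lists for the hits. The function names and sample terms are hypothetical and are not DDC descriptors; the pair-count function is a simple co-occurrence tally in the spirit of the term-association statistics, not the report's actual procedure.

```python
from collections import Counter, defaultdict
from itertools import combinations

# Hypothetical document-sequenced file: document ID -> its index terms.
doc_file = {
    "D1": ["radar", "antennas", "microwaves"],
    "D2": ["radar", "signal processing"],
    "D3": ["antennas", "microwaves"],
}


def build_inverted_file(docs):
    """Invert document -> terms records into term -> sorted document-ID lists."""
    inverted = defaultdict(list)
    for doc_id in sorted(docs):
        for term in docs[doc_id]:
            inverted[term].append(doc_id)
    return dict(inverted)


def search_and(inverted, terms):
    """Answer a conjunctive (AND) request using only the inverted file."""
    postings = [set(inverted.get(term, ())) for term in terms]
    if not postings:
        return []
    return sorted(set.intersection(*postings))


def full_records(docs, doc_ids):
    """Fetch complete term lists for the hits from the document-sequenced file."""
    return {doc_id: docs[doc_id] for doc_id in doc_ids}


def term_pair_counts(docs):
    """Count how many documents each unordered pair of index terms shares."""
    counts = Counter()
    for terms in docs.values():
        for a, b in combinations(sorted(set(terms)), 2):
            counts[(a, b)] += 1
    return counts


if __name__ == "__main__":
    inv = build_inverted_file(doc_file)
    hits = search_and(inv, ["antennas", "microwaves"])
    print(hits)  # ['D1', 'D3']
    print(full_records(doc_file, hits))
    print(term_pair_counts(doc_file)[("antennas", "microwaves")])  # 2
```

Keeping the two files separate means a request that only needs matching document IDs never touches the per-document records. The document-sequenced file is read only when a request needs more than the inverted file can supply.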
Document Details
- Document Type: Technical Report
- Publication Date: Jan 28, 1966
- Accession Number: AD0630797
Entities
People
- Earl G. Fossum
- Gilbert Kaskey
Organizations
- Sperry Corporation