Superimposed Coding Versus Sequential and Inverted Files.

Abstract

The relative efficiency of three computer search algorithms was compared for searching large bibliographic files with Boolean search strategies. The sequential and inverted files represent the two most common file structures used today for bibliographic searching. Superimposed coding is an alternative that is becoming more attractive as the speed of computers improves. The superimposed search associates a key with each record in the data base; the key acts as a screen that eliminates the majority of records from further consideration. The keys are based on the bigrams and trigrams contained in the record, and are arranged in a linear file. The sequential search is a character-by-character scan of the entire file. This search is facilitated by constructing a finite state machine at the beginning of the search to match the search terms. The inverted file is fairly standard, except for the use of bit vectors to hold the postings of very common entries. A data base of 100,000 INSPEC records, from nine months of 1974, was used for testing the algorithms with 339 real-life search questions.
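
The screening idea described in the abstract can be illustrated with a minimal sketch. This is not the report's implementation: the key width (KEY_BITS), the CRC32 hash, the one-bit-per-n-gram rule, and all function names are assumptions chosen for illustration. Each record's bigrams and trigrams are hashed into a fixed-width bit key, a search term is hashed the same way, and any record whose key lacks one of the term's bits is eliminated without being read. Records that pass the screen may still be false hits, so they are confirmed by scanning the text.

```python
import zlib

KEY_BITS = 256  # assumed key width; the abstract does not state the one used


def ngrams(text, sizes=(2, 3)):
    """Return the set of bigrams and trigrams found inside each word."""
    grams = set()
    for word in text.lower().split():
        for n in sizes:
            for i in range(len(word) - n + 1):
                grams.add(word[i:i + n])
    return grams


def make_key(text):
    """Superimpose (OR together) one hashed bit per n-gram into a single key."""
    key = 0
    for gram in ngrams(text):
        # CRC32 is an assumed stand-in for whatever hash the report used.
        key |= 1 << (zlib.crc32(gram.encode("utf-8")) % KEY_BITS)
    return key


def screen(term, record_keys):
    """Return indexes of records whose key contains every bit of the term key."""
    term_key = make_key(term)
    return [i for i, key in enumerate(record_keys) if key & term_key == term_key]


def search(term, records, record_keys):
    """Screen on the keys, then scan the survivors to discard false hits."""
    return [i for i in screen(term, record_keys)
            if term.lower() in records[i].lower()]


# Hypothetical sample records, not taken from the INSPEC test data.
records = [
    "Superimposed coding for bibliographic retrieval",
    "Inverted file organization with bit vectors",
    "Finite state machines for text scanning",
]
record_keys = [make_key(r) for r in records]  # the linear file of keys
hits = search("inverted", records, record_keys)
```

In this sketch only the linear file of keys is touched during screening, which is why the approach benefits from faster processors: the cost is one AND and one comparison per record. Boolean strategies could be layered on top by combining the index lists returned for each term.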

Document Details

Document Type
Technical Report
Publication Date
Mar 01, 1977
Accession Number
ADA040685

Entities

People

  • Thomas Butler Hickey

Organizations

  • University of Illinois Urbana–Champaign

Tags

Communities of Interest

  • C4I
  • Cyber
  • Energy and Power Technologies

DTIC Thesaurus Topics

  • Abstracts
  • Air Force
  • Algorithms
  • Computer Programming
  • Computer Programs
  • Computer Science
  • Computers
  • Databases
  • Information Processing
  • Information Retrieval
  • Information Science
  • Language
  • Library Science
  • Probability
  • Standards
  • Translations
  • United States

Readers

  • Computer Science/Computer Engineering/Data Science/Digital Signal Processing
  • Library and Information Science
  • Statistical inference