Bounds on Information Retrieval Efficiency in Static File Structures

Abstract

This research addresses the problem of file organization for efficient information retrieval when each file item may be accessed through any one of a large number of identification keys. The emphasis is on library problems, namely large, low-update, directory-oriented files, though other types of files are also discussed. The model introduces the concept of an ideal directory against which all imperfect real implementations (catalogs) can be compared. Using an ideal reference point separates language interpretation problems from information organization problems and permits concentration on the latter. The model includes a probabilistic description of file usage, developed to give precise definition to the range of user requirements. The analysis employs mathematical tools and techniques developed for information theory, such as the entropy measure and the concept of an ensemble of possible file items. The principal analysis variable is item relevance, the probability that an accessed file item is actually useful, which serves as a measure of retrieval efficiency.
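
As a rough illustration of the two quantities named in the abstract, the sketch below computes the Shannon entropy of a usage distribution and an empirical estimate of item relevance. It is not taken from the report: the function names `entropy_bits` and `item_relevance`, the sample probabilities, and the choice to estimate relevance as the fraction of accessed items that turn out to be useful are all assumptions made for this example.

```python
import math


def entropy_bits(probabilities):
    """Shannon entropy, in bits, of a discrete probability distribution."""
    # Zero-probability outcomes contribute nothing and are skipped.
    return -sum(p * math.log2(p) for p in probabilities if p > 0)


def item_relevance(accessed, useful):
    """Fraction of accessed items that are useful (assumed empirical estimate)."""
    accessed = set(accessed)
    if not accessed:
        return 0.0
    return len(accessed & set(useful)) / len(accessed)


# Hypothetical access probabilities for a file of four items.
usage = [0.5, 0.25, 0.125, 0.125]
print(entropy_bits(usage))  # 1.75

# Hypothetical retrieval: four items accessed, two of them useful.
print(item_relevance(["a", "b", "c", "d"], ["a", "c", "x"]))  # 0.5
```

Under these assumptions, a relevance of 1.0 would mean every accessed item was useful, while lower values indicate that the catalog forced the user to examine items that did not meet the request.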

Document Details

Document Type
Technical Report
Publication Date
Jun 01, 1971
Accession Number
AD0725429

Entities

People

  • Terry A. Welch

Organizations

  • Massachusetts Institute of Technology

Tags

Communities of Interest

  • C4I

DTIC Thesaurus Topics

  • Algorithms
  • Combinatorial Analysis
  • Computations
  • Computer Programming
  • Computer Programs
  • Databases
  • Digital Information
  • Directories
  • Information Retrieval
  • Information Science
  • Information Theory
  • Language
  • Mathematical Analysis
  • Mathematics
  • Probability
  • Statistics
  • Theorems

Fields of Study

  • Computer science

Readers

  • Business Analytics
  • Parallel and Distributed Computing
  • Systems Analysis and Design

Technology Areas

  • AI & ML
  • AI & ML - Machine Learning Algorithms