Bounds on Information Retrieval Efficiency in Static File Structures
Abstract
The research addresses the problem of file organization for efficient information retrieval when each file item may be accessed through any one of a large number of identification keys. The emphasis is on library problems, namely large, low-update, directory-oriented files, but other types of files are discussed. The model used introduces the concept of an ideal directory against which all imperfect real implementations (catalogs) can be compared. The use of an ideal reference point serves to separate language interpretation problems from information organization problems, and permits concentration on the latter. The model includes a probabilistic description of file usage, developed to give precise definition to the range of user requirements. The analysis employs mathematical tools and techniques developed for information theory, such as the entropy measure and the concept of an ensemble of possible file items. The principal analysis variable is item relevance, the probability that a file item accessed is actually useful, which is a measure of retrieval efficiency.
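As a rough illustration of the two quantities the abstract names, the sketch below computes the Shannon entropy of a usage distribution and an empirical estimate of item relevance. It is not taken from the report: the function names `entropy_bits` and `item_relevance`, the example distribution, and the item labels are all assumptions made for illustration, and the report's own definitions may differ in detail.

```python
import math


def entropy_bits(probabilities):
    """Shannon entropy, in bits, of a discrete probability distribution."""
    # Zero-probability outcomes contribute nothing, so they are skipped.
    return -sum(p * math.log2(p) for p in probabilities if p > 0)


def item_relevance(accessed, useful):
    """Empirical fraction of accessed items that turned out to be useful."""
    accessed = set(accessed)
    if not accessed:
        raise ValueError("no items were accessed")
    return len(accessed & set(useful)) / len(accessed)


# Hypothetical probabilities that each of four file items is the one wanted.
usage = [0.5, 0.25, 0.125, 0.125]
print(entropy_bits(usage))

# Hypothetical retrieval: four items accessed, two of them actually useful.
print(item_relevance({"a", "b", "c", "d"}, {"a", "c"}))
```

In this reading, a more skewed usage distribution has lower entropy, and a relevance closer to 1 means fewer accessed items were wasted effort.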
Document Details
- Document Type: Technical Report
- Publication Date: Jun 01, 1971
- Accession Number: AD0725429
Entities
People
- Terry A. Welch
Organizations
- Massachusetts Institute of Technology