QUANTIFICATION OF INFORMATION STORAGE AND RETRIEVAL METHODOLOGIES

Abstract

The paper presents the results of a set of Monte Carlo computations designed to show how the efficiency of probabilistic information retrieval systems behaves as a function of human-variability noise. The independent variable is the total amount of noise, which combines the noise produced in indexing documents with that produced in formulating requests. The effect of noise is measured by the fraction of the file that must be retrieved in order to obtain the document that would be retrieved first in the absence of noise. Computations are made for an idealized system in which the index and request vectors are normalized and have uniform distributions; however, the method could accommodate other distributions. The results show how, for a fixed amount of noise, the depth of file search decreases as the number of index categories increases, for each constant ratio of query terms to index categories in the space. They also show how, for a fixed number of index categories, the fraction of the file searched decreases as the number of index terms in the query increases.
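
The measurement described in the abstract can be illustrated with a minimal Monte Carlo sketch. This is not the report's actual procedure, which the abstract does not specify. The sketch assumes that document and request vectors have uniform [0, 1) components scaled to unit length, that matching is by dot product, and that noise is zero-mean Gaussian with standard deviation sigma added to every component of both the index and request vectors. All function and parameter names are hypothetical, and the ratio of query terms to index categories is not modeled.

```python
import math
import random


def unit_vector(rng, dim):
    """Draw uniform [0, 1) components and scale the vector to unit length."""
    while True:
        v = [rng.random() for _ in range(dim)]
        norm = math.sqrt(sum(x * x for x in v))
        if norm > 0.0:
            return [x / norm for x in v]


def add_noise(rng, v, sigma):
    """Perturb each component with Gaussian noise, then renormalize.

    The Gaussian noise model is an assumption of this sketch.
    With sigma == 0 the vector is returned unchanged.
    """
    if sigma == 0:
        return list(v)
    while True:
        noisy = [x + rng.gauss(0.0, sigma) for x in v]
        norm = math.sqrt(sum(x * x for x in noisy))
        if norm > 0.0:
            return [x / norm for x in noisy]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def search_depth_fraction(rng, n_docs, dim, sigma):
    """Run one trial: fraction of the file retrieved to reach the target.

    The target is the document that ranks first without noise.
    """
    docs = [unit_vector(rng, dim) for _ in range(n_docs)]
    query = unit_vector(rng, dim)
    target = max(range(n_docs), key=lambda i: dot(docs[i], query))

    # Apply indexing noise to every document and request noise to the query.
    noisy_docs = [add_noise(rng, d, sigma) for d in docs]
    noisy_query = add_noise(rng, query, sigma)
    scores = [dot(d, noisy_query) for d in noisy_docs]

    # Rank of the target in the noisy ordering (1 = retrieved first).
    rank = 1 + sum(1 for s in scores if s > scores[target])
    return rank / n_docs


def mean_search_depth(n_trials, n_docs, dim, sigma, seed=0):
    """Average the search-depth fraction over independent trials."""
    rng = random.Random(seed)
    total = sum(
        search_depth_fraction(rng, n_docs, dim, sigma) for _ in range(n_trials)
    )
    return total / n_trials
```

Calling mean_search_depth over a range of sigma values, or over a range of dim values at fixed sigma, gives curves of the kind the abstract describes. The numbers depend entirely on the assumptions above and are not comparable to the report's results.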

Document Details

Document Type: Technical Report
Publication Date: Jun 05, 1970
Accession Number: AD0712134

Entities

People

  • Andrew Noetzel
  • Morris Plotkin
  • Samuel D. Epstein

Tags

DTIC Thesaurus Topics

  • Boundaries
  • Computations
  • Contracts
  • Databases
  • Distribution Functions
  • Efficiency
  • Index Terms
  • Indexes
  • Information Retrieval
  • Mathematics
  • Military Research
  • Numbers
  • Probabilistic Models
  • Probability
  • Probability Distributions
  • Security
  • Vector Spaces

Readers

  • Computational Modeling and Simulation
  • Library and Information Science

Technology Areas

  • AI & ML
  • AI & ML - Bayesian Inference
  • AI & ML - Information Retrieval
  • Space