STATISTICAL INFORMATION RETRIEVAL SYSTEM

Abstract

An information retrieval system was developed that uses occurrences of technical words as the basis for classification. A set of words, designated a vocabulary, was selected from the middle range of a frequency listing of the words occurring in an experimental sample of 94 documents. The selection produced 115 non-function words whose technical definitions did not allow ambiguous usage, and each word was assigned one of eighty concept numbers. The frequencies of these concepts served as data for factor analysis, and 39 factors were extracted to represent the orthogonal axes of a geometric subject-content space. The locations of the concepts in this space were then used to fix the geometric position of each document according to the frequencies with which the concepts occur in it. A total of 194 documents was used in measuring system effectiveness. Requests formulated for a previous experiment using the same data base were processed. Precision and recall measures were calculated; on average, 66% precision and 80% recall were attained with one of three dissemination thresholds. Overall analysis of the results supports the theory that statistical data about word occurrences are sufficient to represent documents accurately relative to their subject content.
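
The pipeline in the abstract (concept positions in a factor space, documents placed by concept frequency, dissemination by threshold, then precision and recall) can be illustrated with a minimal Python sketch. Everything specific here is an assumption made for illustration and is not taken from the report: the two-factor concept coordinates, the frequency-weighted mean used to place a document, the cosine similarity measure, the threshold values, and the toy documents and relevance judgments. The factor analysis step itself is not shown; the concept positions are simply given.

```python
import math

# Hypothetical concept coordinates in a 2-factor space.
# (The report extracted 39 factors for eighty concepts; these values are invented.)
CONCEPT_POSITIONS = {
    1: (1.0, 0.0),
    2: (0.0, 1.0),
    3: (0.7, 0.7),
}


def locate(concept_freqs):
    """Place a document or request at the frequency-weighted mean of its
    concepts' positions. The weighting scheme is an assumption."""
    dims = len(next(iter(CONCEPT_POSITIONS.values())))
    total = sum(concept_freqs.values())
    if total == 0:
        return tuple(0.0 for _ in range(dims))
    return tuple(
        sum(freq * CONCEPT_POSITIONS[c][d] for c, freq in concept_freqs.items()) / total
        for d in range(dims)
    )


def cosine(u, v):
    """Cosine similarity between two positions (an assumed closeness measure)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def retrieve(request_freqs, documents, threshold):
    """Return the ids of documents whose similarity to the request
    meets the dissemination threshold."""
    request_pos = locate(request_freqs)
    return {
        doc_id
        for doc_id, freqs in documents.items()
        if cosine(request_pos, locate(freqs)) >= threshold
    }


def precision_recall(retrieved, relevant):
    """Precision = hits / retrieved; recall = hits / relevant."""
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# Toy data: concept number -> frequency in each document (invented).
documents = {
    "d1": {1: 4, 3: 1},
    "d2": {2: 5},
    "d3": {1: 2, 2: 2},
    "d4": {1: 3},
}
request = {1: 1}
relevant = {"d1", "d4"}  # invented relevance judgments

retrieved = retrieve(request, documents, threshold=0.7)
precision, recall = precision_recall(retrieved, relevant)

# A stricter threshold trades recall for precision in general.
strict_retrieved = retrieve(request, documents, threshold=0.9)
strict_precision, strict_recall = precision_recall(strict_retrieved, relevant)
```

With this toy data, the 0.7 threshold retrieves d1, d3 and d4 (precision 2/3, recall 1.0), while the 0.9 threshold drops d3 and retrieves only d1 and d4. Comparing several thresholds in this way mirrors the abstract's mention of three dissemination thresholds, although the report's actual threshold definitions are not given here.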

Document Details

Document Type
Technical Report
Publication Date
Oct 01, 1969
Accession Number
AD0697403

Entities

People

  • Nicholas M. Difondi

Organizations

  • Rome Laboratory

Tags

Communities of Interest

  • C4I
  • Human Systems

DTIC Thesaurus Topics

  • Accuracy
  • Classification
  • Data Science
  • Databases
  • Errors
  • Factor Analysis
  • Frequency
  • Information Processing
  • Information Retrieval
  • Information Science
  • New York
  • Precision
  • Probability
  • Statistical Analysis
  • Statistical Data
  • Statistics
  • Vocabulary

Readers

  • Library and Information Science
  • Regression Analysis
  • Speech Processing/Speech Recognition

Technology Areas

  • AI & ML
  • AI & ML - Bayesian Inference
  • AI & ML - Information Retrieval
  • Space