STATISTICAL INFORMATION RETRIEVAL SYSTEM
Abstract
An information retrieval system was developed using technical word occurrences as the basis for classification. A set of words, designated the vocabulary, was selected from the middle range of a frequency listing of the words occurring in an experimental sample of 94 documents. The selection produced 115 non-function words whose technical definitions did not allow ambiguous usage, and each word was assigned one of 80 concept numbers. The frequencies of these concepts served as data for a factor analysis, and 39 factors were extracted to represent the orthogonal axes of a geometric subject-content space. Each document was then positioned in this space from the locations of the concepts it contains, according to the frequencies of those concepts in the document. A total of 194 documents was used to measure system effectiveness. Requests formulated for a previous experiment using the same data base were processed. Precision and recall were calculated; on average, 66% precision and 80% recall were attained with one of the three dissemination thresholds. Overall analysis of the results supports the theory that statistical data about word occurrences are sufficient to represent documents accurately with respect to their subject content.
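The pipeline described in the abstract can be illustrated with a minimal Python sketch. It is not the report's implementation. The abstract does not state how concept frequencies are combined or how requests are matched to documents, so the frequency-weighted mean in `locate_document`, the cosine similarity in `cosine`, and the reading of a dissemination threshold as a similarity cutoff are all assumptions. The function names, the two-dimensional toy space (standing in for the 39 factor axes), and the sample data are hypothetical.

```python
from math import sqrt


def locate_document(concept_freqs, concept_coords):
    """Place a document at the frequency-weighted mean of its concept locations.

    Assumption: the abstract only says documents are located "according to"
    concept frequencies; a weighted mean is one plausible reading.
    """
    total = sum(concept_freqs.values())
    if total == 0:
        raise ValueError("document contains no vocabulary concepts")
    dims = len(next(iter(concept_coords.values())))
    position = [0.0] * dims
    for concept, freq in concept_freqs.items():
        for axis, value in enumerate(concept_coords[concept]):
            position[axis] += freq * value / total
    return position


def cosine(u, v):
    """Cosine similarity between two vectors (assumed matching measure)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = sqrt(sum(a * a for a in u))
    norm_v = sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def retrieve(request_pos, doc_positions, threshold):
    """Return the ids of documents whose similarity to the request meets the threshold."""
    return {
        doc_id
        for doc_id, pos in doc_positions.items()
        if cosine(request_pos, pos) >= threshold
    }


def precision_recall(retrieved, relevant):
    """Precision = hits / retrieved; recall = hits / relevant."""
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical toy data: three concepts in a 2-D space, three documents.
concept_coords = {1: [1.0, 0.0], 2: [0.0, 1.0], 3: [1.0, 1.0]}
documents = {"A": {1: 3, 3: 1}, "B": {2: 4}, "C": {1: 1, 2: 1}}
doc_positions = {
    doc_id: locate_document(freqs, concept_coords)
    for doc_id, freqs in documents.items()
}

# A request is positioned the same way as a document.
request_pos = locate_document({1: 2}, concept_coords)
relevant = {"A"}  # hypothetical relevance judgment

loose = retrieve(request_pos, doc_positions, threshold=0.7)
strict = retrieve(request_pos, doc_positions, threshold=0.9)
print(sorted(loose), precision_recall(loose, relevant))
print(sorted(strict), precision_recall(strict, relevant))
```

In this toy run, raising the threshold drops document C from the retrieved set, which shows the usual trade-off a dissemination threshold controls: a stricter cutoff tends to raise precision and can lower recall.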
Document Details
- Document Type: Technical Report
- Publication Date: Oct 01, 1969
- Accession Number: AD0697403
Entities
People
- Nicholas M. Difondi
Organizations
- Rome Laboratory