SEARCH PROCEDURES BASED ON MEASURES OF RELATEDNESS BETWEEN DOCUMENTS
Abstract
A new type of information retrieval system is suggested which utilizes data of the type generated by users of the system instead of data generated by indexers. The theoretical model on which the system is based consists of three basic elements. The first element is a measure of the relatedness between document-pairs. It is derived from information theory. The second element is a definition of what constitutes a set (cluster) of interrelated documents. This definition is based on the measure of relatedness. The last element is a procedure which transforms a request for information into a cluster of answer documents. An experimental system was developed to test the model in a realistic environment. It was programmed for the Project MAC time-sharing system and utilized the physics data file of the Technical Information Project. Citations were used as the data base for the measure of relatedness. A file structure and retrieval language were designed which allowed close man-machine coupling. Retrieval efficiency compared to known sets was 60 to 90 percent, and ways of improving this further are suggested.
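The three elements named in the abstract (a relatedness measure, a cluster definition, and a request-to-cluster procedure) can be illustrated with a minimal sketch. The abstract does not give the report's actual formulas, so everything below is an assumption made for illustration: relatedness is taken to be a pointwise-mutual-information score over how often two documents appear in the same reference list, and a cluster is grown greedily from seed documents by admitting any document whose mean relatedness to the current members exceeds a threshold. The names `build_counts`, `relatedness`, `answer_cluster`, and the sample data are hypothetical and are not taken from the report.

```python
import math
from collections import Counter
from itertools import combinations


def build_counts(reference_lists):
    """Count how often each document, and each pair, appears in a reference list."""
    single = Counter()
    pair = Counter()
    for refs in reference_lists:
        refs = set(refs)
        single.update(refs)
        pair.update(frozenset(p) for p in combinations(sorted(refs), 2))
    return single, pair, len(reference_lists)


def relatedness(a, b, single, pair, n):
    """Assumed measure: log2 of P(a, b) / (P(a) * P(b)) over n reference lists."""
    n_ab = pair[frozenset((a, b))]
    if n_ab == 0:
        return float("-inf")  # never cited together
    return math.log2(n * n_ab / (single[a] * single[b]))


def answer_cluster(seeds, single, pair, n, threshold=0.0):
    """Assumed procedure: grow a cluster from a non-empty set of seed documents."""
    cluster = set(seeds)
    candidates = set(single) - cluster
    changed = True
    while changed:
        changed = False
        for doc in sorted(candidates):
            scores = [relatedness(doc, m, single, pair, n) for m in cluster]
            if sum(scores) / len(scores) > threshold:
                cluster.add(doc)
                candidates.discard(doc)
                changed = True
    return cluster


# Hypothetical citation data: each inner list is one paper's reference list.
reference_lists = [
    ["A", "B", "C"],
    ["A", "B"],
    ["C", "D"],
    ["D", "E"],
    ["A", "B", "C"],
]
single, pair, n = build_counts(reference_lists)
result = answer_cluster({"A"}, single, pair, n)
```

Here a request is represented simply as a set of seed documents, and `result` holds the answer cluster grown from it. Treating a never-co-cited pair as negative infinity means a single unrelated member vetoes a candidate, which is a deliberately strict choice for this sketch rather than anything stated in the source.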
Document Details
- Document Type: Technical Report
- Publication Date: May 01, 1966
- Accession Number: AD0636275
Entities
People
- Evan L. Ivie
Organizations
- Massachusetts Institute of Technology