SEARCH PROCEDURES BASED ON MEASURES OF RELATEDNESS BETWEEN DOCUMENTS

Abstract

A new type of information retrieval system is suggested which utilizes data of the type generated by users of the system instead of data generated by indexers. The theoretical model on which the system is based consists of three basic elements. The first element is a measure of the relatedness between document-pairs. It is derived from information theory. The second element is a definition of what constitutes a set (cluster) of interrelated documents. This definition is based on the measure of relatedness. The last element is a procedure which transforms a request for information into a cluster of answer documents. An experimental system was developed to test the model in a realistic environment. It was programmed for the Project MAC time-sharing system and utilized the physics data file of the Technical Information Project. Citations were used as the data base for the measure of relatedness. A file structure and retrieval language were designed which allowed close man-machine coupling. Retrieval efficiency compared to known sets was 60 to 90 percent, and ways of improving this further are suggested.
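
The three elements named in the abstract (a relatedness measure, a cluster definition, and a request-to-cluster procedure) can be illustrated with a minimal sketch. The abstract does not give the report's formula, so everything here is an assumption made for illustration: relatedness is taken to be a pointwise-mutual-information-style score over co-citation (two documents are related when the same papers cite both), the "cluster" is simply the seed document plus every document whose score against the seed exceeds a threshold, and the names `relatedness`, `answer_cluster`, `cited_by`, and `threshold` are hypothetical.

```python
import math


def relatedness(citers_a, citers_b, n_citing):
    """Assumed PMI-style score: log2(P(a, b) / (P(a) * P(b))).

    citers_a, citers_b: sets of ids of the papers that cite each document.
    n_citing: total number of citing papers in the collection.
    Returns -inf when the two documents are never cited together.
    """
    joint = len(citers_a & citers_b)
    if joint == 0:
        return float("-inf")
    return math.log2(joint * n_citing / (len(citers_a) * len(citers_b)))


def answer_cluster(seed, cited_by, n_citing, threshold=0.0):
    """Simplified request procedure: the seed document plus every document
    whose relatedness to the seed is strictly above the threshold."""
    cluster = {seed}
    for doc, citers in cited_by.items():
        if doc == seed:
            continue
        if relatedness(cited_by[seed], citers, n_citing) > threshold:
            cluster.add(doc)
    return cluster


# Toy citation data (invented for illustration): document -> citing paper ids.
cited_by = {
    "A": {1, 2, 3},
    "B": {1, 2, 4},
    "C": {2, 3},
    "D": {5, 6},
}

print(sorted(answer_cluster("A", cited_by, n_citing=6)))
```

With the toy data, a request seeded on document A pulls in B and C, which share citing papers with A, and leaves out D, which shares none. A stricter cluster definition would also require members to be related to each other and not only to the seed.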

Document Details

Document Type
Technical Report
Publication Date
May 01, 1966
Accession Number
AD0636275

Entities

People

  • Evan L. Ivie

Organizations

  • Massachusetts Institute of Technology

Tags

Communities of Interest

  • Energy and Power Technologies

DTIC Thesaurus Topics

  • Air Force
  • Algorithms
  • Computer Programming
  • Core Storage
  • Crystal Structure
  • Data Storage Systems
  • Databases
  • Electrical Engineering
  • Engineering
  • Frequency
  • Information Retrieval
  • Information Theory
  • Machine Learning
  • Optical Properties
  • Physics
  • Standards
  • Three Dimensional

Readers

  • Library and Information Science
  • Neural Network Machine Learning
  • Regression Analysis

Technology Areas

  • AI & ML
  • AI & ML - Bayesian Inference
  • AI & ML - Information Retrieval