INFORMATION STORAGE AND RETRIEVAL

Abstract

Some experimental procedures for the identification of information content using word frequency counting techniques are described. An attempt is made in particular to determine those areas in a natural language text which contain more than an average amount of new information. The use of tree structures for the representation of relations between terms included in classification systems, and between words in natural language, is discussed. Procedures are also suggested for the automatic identification of structural relations, and for the use of trees to perform the matching process. Some problems connected with the use of syntactic analysis for the identification of document content, and various strategies which appear useful for the processing of structured information, are discussed. Methods are described for the efficient representation of tree structures in computer storage. Programs are also exhibited to perform a variety of information retrieval operations. (Author)

Document Details

Document Type
Technical Report
Publication Date
Nov 30, 1961
Accession Number
AD0274816

Entities

People

  • Gerard Salton

Organizations

  • Harvard University

Tags

DTIC Thesaurus Topics

  • Automatic
  • Classification
  • Computers
  • Frequency
  • Identification
  • Information Retrieval
  • Language
  • Natural Languages
  • Words (Language)

Readers

  • Artificial Intelligence
  • Computer Science
  • Regression Analysis

Technology Areas

  • AI & ML
  • AI & ML - Information Retrieval
  • AI & ML - Machine Translation