Scientific and Technical Information. Series 2. Information Processes and Systems. Number 6, 1969 (Selected Portions)

Abstract

This report describes the methods and results of a study to develop an information retrieval (IR) language for automatic systems handling polytechnical documents. The descriptor dictionary includes both general and special terms, expressed as words and phrases, which contributes to better recall and precision figures; it comprises a classified index and a lexico-semantic index, as well as tables of generic and specific relations. The dictionary contains 5,542 descriptors and 3,073 keywords. The document indexing procedure has two steps: analyzing the document content and describing it in natural-language words, then forming the search pattern by using the descriptor dictionary. The report describes the techniques applied to analyze a document from different conceptual aspects, which constitute the elements of a formalized model of its condensed content. Conversion into the IR language uses words from both the title and the text of a document. The essentials of the statistical technique used to evaluate retrieval efficiency are set forth. Tests on a multi-subject collection revealed the possibility of a system operating at 85 percent recall and 7 percent relevance, with a standard deviation of 25 percent. (Author)
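
The two-step indexing procedure and the recall figure mentioned in the abstract can be illustrated with a minimal Python sketch. It is not the report's method: the keyword-to-descriptor table, the function names, and the sample values are all hypothetical, and the sketch assumes that the abstract's "relevance" corresponds to what is now usually called precision.

```python
from typing import Dict, Iterable, Set

# Hypothetical keyword -> descriptor table. The report's actual dictionary
# (5,542 descriptors, 3,073 keywords) is not reproduced here.
KEYWORD_TO_DESCRIPTOR: Dict[str, str] = {
    "turbine": "TURBINES",
    "turbines": "TURBINES",
    "gas turbine": "GAS TURBINES",
    "blade": "BLADES",
}


def index_document(keywords: Iterable[str]) -> Set[str]:
    """Map natural-language keywords to descriptors, skipping unknown keywords."""
    descriptors = set()
    for keyword in keywords:
        descriptor = KEYWORD_TO_DESCRIPTOR.get(keyword.lower())
        if descriptor is not None:
            descriptors.add(descriptor)
    return descriptors


def recall(retrieved: Set[str], relevant: Set[str]) -> float:
    """Fraction of the relevant documents that were retrieved."""
    return len(retrieved & relevant) / len(relevant) if relevant else 0.0


def precision(retrieved: Set[str], relevant: Set[str]) -> float:
    """Fraction of the retrieved documents that are relevant (assumed to match 'relevance')."""
    return len(retrieved & relevant) / len(retrieved) if retrieved else 0.0
```

For example, under these assumptions `index_document(["Turbine", "blade", "unknown"])` yields the descriptor set `{"TURBINES", "BLADES"}`, and a query that retrieves three documents, two of them among four relevant ones, has a recall of 0.5.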

Document Details

Document Type
Technical Report
Publication Date
Feb 17, 1971
Accession Number
AD0724977

Entities

People

  • Yu. I. Shemakin

Organizations

  • National Air and Space Intelligence Center

Tags

DTIC Thesaurus Topics

  • Automatic
  • Conversion
  • Data Science
  • Dictionaries
  • Efficiency
  • Information Science
  • Language
  • Precision
  • Standards
  • Test And Evaluation
  • Words (Language)

Fields of Study

  • Education

Readers

  • Business Analytics
  • Computational Linguistics
  • Regression Analysis