The Use of Selected Portions of Technical Documents as Sources of Index Terms and Effect on Input Costs and Retrieval Effectiveness.

Abstract

Recall (the retrieval of all available relevant documents) should decrease as the quantity of text serving as a source of index terms is reduced. However, indexing time, and therefore input cost, should also decrease, establishing a tradeoff between input cost and retrieval effectiveness. To quantify the effect of restricting the source text on both retrieval effectiveness and input cost, an experiment was designed in which the full text of each technical document was divided into five categories: (1) title; (2) abstract; (3) table of contents and lists of figures and tables; (4) author-assigned keywords; and (5) the body. An experimental data base was prepared in which the source category of each index term and the indexing time were recorded. Sets of SDI (selective dissemination of information) and retrospective searches were run against the data base, and retrievals were analyzed by category. For the subset of documents retrieved, 81% of the available relevant documents were retrieved by index terms drawn from Categories 1-4, while the indexing time required for these four categories was only 53% of the total indexing time. For the entire set of documents input into the experimental data base, the first four categories accounted for 60% of the indexing time. It was therefore decided that the body of the document could be excluded as a source of index terms. (Modified author abstract)
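The two figures the abstract compares (share of relevant documents still retrieved, share of indexing time spent) can be tallied as sketched below. This is a minimal illustration, not the report's procedure: the names `RelevantHit`, `recall_fraction`, and `time_fraction` are hypothetical, and it assumes a relevant document counts as retrieved from Categories 1-4 if any index term that matched the search came from one of those categories. The report's actual tallying rule may be stricter (for example, requiring the whole search to match using restricted terms alone), and the sample data are invented for illustration.

```python
from dataclasses import dataclass

# Category numbers follow the abstract: 1 title, 2 abstract,
# 3 table of contents and lists of figures/tables, 4 author keywords, 5 body.
RESTRICTED = frozenset({1, 2, 3, 4})


@dataclass
class RelevantHit:
    """A relevant document that was retrieved (hypothetical record layout)."""
    doc_id: str
    categories: frozenset  # categories whose index terms matched the search


def recall_fraction(hits, allowed=RESTRICTED):
    """Share of retrieved relevant documents still retrieved from allowed categories.

    Simplifying assumption: one matching term from an allowed category suffices.
    """
    if not hits:
        return 0.0
    kept = sum(1 for h in hits if h.categories & allowed)
    return kept / len(hits)


def time_fraction(minutes_by_category, allowed=RESTRICTED):
    """Share of total indexing time spent on the allowed categories."""
    total = sum(minutes_by_category.values())
    if total == 0:
        return 0.0
    part = sum(t for c, t in minutes_by_category.items() if c in allowed)
    return part / total


if __name__ == "__main__":
    # Hypothetical toy data, not taken from the report.
    hits = [
        RelevantHit("A", frozenset({1, 5})),
        RelevantHit("B", frozenset({5})),      # found only through the body
        RelevantHit("C", frozenset({2})),
        RelevantHit("D", frozenset({4, 5})),
    ]
    minutes = {1: 1.0, 2: 2.5, 3: 1.5, 4: 1.0, 5: 4.0}
    print(f"recall kept: {recall_fraction(hits):.2f}")      # 0.75
    print(f"indexing time: {time_fraction(minutes):.2f}")   # 0.60
```

Read against the report's own figures, the same two ratios mean that dropping the body (Category 5) kept 81% of the relevant documents at 53% of the indexing time: roughly 47% of indexing time saved in exchange for missing about 19% of the relevant documents in that retrieved subset.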

Document Details

Document Type
Technical Report
Publication Date
Apr 01, 1973
Accession Number
AD0761808

Entities

People

  • F. L. Scheffler
  • H. H. Schumacher
  • J. F. March

Organizations

  • University of Dayton Research Institute

Tags

DTIC Thesaurus Topics

  • Abstracts
  • Databases
  • Experimental Data
  • Index Terms
  • Indexes
