WORD STATISTICS IN THE GENERATION OF SEMANTIC TOOLS FOR INFORMATION SYSTEMS

Abstract

A crucial problem in systems for the storage and retrieval of technical information is the interpretation of words used to index documents. Semantic tools, defined as channels for the communication of word meanings between technical experts, document indexers, and searchers, provide one method of dealing with the problem of multiple interpretations. The report shows how statistical data on the distribution of occurrences of single words or word pairs in the text of a set of documents can be used in generating semantic tools, in particular, an indexing vocabulary and relations among the terms in this vocabulary. An experiment in this area is described, involving the testing of several new statistical measures and techniques. The results give some insight into the patterns of language usage in technical literature and suggest directions for future research.
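The abstract describes using statistics on how single words and word pairs are distributed across a set of documents to derive relations among index terms. The Python sketch below is a rough illustration of that general idea only. It counts how many documents contain each word and each unordered word pair, then scores pairs with the Dice coefficient, 2·df(a, b) / (df(a) + df(b)). The Dice coefficient, the crude tokenizer, and the function names (`tokenize`, `word_and_pair_counts`, `dice`, `related_terms`) are assumptions made for this example. They are not the statistical measures or techniques tested in the report.

```python
import re
from collections import Counter
from itertools import combinations


def tokenize(text):
    """Lowercase and keep runs of letters (a deliberately crude, assumed tokenizer)."""
    return re.findall(r"[a-z]+", text.lower())


def word_and_pair_counts(documents):
    """Return document frequencies for each word and each unordered word pair."""
    word_df = Counter()
    pair_df = Counter()
    for doc in documents:
        vocab = sorted(set(tokenize(doc)))      # each word counted once per document
        word_df.update(vocab)
        pair_df.update(combinations(vocab, 2))  # pairs stored as (a, b) with a < b
    return word_df, pair_df


def dice(word_df, pair_df, a, b):
    """Dice association 2*df(a,b) / (df(a) + df(b)); 0.0 if neither word occurs."""
    a, b = sorted((a, b))
    denom = word_df[a] + word_df[b]
    return 2 * pair_df[(a, b)] / denom if denom else 0.0


def related_terms(word_df, pair_df, term, top=5):
    """Rank other words by Dice association with `term`, dropping zero scores."""
    scores = [(w, dice(word_df, pair_df, term, w)) for w in word_df if w != term]
    scores = [(w, s) for w, s in scores if s > 0]
    scores.sort(key=lambda ws: (-ws[1], ws[0]))  # highest score first, then alphabetical
    return scores[:top]


if __name__ == "__main__":
    docs = [
        "thin film deposition of metal oxide",
        "metal oxide film resistivity",
        "signal processing for radar",
        "radar signal detection",
    ]
    word_df, pair_df = word_and_pair_counts(docs)
    print(related_terms(word_df, pair_df, "radar"))
```

On this toy input, "signal" is ranked first for "radar" because the two words appear in exactly the same documents. A practical system would at least add stop-word removal and a minimum-frequency threshold, since function words such as "for" otherwise score as highly as content words.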


Document Details

Document Type: Technical Report
Publication Date: December 1, 1967
Accession Number: AD0664915

Entities

People

  • Don C. Stone

Organizations

  • Moore School of Electrical Engineering

Tags

Communities of Interest

  • Advanced Electronics
  • C4I
  • Materials and Manufacturing Processes

DTIC Thesaurus Topics

  • Air Force
  • Computations
  • Computer Programming
  • Computer Programs
  • Computers
  • Data Science
  • Dictionaries
  • Index Terms
  • Information Processing
  • Information Retrieval
  • Information Science
  • Information Systems
  • Language
  • Literature
  • Statistical Data
  • Statistics
  • Vocabulary

Readers

  • Computational Linguistics
  • Computational Modeling and Simulation