WORD STATISTICS IN THE GENERATION OF SEMANTIC TOOLS FOR INFORMATION SYSTEMS
Abstract
A crucial problem in systems for the storage and retrieval of technical information is the interpretation of words used to index documents. Semantic tools, defined as channels for the communication of word meanings between technical experts, document indexers, and searchers, provide one method of dealing with the problem of multiple interpretations. The report shows how statistical data on the distribution of occurrences of single words or word pairs in the text of a set of documents can be used in generating semantic tools, in particular, an indexing vocabulary and relations among the terms in this vocabulary. An experiment in this area is described, involving the testing of several new statistical measures and techniques. The results give some insight into the patterns of language usage in technical literature and suggest directions for future research.
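As a rough illustration of the kind of raw data the abstract refers to, the sketch below counts in how many documents each single word and each unordered word pair occurs. It is a minimal example under stated assumptions, not the report's method: the abstract does not specify the statistical measures that were tested, and the names `tokenize`, `word_and_pair_counts`, and the sample `docs` are hypothetical.

```python
from collections import Counter
from itertools import combinations
import re


def tokenize(text):
    """Lowercase the text and split it into alphabetic words (an assumed, simplistic tokenizer)."""
    return re.findall(r"[a-z]+", text.lower())


def word_and_pair_counts(documents):
    """Return (word_df, pair_df): the number of documents containing each word
    and each unordered word pair. Pairs are stored as alphabetically sorted tuples."""
    word_df = Counter()
    pair_df = Counter()
    for text in documents:
        # Use the set of distinct words so each document counts at most once.
        vocab = sorted(set(tokenize(text)))
        word_df.update(vocab)
        pair_df.update(combinations(vocab, 2))
    return word_df, pair_df


# Hypothetical sample documents, for illustration only.
docs = [
    "information retrieval systems index documents",
    "indexing vocabulary for information systems",
    "statistical measures of word pairs",
]
word_df, pair_df = word_and_pair_counts(docs)
```

Counts like these could serve as input to whatever association measure is chosen for selecting vocabulary terms or relating them; which measure to use is left open here.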
Document Details
- Document Type: Technical Report
- Publication Date: Dec 01, 1967
- Accession Number: AD0664915
Entities
People
- Don C. Stone
Organizations
- Moore School of Electrical Engineering