STATISTICAL SEMANTICS

Abstract

Three small libraries, in physics, European current events, and information retrieval, are each represented by a group of 100 lists. Each list simulates the output of a computer program that determines the 12 most frequent content words of a document. Homographs of words that occur in any two of the three libraries are inventoried to ascertain how cleanly the homographs are separated as a consequence of separating the libraries from each other. Three kinds of homograph separation are specified--doubtful, partial, and clean-cut. The last was found to predominate in this study, as a result of the variegation and small size of the libraries. It is hypothesized that for statistically separable libraries somewhat closer in subject matter and/or larger, lower percentages of clean-cut separations should occur, but that there are countertrends which could make these effects less important. (Author)
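The procedure the abstract outlines can be illustrated with a minimal Python sketch. It is not the report's program: the stop list, the tokenizer, and the function names top_content_words and shared_words are assumptions made for illustration, and the abstract does not state how content words were chosen or how the three kinds of separation were scored. The sketch only extracts the most frequent content words of a text and lists the words that appear in two or more libraries, which are the candidates for a homograph inventory.

```python
import re
from collections import Counter

# Hypothetical stop list: the report's own criteria for "content words"
# are not given in the abstract.
STOP_WORDS = frozenset({
    "the", "of", "a", "an", "and", "in", "to", "is", "for",
    "on", "by", "with", "that", "are", "as",
})


def top_content_words(text, n=12, stop_words=STOP_WORDS):
    """Return up to n most frequent non-stop words in text."""
    tokens = re.findall(r"[a-z]+", text.lower())
    counts = Counter(t for t in tokens if t not in stop_words)
    return [word for word, _ in counts.most_common(n)]


def shared_words(libraries):
    """Map each word found in two or more libraries to those libraries.

    libraries: dict of library name -> list of word lists (one per document).
    The result only flags candidates; whether a shared spelling carries
    different senses still has to be judged separately.
    """
    seen = {}
    for name, word_lists in libraries.items():
        for words in word_lists:
            for word in words:
                seen.setdefault(word, set()).add(name)
    return {w: sorted(names) for w, names in seen.items() if len(names) >= 2}
```

A word such as "field" appearing in both a physics list and a current-events list would be returned by shared_words as a candidate; deciding whether its separation is doubtful, partial, or clean-cut is left to the criteria in the report itself.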

Document Details

Document Type
Technical Report
Publication Date
Jul 11, 1962
Accession Number
AD0281909

Entities

People

  • Lauren B. Doyle

Organizations

  • System Development Corporation

Tags

DTIC Thesaurus Topics

  • Computational Processes
  • Computer Program Documentation
  • Computer Programs
  • Computers
  • Computing Devices
  • Computing-Related Activities
  • Information Retrieval
  • Semantics

Readers

  • Computer Science
  • Linear Algebra
  • Organic Chemistry

Technology Areas

  • AI & ML
  • AI & ML - Bayesian Inference
  • AI & ML - Information Retrieval
  • AI & ML - Machine Translation