STATISTICAL SEMANTICS,
Abstract
Three small libraries in physics, in European current events, and in information retrieval are represented by three groups of 100 lists, each of which simulates the output of a computer program that determines the 12 most frequent content words of a document. Homographs of words which occur in any two of the three libraries are inventoried to ascertain how cleanly the homographs are separated as a consequence of separating the libraries from each other. Three kinds of homograph separation are specified: doubtful, partial, and clean-cut. The last of these was found to predominate in this study, as a result of the variegation and small size of the libraries. It is hypothesized that for statistically separable libraries somewhat closer in subject matter and/or larger, lower percentages of clean-cut separations should occur, but that there are countertrends which could make these effects less important. (Author)
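The inventory step described in the abstract can be sketched in Python as a rough illustration only. The stop-word list, the function names top_content_words and shared_words, and the toy data are all assumptions made for this sketch; the report's own word-selection program and its criteria for doubtful, partial, and clean-cut separation are not given here and are not modeled.

```python
from collections import Counter
import re

# Assumed stop-word list; the report's own definition of "content word" is not stated here.
STOP_WORDS = frozenset({
    "the", "of", "a", "an", "and", "in", "to", "is", "are",
    "for", "by", "on", "with", "that",
})


def top_content_words(text, n=12):
    """Return the n most frequent non-stop-words of a document, most frequent first."""
    words = re.findall(r"[a-z]+", text.lower())
    counts = Counter(w for w in words if w not in STOP_WORDS)
    return [w for w, _ in counts.most_common(n)]


def shared_words(libraries):
    """Map each word found in two or more libraries to the sorted names of those libraries.

    `libraries` maps a library name to a list of word lists (one list per document).
    """
    seen = {}
    for name, lists in libraries.items():
        # Collapse a library's lists into one vocabulary so each word counts once per library.
        for word in {w for lst in lists for w in lst}:
            seen.setdefault(word, set()).add(name)
    return {w: sorted(names) for w, names in seen.items() if len(names) >= 2}


# Toy data invented for illustration; not taken from the report.
toy = {
    "physics": [["field", "charge", "particle"], ["wave", "field", "energy"]],
    "current_events": [["charge", "court", "minister"], ["treaty", "field"]],
    "information_retrieval": [["index", "retrieval", "field"], ["document", "index"]],
}
candidates = shared_words(toy)
```

Words returned by shared_words are only candidates: whether a shared word is actually used in different senses in the different libraries, and how cleanly those senses separate, still has to be judged by inspection.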
Document Details
- Document Type: Technical Report
- Publication Date: Jul 11, 1962
- Accession Number: AD0281909
Entities
People
- Lauren B. Doyle
Organizations
- System Development Corporation