AN APPLICATION OF CLUSTER DETECTION TO TEXT AND PICTURE PROCESSING.

Abstract

'Syntactic' information about a corpus of linguistic or pictorial data can be 'discovered' by analyzing the statistics of the data. Given a corpus of text, one can measure the tendencies of pairs of words to occur in common contexts, and use these measurements to define clusters of words. Applied to Basic English text, this procedure yields clusters which correspond very closely to the traditional 'parts of speech' (nouns, verbs, articles, etc.). For FORTRAN 'text', the clusters obtained correspond to integers, operations, etc.; for English text regarded as a sequence of letters (or of phonemes) rather than words, the vowels and the consonants are obtained as clusters. Finally, applied to the gray shades in a digitized picture, the procedure yields slice levels which appear to be useful for figure extraction. (Author)
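The word-clustering procedure summarized in the abstract can be illustrated with a minimal sketch. The abstract does not say how contexts are defined, how co-occurrence tendencies are scored, or how clusters are formed, so every choice below is an assumption made for illustration only: a word's context is taken to be its immediate left and right neighbors, similarity is the cosine of the context-count vectors, and clusters are the connected components of word pairs whose similarity reaches a hypothetical `threshold`. The function names (`context_vectors`, `cosine`, `cluster_words`) are likewise invented here and do not come from the report.

```python
from collections import Counter, defaultdict
from itertools import combinations
from math import sqrt


def context_vectors(tokens):
    """Count, for each word, the (side, neighbor) contexts it occurs in.

    Assumption: "context" means the immediately adjacent tokens.
    """
    vectors = defaultdict(Counter)
    for i, word in enumerate(tokens):
        if i > 0:
            vectors[word][("L", tokens[i - 1])] += 1
        if i + 1 < len(tokens):
            vectors[word][("R", tokens[i + 1])] += 1
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse count vectors (Counters)."""
    dot = sum(count * b[key] for key, count in a.items() if key in b)
    norm_a = sqrt(sum(c * c for c in a.values()))
    norm_b = sqrt(sum(c * c for c in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def cluster_words(tokens, threshold=0.5):
    """Group words whose context vectors are similar.

    Assumption: two words are linked when their cosine similarity is at
    least `threshold`; clusters are the connected components of the links.
    """
    vectors = context_vectors(tokens)
    words = sorted(vectors)
    parent = {w: w for w in words}

    def find(w):
        # Union-find lookup with path halving.
        while parent[w] != w:
            parent[w] = parent[parent[w]]
            w = parent[w]
        return w

    for u, v in combinations(words, 2):
        if cosine(vectors[u], vectors[v]) >= threshold:
            parent[find(u)] = find(v)

    groups = defaultdict(list)
    for w in words:
        groups[find(w)].append(w)
    return sorted(groups.values())


# Toy corpus, invented for this example (not data from the report).
corpus = "the dog runs . the cat runs . a dog sleeps . a cat sleeps .".split()
print(cluster_words(corpus))
# → [['.'], ['a', 'the'], ['cat', 'dog'], ['runs', 'sleeps']]
```

On this toy corpus the articles, nouns, and verbs fall into separate clusters because words of the same class share neighbors. The same sketch accepts any token sequence, so passing a list of letters instead of words would group symbols by their neighboring letters in the same way. Whether such a simplified scheme reproduces the report's results on Basic English or FORTRAN text is not something the abstract allows one to confirm.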

Document Details

Document Type: Technical Report
Publication Date: Jun 01, 1968
Accession Number: AD0670612

Entities

People

  • Azriel Rosenfeld
  • Han K. Huang
  • Victor H. Schneider

Organizations

  • University of Maryland

Tags

DTIC Thesaurus Topics

  • Consonants
  • Data Science
  • Detection
  • Extraction
  • Information Processing
  • Information Science
  • Mathematics
  • Measurement
  • Phonemes
  • Sequences
  • Statistics

Readers

  • Computational Linguistics
  • Graph Algorithms and Convex Optimization
  • Speech Processing/Speech Recognition