Cluster Analysis.

Abstract

Current clustering techniques possess several common features which seem undesirable. For example, a 'cluster' remains an undefined concept; each clustering technique tends to work properly only under unstated, but often restrictive, implied assumptions; and the nonexistence of clustering statistics or the lack of theory about the sampling distributions of the statistics (when they do exist) makes the assessment of the statistical significance of a cluster quite impossible. In this paper, after a brief review and critique of the most widely used clustering methods, definitions of a cluster and its related concepts are proposed. The clusters so defined and their associated statistics remain invariant under any monotonic transformation of the elements of the data matrix on which they depend. Their sampling distributions are investigated by analytic and Monte Carlo methods. Both artificial and real data are employed to illustrate the methodology and probability theory of the proposed clustering method. (Author)

Document Details

Document Type
Technical Report
Publication Date
Jan 01, 1971
Accession Number
AD0717333

Entities

People

  • R. F. Ling

Organizations

  • Yale University

Tags

DTIC Thesaurus Topics

  • Clustering
  • Collecting Methods
  • Computing-Related Activities
  • Data Science
  • Information Science
  • Interdisciplinary Science
  • Mathematics
  • Monte Carlo Method
  • Probability
  • Sampling
  • Statistics

Fields of Study

  • Mathematics

Readers

  • Calculus or Mathematical Analysis
  • Regression Analysis