Tuning a Major Part of a Clustering Algorithm.

Abstract

Most proposals for clustering algorithms have been based on introspection Few proposed algorithms have had their performance studied. This approach involves (a) striving to avoid comparing distances on remote parts of the data (because metrics deserve only minimum trust), and (b) using a stochastically defined test bed to measure, and where possible understand, the performance of an evolving algorithm, with the intent of using our understanding to modify it in such a way as to improve its performance. This test bed involves 3 circular Gaussian samples, of size 50 each, centered at the vertices of an equilateral triangle of side t(sigma). Its use assumes that a 3-group answer is being sought. Thus the only concerned is with a part of the clustering process. Our early algorithms begin to misbehave in the range 5 < or = t < or = 7. Our successive steps of improvement work at smaller and smaller t. The last version we have tried still performs usefully (median misclassification about 16%) at t = 2.7, where knowledge of three populations would only let us hold misclassification to a median of 13.3%. Comparison with a Gaussian maximum likelihood algorithm on the same set of triple samples shows only slightly better performance than for our algorithm.

Open PDF

Document Details

Document Type
Technical Report
Publication Date
Feb 01, 1988
Accession Number
ADA191446

Entities

People

  • John W. Tukey
  • Katherine M. Hansen

Organizations

  • Princeton University

Tags

Communities of Interest

  • C4I
  • Materials and Manufacturing Processes

DTIC Thesaurus Topics

  • Algorithms
  • Center Of Gravity
  • Classification
  • Clustering
  • Computer Programs
  • Data Analysis
  • Data Science
  • Data Sets
  • Information Science
  • Military Research
  • Operating Systems
  • Security
  • Shape
  • Statistics
  • Test Beds
  • Triangles
  • Two Dimensional

Fields of Study

  • Computer science

Readers

  • Computational Modeling and Simulation
  • Computer Vision.
  • Graph Algorithms and Convex Optimization.