Clustering of Large Event Sets Using Delaunay Tessellations and K-Medoid Optimization

Abstract

This paper introduces a new fully-automated method for identifying clusters in large event sets. The method uses Delaunay tessellation to determine the clusters and initial membership, then applies an iterative K-method optimization to refine membership in the clusters until stability is achieved. The method is robust and computationally efficient, with performance improvement over standard K-method optimization from O(n2) to O(n log n), which is achieved by making use of the Delaunay tessellation neighbor connectivity information. It produces clusters that meet three key criteria: 1) for each cluster, each event is closer to the representative event for that cluster (the medoid) than to the representative events for nearby clusters, 2) for each member event in a cluster, there is a closest neighbor that is no further away the D max, 3) no event in a cluster is further than D max from any other event in the cluster. D max is a user-defined parameter that can be used to control the number and size of clusters. The basic algorithm consists of three steps. First, the initial clusters are identified by forming a Delaunay tessellation for the entire set, then removing all edges longer than D max. Second, the initial clusters are sub-divided using a medial-axis subdivision algorithm until no cluster has a maximum event-to-event span greater that D max. Third, given these groups, membership in the groups and K-medoid representatives for each are optimized in a hill-climbing iterative process. In most cases, this sequence produces excellent results, but we have found rare cases where the method can form poor clusters or event-to-cluster assignments. Hence we have added an additional clean up step that can break up clusters with a main body of members and a few outliers to merge the main body with a nearby cluster, and that can re-assign an outlier member to another cluster if that cluster has nearby events to the outlier.

Open PDF

Document Details

Document Type
Technical Report
Publication Date
Sep 01, 2007
Accession Number
ADA519120

Entities

People

  • Christopher J. Young
  • James R. Hipp

Organizations

  • Sandia National Laboratories

Tags

Communities of Interest

  • Materials and Manufacturing Processes

DTIC Thesaurus Topics

  • Abstracts
  • Air Force
  • Air Force Research Laboratories
  • Algorithms
  • Clustering
  • Data Mining
  • Data Sets
  • Explosions
  • Ground Based
  • Instructions
  • Military Research
  • Monitoring
  • Nuclear Explosions
  • Operations Research
  • Optimization
  • Sequences
  • Standards

Fields of Study

  • Computer science

Readers

  • Computational Fluid Dynamics (CFD)
  • Quantum Chemistry
  • Regression Analysis.