Clustering of Large Event Sets Using Delaunay Tessellations and K-Medoid Optimization
Abstract
This paper introduces a new fully-automated method for identifying clusters in large event sets. The method uses Delaunay tessellation to determine the clusters and initial membership, then applies an iterative K-method optimization to refine membership in the clusters until stability is achieved. The method is robust and computationally efficient, with performance improvement over standard K-method optimization from O(n2) to O(n log n), which is achieved by making use of the Delaunay tessellation neighbor connectivity information. It produces clusters that meet three key criteria: 1) for each cluster, each event is closer to the representative event for that cluster (the medoid) than to the representative events for nearby clusters, 2) for each member event in a cluster, there is a closest neighbor that is no further away the D max, 3) no event in a cluster is further than D max from any other event in the cluster. D max is a user-defined parameter that can be used to control the number and size of clusters. The basic algorithm consists of three steps. First, the initial clusters are identified by forming a Delaunay tessellation for the entire set, then removing all edges longer than D max. Second, the initial clusters are sub-divided using a medial-axis subdivision algorithm until no cluster has a maximum event-to-event span greater that D max. Third, given these groups, membership in the groups and K-medoid representatives for each are optimized in a hill-climbing iterative process. In most cases, this sequence produces excellent results, but we have found rare cases where the method can form poor clusters or event-to-cluster assignments. Hence we have added an additional clean up step that can break up clusters with a main body of members and a few outliers to merge the main body with a nearby cluster, and that can re-assign an outlier member to another cluster if that cluster has nearby events to the outlier.
Document Details
- Document Type
- Technical Report
- Publication Date
- Sep 01, 2007
- Accession Number
- ADA519120
Entities
People
- Christopher J. Young
- James R. Hipp
Organizations
- Sandia National Laboratories