Multilevel Refinement for Hierarchical Clustering
Abstract
Hierarchical methods are well-known clustering techniques that can be potentially very useful for various data mining tasks. A hierarchical clustering scheme produces a sequence of clusterings in which each clustering is nested into the next clustering in the sequence. Since hierarchical clustering is a greedy search algorithm based on a local search, the merging decisions made early in the agglomerative process are not necessarily the right ones. One possible solution to this problem is to refine a clustering produced by the agglomerative hierarchical algorithm to potentially correct the mistakes made early in the agglomerative process. The problem of refining a clustering has many similarities with that of refining a min-cut k-way partitioning of a graph. In this paper, the authors explore multilevel refinement schemes for refining and improving the clusterings produced by hierarchical agglomerative clustering. This algorithm combines traditional hierarchical clustering with multilevel refinement that has been found to be very effective for computing min-cut k-way partitioning of graphs. They consider several clustering objective functions for the proposed refinement step and investigate the usefulness of these objective functions. Their experimental results demonstrate that this algorithm produces clustering solutions that are consistently and significantly better than those produced by hierarchical clustering algorithms alone. Furthermore, their algorithm has the additional advantage of being extremely fast, as it operates on a sparse similarity matrix. The amount of time required by the algorithm ranged from 2 seconds for a data set with 358 items, to 80 seconds for a data set with 9,133 items on a Pentium II PC.
Document Details
- Document Type
- Technical Report
- Publication Date
- May 17, 1999
- Accession Number
- ADA439812
Entities
People
- Euihong Han
- George Karypis
- Vipin Kumar
Organizations
- University of Minnesota