Multilevel Refinement for Hierarchical Clustering

Abstract

Hierarchical methods are well-known clustering techniques that can be potentially very useful for various data mining tasks. A hierarchical clustering scheme produces a sequence of clusterings in which each clustering is nested into the next clustering in the sequence. Since hierarchical clustering is a greedy search algorithm based on a local search, the merging decisions made early in the agglomerative process are not necessarily the right ones. One possible solution to this problem is to refine a clustering produced by the agglomerative hierarchical algorithm to potentially correct the mistakes made early in the agglomerative process. The problem of refining a clustering has many similarities with that of refining a min-cut k-way partitioning of a graph. In this paper, the authors explore multilevel refinement schemes for refining and improving the clusterings produced by hierarchical agglomerative clustering. This algorithm combines traditional hierarchical clustering with multilevel refinement that has been found to be very effective for computing min-cut k-way partitioning of graphs. They consider several clustering objective functions for the proposed refinement step and investigate the usefulness of these objective functions. Their experimental results demonstrate that this algorithm produces clustering solutions that are consistently and significantly better than those produced by hierarchical clustering algorithms alone. Furthermore, their algorithm has the additional advantage of being extremely fast, as it operates on a sparse similarity matrix. The amount of time required by the algorithm ranged from 2 seconds for a data set with 358 items, to 80 seconds for a data set with 9,133 items on a Pentium II PC.

Open PDF

Document Details

Document Type
Technical Report
Publication Date
May 17, 1999
Accession Number
ADA439812

Entities

People

  • Euihong Han
  • George Karypis
  • Vipin Kumar

Organizations

  • University of Minnesota

Tags

Communities of Interest

  • C4I
  • Ground and Sea Platforms
  • Materials and Manufacturing Processes

DTIC Thesaurus Topics

  • Abstracts
  • Algorithms
  • Boundaries
  • Clustering
  • Computational Complexity
  • Computer Science
  • Data Mining
  • Data Sets
  • Engineering
  • Equations
  • Information Operations
  • Military Research
  • Refining
  • Sensitivity
  • Sequences
  • Test And Evaluation
  • Two Dimensional

Fields of Study

  • Computer science

Readers

  • Neural Network Machine Learning.
  • Operations Research

Technology Areas

  • AI & ML
  • AI & ML - Machine Learning Algorithms