Soft Clustering Criterion Functions for Partitional Document Clustering

Abstract

Recently published studies have shown that partitional clustering algorithms that optimize certain criterion functions, which measure key aspects of inter- and intra-cluster similarity, are very effective in producing hard clustering solutions for document datasets and outperform traditional partitional and agglomerative algorithms. In this paper we study the extent to which these criterion functions can be modified to include soft membership functions and whether the resulting soft clustering algorithms can further improve the clustering solutions. Specifically, we focus on four of these hard criterion functions, derive their soft-clustering extensions, present a comprehensive experimental evaluation involving twelve different datasets, and analyze their overall characteristics. Our results show that introducing softness into the criterion functions tends to lead to better clustering results for most datasets and consistently improves the separation between the clusters.

Document Details

Document Type
Technical Report
Publication Date
May 26, 2004
Accession Number
ADA439425

Entities

People

  • George Karypis
  • Ying Zhao

Organizations

  • University of Minnesota

Tags

Communities of Interest

  • Biomedical

DTIC Thesaurus Topics

  • Abstracts
  • Algorithms
  • Clustering
  • Computer Science
  • Computers
  • Data Sets
  • Engineering
  • Information Operations
  • Instructions
  • Iterations
  • Military Research
  • Minnesota
  • Optimization
  • Research Facilities
  • Standards
  • Universities

Fields of Study

  • Computer science

Readers

  • Neural Network Machine Learning
  • Systems Analysis and Design