DClusterE
Abstract
Over the last decade, document clustering, as one of the key tasks in information organization and navigation, has been widely studied. Many algorithms have been developed for addressing various challenges in document clustering and for improving clustering performance. However, relatively few research efforts have been reported on evaluating and understanding document clustering results. In this article, we present DClusterE , a comprehensive and effective framework for document clustering evaluation and understanding using information visualization. DClusterE integrates cluster validation with user interactions and offers rich visualization tools for users to examine document clustering results from multiple perspectives. In particular, through informative views including force-directed layout view, matrix view, and cluster view, DClusterE provides not only different aspects of document inter/intra-clustering structures, but also the corresponding relationship between clustering results and the ground truth. Additionally, DClusterE supports general user interactions such as zoom in/out, browsing, and interactive access of the documents at different levels. Two new techniques are proposed to implement DClusterE : (1) A novel multiplicative update algorithm (MUA) for matrix reordering to generate narrow-banded (or clustered) nonzero patterns from documents. Combined with coarse seriation, MUA is able to provide better visualization of the cluster structures. (2) A Mallows-distance-based algorithm for establishing the relationship between the clustering results and the ground truth, which serves as the basis for coloring schemes. Experiments and user studies are conducted to demonstrate the effectiveness and efficiency of DClusterE .
Document Details
- Document Type
- Pub Defense Publication
- Publication Date
- Feb 01, 2012
- Source ID
- 10.1145/2089094.2089100
Entities
People
- Tao Li
- Yi Zhang
Organizations
- Army Research Office
- Division of Biological Infrastructure
- Florida International University
- National Science Foundation Division of Mathematical Sciences
- United States Department of Homeland Security