Spectral Clustering with Links and Attributes
Abstract
If relational data contain communities (groups of inter-related items with similar attribute values), a clustering technique that considers attribute information and the structure of relations simultaneously should produce more meaningful clusters than those produced by considering attributes alone. We investigate this hypothesis in the context of a spectral graph partitioning technique, considering a number of hybrid similarity metrics that combine both sources of information. Through simulation, we find that two of the hybrid metrics achieve superior performance over a wide range of data characteristics. We analyze the spectral decomposition algorithm from a statistical perspective and show that the successful hybrid metrics exaggerate the separation between cluster similarity values, at the expense of increased variance. We cluster several relational datasets using the best hybrid metric and show that the resulting clusters exhibit significant community structure, and that they significantly improve performance in a related classification task.
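As a rough illustration of the general idea in the abstract, the sketch below combines link structure and attribute values into one similarity matrix and then applies a basic spectral bipartition. Everything specific in it is an assumption made for illustration and is not taken from the report: the hybrid metric (attribute similarity kept only on linked pairs), the function names, the toy data, and the choice of the symmetric normalized Laplacian.

```python
import numpy as np

def attribute_similarity(attrs):
    """Fraction of matching attribute values for every pair of items."""
    x = np.asarray(attrs)
    return (x[:, None, :] == x[None, :, :]).mean(axis=2)

def hybrid_similarity(adj, attrs):
    """Hypothetical hybrid metric (an assumption, not the report's definition):
    attribute similarity on linked pairs, zero on unlinked pairs."""
    w = np.asarray(adj, dtype=float) * attribute_similarity(attrs)
    np.fill_diagonal(w, 0.0)
    return w

def spectral_bipartition(w):
    """Split items into two groups by the sign of the eigenvector belonging to
    the second-smallest eigenvalue of the symmetric normalized Laplacian."""
    d = w.sum(axis=1)
    d_inv_sqrt = 1.0 / np.sqrt(np.maximum(d, 1e-12))  # guard against isolated items
    lap = np.eye(len(w)) - d_inv_sqrt[:, None] * w * d_inv_sqrt[None, :]
    _, vecs = np.linalg.eigh(lap)  # eigenvalues are returned in ascending order
    fiedler = vecs[:, 1] * d_inv_sqrt
    return (fiedler > 0).astype(int)

# Toy data: two triangles (items 0-2 and 3-5) joined by a single link 2-3.
adj = np.zeros((6, 6), dtype=int)
for i, j in [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]:
    adj[i, j] = adj[j, i] = 1

# Items in the same triangle share all attribute values; across triangles
# only one of the three attributes matches.
attrs = np.array([[0, 0, 0]] * 3 + [[1, 1, 0]] * 3)

w = hybrid_similarity(adj, attrs)
labels = spectral_bipartition(w)
print(labels)  # which group is labelled 0 or 1 depends on the eigenvector's sign
```

In this toy case the cross-triangle link is down-weighted because its endpoints disagree on most attributes, so the partition should fall between the two triangles. Other ways of combining the two sources of information are possible, and this sketch makes no claim about which ones the report found most effective.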
Document Details
- Document Type: Technical Report
- Publication Date: Jan 01, 2004
- Accession Number: ADA472209
Entities
People
- David Jensen
- Jennifer Neville
- Micah Adler
Organizations
- University of Massachusetts Amherst