Spectral Clustering with Links and Attributes

Abstract

If relational data contain communities (groups of inter-related items with similar attribute values), a clustering technique that considers attribute information and the structure of relations simultaneously should produce more meaningful clusters than those produced by considering attributes alone. We investigate this hypothesis in the context of a spectral graph partitioning technique, considering a number of hybrid similarity metrics that combine both sources of information. Through simulation, we find that two of the hybrid metrics achieve superior performance over a wide range of data characteristics. We analyze the spectral decomposition algorithm from a statistical perspective and show that the successful hybrid metrics exaggerate the separation between cluster similarity values, at the expense of increased variance. We cluster several relational datasets using the best hybrid metric and show that the resulting clusters exhibit significant community structure, and that they significantly improve performance in a related classification task.
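As a rough illustration of the idea in the abstract, the sketch below blends an attribute similarity with link presence and then splits the items in two using the second eigenvector of the normalized similarity matrix. The abstract does not state the report's actual hybrid metrics, so the weighted blend here is an assumption, as are the names `hybrid_similarity`, `spectral_bipartition`, and `alpha`. The eigenvector is found by plain power iteration only to keep the example dependency-free.

```python
import math


def hybrid_similarity(attrs, links, alpha=0.5):
    """Blend attribute similarity with link presence.

    Hypothetical metric for illustration, not the one from the report.
    attrs: list of numeric attribute vectors, one per item
    links: set of (i, j) index pairs, treated as undirected
    alpha: weight on the link term (illustrative parameter, 0 <= alpha < 1)
    """
    n = len(attrs)
    sim = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            sq_dist = sum((a - b) ** 2 for a, b in zip(attrs[i], attrs[j]))
            linked = 1.0 if (i, j) in links or (j, i) in links else 0.0
            # Gaussian attribute similarity plus a bonus for linked pairs
            sim[i][j] = (1.0 - alpha) * math.exp(-sq_dist) + alpha * linked
    return sim


def spectral_bipartition(sim, iterations=500):
    """Split items into two groups by the sign of the second eigenvector.

    Assumes every item has positive total similarity (true above when
    alpha < 1, because the Gaussian term is always positive).
    """
    n = len(sim)
    degree = [sum(row) for row in sim]
    # Normalized similarity M = D^{-1/2} S D^{-1/2}
    norm = [[sim[i][j] / math.sqrt(degree[i] * degree[j]) for j in range(n)]
            for i in range(n)]
    # The top eigenvector of M is proportional to sqrt(degree)
    top = [math.sqrt(d) for d in degree]
    top_len = math.sqrt(sum(t * t for t in top))
    top = [t / top_len for t in top]

    vec = [float(i) for i in range(n)]  # deterministic starting vector
    for _ in range(iterations):
        # Remove the component along the trivial top eigenvector
        proj = sum(v * t for v, t in zip(vec, top))
        vec = [v - proj * t for v, t in zip(vec, top)]
        # Multiply by (I + M) / 2, whose eigenvalues all lie in [0, 1]
        vec = [0.5 * (vec[i] + sum(norm[i][j] * vec[j] for j in range(n)))
               for i in range(n)]
        length = math.sqrt(sum(v * v for v in vec))
        vec = [v / length for v in vec]
    # Which group receives label 1 is arbitrary (eigenvector sign ambiguity)
    return [1 if v > 0 else 0 for v in vec]
```

Calling `spectral_bipartition(hybrid_similarity(attrs, links))` returns one 0/1 label per item. Setting `alpha=0` in this sketch reduces it to clustering on attributes alone, which is the baseline the abstract compares against.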

Document Details

Document Type
Technical Report
Publication Date
Jan 01, 2004
Accession Number
ADA472209

Entities

People

  • David Jensen
  • Jennifer Neville
  • Micah Adler

Organizations

  • University of Massachusetts Amherst

Tags

Communities of Interest

  • Autonomy

DTIC Thesaurus Topics

  • Algorithms
  • Bayesian Networks
  • Classification
  • Clustering
  • Communities
  • Computational Science
  • Data Sets
  • Databases
  • Information Science
  • Learning
  • Machine Learning
  • Neural Networks
  • Probabilistic Models
  • Probability
  • Probability Distributions
  • Random Variables
  • Urban Areas

Fields of Study

  • Computer science

Readers

  • Database Systems and Applications
  • Operations Research
  • Psychometric Testing or Psychological Assessment