Impact of Regulation on Spectral Clustering

Abstract

The performance of spectral clustering can be considerably improved via regularization, as demonstrated empirically in Amini et al. [2]. Here, we provide an attempt at quantifying this improvement through theoretical analysis. Under the stochastic block model (SBM), and its extensions, previous results on spectral clustering relied on the minimum degree of the graph being sufficiently large for its good performance. By examining the scenario where the regularization parameter is large we show that the minimum degree assumption can potentially be removed. As a special case, for an SBM with two blocks, the results require the maximum degree to be large (grow faster than log n) as opposed to the minimum degree. More importantly, we show the usefulness of regularization in situations where not all nodes belong to well-defined clusters. Our results rely on a `bias-variance'-like trade-off that arises from understanding the concentration of the sample Laplacian and the eigen gap as a function of the regularization parameter. As a byproduct of our bounds, we propose a data-driven technique DKest (standing for estimated Davis- Kahan bounds) for choosing the regularization parameter. This technique is shown to work well through simulations and on a real data set.

Open PDF

Document Details

Document Type
Technical Report
Publication Date
Jul 22, 2014
Accession Number
ADA624381

Entities

People

  • Antony Joseph
  • Bin Yu

Organizations

  • University of California, Berkeley

Tags

Communities of Interest

  • C4I

DTIC Thesaurus Topics

  • Abstracts
  • Algorithms
  • California
  • Clustering
  • Communities
  • Data Sets
  • Eigenvalues
  • Eigenvectors
  • Errors
  • Notation
  • Numbers
  • Probability
  • Random Variables
  • Recovery
  • Simulations
  • Statistics
  • Theorems

Fields of Study

  • Computer science

Readers

  • Linear Algebra
  • Regression Analysis.
  • Systems Analysis and Design