The effectiveness of lloyd-type methods for the k-means problem

Abstract

We investigate variants of Lloyd's heuristic for clustering high-dimensional data in an attempt to explain its popularity (a half century after its introduction) among practitioners, and in order to suggest improvements in its application. We propose and justify a clusterability criterion for data sets. We present variants of Lloyd's heuristic that quickly lead to provably near-optimal clustering solutions when applied to well-clusterable instances. This is the first performance guarantee for a variant of Lloyd's heuristic. The provision of a guarantee on output quality does not come at the expense of speed: some of our algorithms are candidates for being faster in practice than currently used variants of Lloyd's method. In addition, our other algorithms are faster on well-clusterable instances than recently proposed approximation algorithms, while maintaining similar guarantees on clustering quality. Our main algorithmic contribution is a novel probabilistic seeding process for the starting configuration of a Lloyd-type iteration.

Document Details

Document Type: Pub Defense Publication
Publication Date: Dec 01, 2012
Source ID: 10.1145/2395116.2395117

Entities

People

Chaitanya Swamy
Leonard J. Schulman
Rafail Ostrovsky
Yuval Rabani

Organizations

Bulgarian Science Fund
California Institute of Technology
Hebrew University of Jerusalem
Israel Science Foundation
National Science Foundation
Natural Sciences and Engineering Research Council
Office of Naval Research
University of California, Los Angeles
University of Waterloo

The effectiveness of lloyd-type methods for the k-means problem

Abstract

Document Details

Entities

People

Organizations

Tags

Fields of Study

Readers