The Analysis of a Simple k-Means Clustering Algorithm

Abstract

K-means clustering is a very popular clustering technique which is used in numerous applications. Given a set of n data points in R(exp d) and an integer k, the problem is to determine a set of k points R(exp d), called centers, so as to minimize the mean squared distance from each data point to its nearest center. A popular heuristic for k-means clustering is Lloyd's algorithm. In this paper, we present a simple and efficient implementation of Lloyd's k-means clustering algorithm, which we call the filtering algorithm. This algorithm is very easy to implement. It differs from most other approaches in that it precomputes a kd-tree data structure for the data points rather than the center points. We establish the practical efficiency of the filtering algorithm in two ways. First, we present a data-sensitive analysis of the algorithm's running time. Second, we have implemented the algorithm and performed a number of empirical studies, both on synthetically generated data and on real data from applications in color quantization, compression, and segmentation.

Open PDF

Document Details

Document Type
Technical Report
Publication Date
Jan 01, 2000
Accession Number
ADA458738

Entities

People

  • A. Y. Wu
  • C. Piatko
  • D. M. Mount
  • N. S. Netanyahu
  • R. Silverman
  • T. Kanungo

Organizations

  • University of Maryland

Tags

DTIC Thesaurus Topics

  • Abstracts
  • Algorithms
  • Clustering
  • District Of Columbia
  • Filters
  • Information Operations
  • Language
  • Mathematics
  • Physics Laboratories
  • Standards
  • Three Dimensional
  • Trees (Data Structures)
  • Universities

Fields of Study

  • Computer science

Readers

  • Computer Vision.
  • Neural Network Machine Learning.