Clustering, factor discovery and optimal transport

Abstract

The clustering problem, and more generally latent factor discovery or latent space inference, is formulated in terms of the Wasserstein barycenter problem from optimal transport. The proposed objective is to maximize the variability attributable to class, which is further characterized as minimizing the variance of the Wasserstein barycenter. Existing theory, which constrains the transport maps to rigid translations, is extended to affine transformations. The resulting non-parametric clustering algorithms include $k$-means as a special case and exhibit more robust performance. A continuous version of these algorithms discovers continuous latent variables and generalizes principal curves. The strength of these algorithms is demonstrated by tests on both artificial and real-world data sets.
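
The link between the two objectives can be illustrated numerically in the translation-only setting. The following is a minimal sketch, not the authors' algorithm: it assumes that under rigid translations each class is shifted so that its mean coincides with the overall mean, and that the barycenter is the pooled set of shifted points. The helper `kmeans` (a plain Lloyd iteration) and the synthetic two-blob data are hypothetical, introduced only for illustration. Under these assumptions the variance of the barycenter equals the within-class variance, so minimizing it is the same as maximizing the between-class variance, which is the $k$-means criterion.

```python
import numpy as np

def kmeans(x, k, iters=50, seed=0):
    """Plain Lloyd iteration (illustrative helper, not from the paper)."""
    rng = np.random.default_rng(seed)
    centers = x[rng.choice(len(x), size=k, replace=False)].copy()
    for _ in range(iters):
        # Assign each point to its nearest center (squared Euclidean distance).
        d2 = ((x[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        # Move each non-empty center to the mean of its assigned points.
        for j in range(k):
            if np.any(labels == j):
                centers[j] = x[labels == j].mean(axis=0)
    return labels

# Synthetic data: two Gaussian blobs (assumed purely for the demonstration).
rng = np.random.default_rng(1)
x = np.vstack([
    rng.normal(loc=(-3.0, 0.0), scale=1.0, size=(100, 2)),
    rng.normal(loc=(3.0, 0.0), scale=1.0, size=(100, 2)),
])
labels = kmeans(x, k=2)

# Per-point class mean, computed directly from the final labels.
global_mean = x.mean(axis=0)
class_mean = np.zeros_like(x)
for j in np.unique(labels):
    class_mean[labels == j] = x[labels == j].mean(axis=0)

# Variance decomposition: total = within-class + between-class.
total_var = ((x - global_mean) ** 2).sum(axis=1).mean()
within_var = ((x - class_mean) ** 2).sum(axis=1).mean()
between_var = ((class_mean - global_mean) ** 2).sum(axis=1).mean()

# Translation-only "barycenter": shift every class onto the global mean and pool.
barycenter = x - class_mean + global_mean
barycenter_var = ((barycenter - barycenter.mean(axis=0)) ** 2).sum(axis=1).mean()

print("total variance:      ", total_var)
print("between-class part:  ", between_var)
print("within-class part:   ", within_var)
print("barycenter variance: ", barycenter_var)
```

Because the total variance is fixed by the data, any labeling that lowers the barycenter variance raises the between-class part by the same amount. The affine extension described in the abstract would also rescale each class, which this sketch does not attempt.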

Document Details

Document Type
Pub Defense Publication
Publication Date
Dec 26, 2020
Source ID
10.1093/imaiai/iaaa040

Entities

People

  • Esteban G. Tabak
  • Hongkang Yang

Organizations

  • Courant Institute of Mathematical Sciences, NYU
  • National Science Foundation
  • Office of Naval Research

Tags

Fields of Study

  • Computer science

Readers

  • Adaptive Control and Estimation with Uncertainty in Dynamic Systems
  • Fluid Dynamics
  • Neural Network Machine Learning

Technology Areas

  • AI & ML
  • AI & ML - Bayesian Inference
  • AI & ML - Machine Learning Algorithms
  • Space