Online Clustering with Bayesian Nonparametrics; OR A Bayesian Model-Based Method for Clustering Dynamic Datasets; OR Model-Based Clustering for Dynamic Data

Abstract

Clustering algorithms, such as Gaussian mixture models and K-means, often require the number of clusters to be specified a priori. Bayesian nonparametric (BNP) methods avoid this problem by specifying a prior distribution over the cluster assignments that allows the number of clusters to be inferred from the data. This can be especially useful for online clustering tasks, where data arrives in a continuous stream and the number of clusters may dynamically change over time. Classical BNP priors often overestimate the number of clusters, however, leading researchers to develop new priors with more control over this tendency. To date, BNP algorithms resistant to over-clustering have only been implemented for offline processing, utilizing Markov chain Monte Carlo inference. In this dissertation, we derive a novel algorithm for online BNP clustering using variational inference, with explicit control over the over-clustering phenomenon. Additionally, we propose two methods for tuning a critical hyperparameter mid-stream, based on empirical analysis of the BNP cluster assignment prior and a cost function from Gaussian mixture reduction. We demonstrate the effectiveness of our algorithms on dynamic datasets designed specifically to challenge online BNP clustering algorithms. We also show that our algorithms can be employed for practical applications of radar pulse clustering and neural spike sorting, achieving competitive and often superior results when compared to classical BNP methods. Furthermore, we exploit the model-based framework to extend our algorithm and tuning methods from purely Gaussian mixtures to handle data with mixed multivariate Gaussian and categorical type, and demonstrate this new extension on real-world data. Our empirical studies indicate that the developments in this dissertation are a significant contribution to the state of the art in BNP clustering.
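
As a rough illustration of how a prior over cluster assignments can leave the number of clusters open, the sketch below samples labels from a Chinese restaurant process. This is an assumption made for illustration only: the abstract does not name a specific prior, and this is not the dissertation's algorithm or its over-clustering-resistant prior. The function name `crp_assignments` and the concentration parameter `alpha` are hypothetical names introduced here.

```python
import random


def crp_assignments(n_points, alpha, seed=0):
    """Sample cluster labels for n_points from a Chinese restaurant process.

    Illustrative only: the CRP stands in for a "classical BNP prior";
    alpha (> 0) is an assumed concentration parameter.
    """
    rng = random.Random(seed)
    counts = []  # counts[k] = number of points assigned to cluster k so far
    labels = []
    for _ in range(n_points):
        # An existing cluster k is chosen with weight counts[k];
        # a brand-new cluster is opened with weight alpha.
        weights = counts + [alpha]
        k = rng.choices(range(len(weights)), weights=weights)[0]
        if k == len(counts):
            counts.append(1)  # open a new cluster
        else:
            counts[k] += 1
        labels.append(k)
    return labels, counts


if __name__ == "__main__":
    labels, counts = crp_assignments(n_points=200, alpha=1.0)
    print("clusters drawn from the prior:", len(counts))
```

The number of clusters is not fixed in advance; it is a random outcome of the prior, and larger values of `alpha` tend to open more clusters.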

Document Details

Document Type
Technical Report
Publication Date
Oct 06, 2020
Accession Number
AD1110841

Entities

People

  • Matthew Scherreik

Organizations

  • Wright State University

Tags

Communities of Interest

  • Autonomy
  • Cyber
  • Sensors
  • Space

DTIC Thesaurus Topics

  • Artificial Intelligence
  • Bayesian Networks
  • Change Detection
  • Computational Science
  • Data Mining
  • Data Science
  • Databases
  • Detectors
  • Information Processing
  • Information Science
  • Information Systems
  • Machine Learning
  • Monte Carlo Method
  • Network Science
  • Probabilistic Models
  • Probability
  • Supervised Machine Learning

Fields of Study

  • Computer science

Readers

  • Adaptive Control and Estimation with Uncertainty in Dynamic Systems.
  • Nanofabrication and Microfabrication.
  • Neural Network Machine Learning.

Technology Areas

  • AI & ML
  • AI & ML - Bayesian Inference
  • AI & ML - Machine Learning Algorithms