Online Clustering with Bayesian Nonparametrics; OR A Bayesian Model-Based Method for Clustering Dynamic Datasets; OR Model-Based Clustering for Dynamic Data
Abstract
Clustering algorithms, such as Gaussian mixture models and K-means, often require the number of clusters to be specified a priori. Bayesian nonparametric (BNP) methods avoid this problem by specifying a prior distribution over the cluster assignments that allows the number of clusters to be inferred from the data. This can be especially useful for online clustering tasks, where data arrives in a continuous stream and the number of clusters may dynamically change over time. Classical BNP priors often overestimate the number of clusters, however, leading researchers to develop new priors with more control over this tendency. To date, BNP algorithms resistant to over-clustering have only been implemented for offline processing, utilizing Markov chain Monte Carlo inference. In this dissertation, we derive a novel algorithm for online BNP clustering using variational inference, with explicit control over the over-clustering phenomenon. Additionally, we propose two methods for tuning a critical hyperparameter mid-stream, based on empirical analysis of the BNP cluster assignment prior and a cost function from Gaussian mixture reduction. We demonstrate the effectiveness of our algorithms on dynamic datasets designed specifically to challenge online BNP clustering algorithms. We also show that our algorithms can be employed for practical applications of radar pulse clustering and neural spike sorting, achieving competitive and often superior results when compared to classical BNP methods. Furthermore, we exploit the model-based framework to extend our algorithm and tuning methods from purely Gaussian mixtures to handle data with mixed multivariate Gaussian and categorical type, and demonstrate this new extension on real-world data. Our empirical studies indicate that the developments in this dissertation are a significant contribution to the state of the art in BNP clustering.
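As an illustration of how a prior over cluster assignments can leave the number of clusters open, the sketch below samples assignments from a Chinese restaurant process (CRP), the assignment prior induced by a Dirichlet process. This is an assumption made for illustration only: the abstract does not name the priors it uses, and this is not the dissertation's algorithm. The function name `sample_crp_assignments` and the concentration parameter `alpha` are hypothetical choices for this example.

```python
import random


def sample_crp_assignments(n_points, alpha, seed=0):
    """Draw cluster assignments from a Chinese restaurant process prior.

    Illustrative sketch only (assumed prior, not taken from the report).
    alpha > 0 is the concentration parameter: larger values make new
    clusters more likely.
    """
    rng = random.Random(seed)
    counts = []       # counts[k] = number of points already in cluster k
    assignments = []  # assignments[i] = cluster label of point i

    for _ in range(n_points):
        # An existing cluster k is chosen with weight counts[k];
        # a brand-new cluster is chosen with weight alpha.
        weights = counts + [alpha]
        k = rng.choices(range(len(weights)), weights=weights)[0]
        if k == len(counts):
            counts.append(1)   # open a new cluster
        else:
            counts[k] += 1     # join an existing cluster
        assignments.append(k)

    return assignments, counts


if __name__ == "__main__":
    for alpha in (0.5, 5.0):
        _, counts = sample_crp_assignments(500, alpha, seed=1)
        print(f"alpha={alpha}: {len(counts)} clusters, sizes={counts}")
```

Under this assumed prior the number of clusters is not fixed in advance; it grows with the amount of data at a rate governed by `alpha`, which is one way to see why a hyperparameter of the assignment prior affects the tendency to over-cluster.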
Document Details
- Document Type: Technical Report
- Publication Date: Oct 06, 2020
- Accession Number: AD1110841
Entities
People
- Matthew Scherreik
Organizations
- Wright State University