Integration of Clustering with Semantics Learning for Massive Categorical and Mixed Data
Abstract
The PI performed very well under this grant. The main objective of the research was a systematic study of data-driven similarity measures based on information theory, and of kernel-based methods for representing cluster centers of categorical objects, with the ultimate aim of developing a k-means-like clustering methodology that can handle missing data in categorical and mixed datasets. First, the PI proposed a new unsupervised similarity measure for categorical data based on an information-theoretic approach. Second, building on this similarity measure, the PI proposed a novel k-means-like clustering framework that uses kernel-based methods to represent cluster centers. Third, the PI developed the so-called kCCM algorithm for clustering categorical data with missing values. Finally, the PI extended the proposed k-means-like clustering framework to mixed numeric and categorical datasets with missing data. The grant directly resulted in 3 journal papers and 7 conference/workshop papers. One graduate student was supported by the grant.
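To make the idea of a k-means-like method for categorical data more concrete, the following is a minimal sketch and not the report's kCCM algorithm. It assumes that each cluster center is a smoothed per-attribute probability distribution (an Aitchison-Aitken-style kernel estimate), that object-to-center dissimilarity is a squared Euclidean distance on one-hot encodings (a simple stand-in for the information-theoretic measure, which the abstract does not specify), and that missing values are simply skipped. The names `MISSING`, `kernel_center`, `dissimilarity`, `cluster_categorical`, the `seeds` parameter, and the smoothing parameter `lam` are all hypothetical.

```python
from collections import Counter

MISSING = None  # hypothetical marker for a missing categorical value


def kernel_center(cluster, domains, lam=0.2):
    """Represent a cluster center as one smoothed distribution per attribute."""
    center = []
    for j, domain in enumerate(domains):
        observed = [row[j] for row in cluster if row[j] is not MISSING]
        counts = Counter(observed)
        n, m = len(observed), len(domain)
        dist = {}
        for v in domain:
            if n == 0 or m == 1:
                dist[v] = 1.0 / m  # no evidence or a single category: uniform
            else:
                f = counts[v] / n
                # Assumed smoothing: weight 1 - lam on matches, lam / (m - 1) otherwise.
                dist[v] = f * (1 - lam) + (1 - f) * lam / (m - 1)
        center.append(dist)
    return center


def dissimilarity(row, center):
    """Squared distance between a one-hot object and a center; skips missing values."""
    total = 0.0
    for value, dist in zip(row, center):
        if value is MISSING:
            continue
        total += sum(((1.0 if v == value else 0.0) - p) ** 2 for v, p in dist.items())
    return total


def cluster_categorical(data, k, seeds, lam=0.2, max_iter=20):
    """Alternate assignment and center updates until the labels stop changing."""
    domains = [sorted({row[j] for row in data if row[j] is not MISSING})
               for j in range(len(data[0]))]
    centers = [kernel_center([data[s]], domains, lam) for s in seeds]
    labels = None
    for _ in range(max_iter):
        new_labels = [min(range(k), key=lambda c: dissimilarity(row, centers[c]))
                      for row in data]
        if new_labels == labels:
            break
        labels = new_labels
        for c in range(k):
            members = [row for row, lab in zip(data, labels) if lab == c]
            if members:  # keep the previous center if a cluster becomes empty
                centers[c] = kernel_center(members, domains, lam)
    return labels, centers


# Toy data (invented for illustration) with two missing entries.
data = [
    ("red", "small", "round"),
    ("red", "small", MISSING),
    ("red", "large", "round"),
    ("blue", "large", "square"),
    ("blue", MISSING, "square"),
    ("blue", "large", "round"),
]
labels, centers = cluster_categorical(data, k=2, seeds=[0, 3])
print(labels)
```

Representing a center as a distribution rather than a single mode lets an object with a missing attribute still be compared on its observed attributes, which is the property the sketch is meant to illustrate.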
Document Details
- Document Type: Technical Report
- Publication Date: Nov 18, 2019
- Accession Number: AD1096534
Entities
People
- Van N. Huynh
Organizations
- Japan Advanced Institute of Science and Technology