Integration of Clustering with Semantics Learning for Massive Categorical and Mixed Data

Abstract

The PI performed very well under this grant. The main objective of this research was to conduct a systematic study of data-driven similarity measures based on information theory, and of kernel-based methods for representing cluster centers of categorical objects, in order to develop a k-means-like clustering methodology capable of handling missing data in categorical and mixed datasets. First, the PI proposed a new unsupervised similarity measure for categorical data based on an information-theoretic approach. Second, building on this similarity measure, the PI proposed a novel k-means-like clustering framework that uses kernel-based methods to represent cluster centers. Third, the PI developed the kCCM algorithm for clustering categorical data with missing values. Finally, the PI extended the proposed k-means-like clustering framework to cluster mixed numeric and categorical datasets with missing data. The research resulted directly in 3 journal papers and 7 conference/workshop papers. One graduate student was supported by the grant.

Document Details

Document Type: Technical Report
Publication Date: Nov 18, 2019
Accession Number: AD1096534

Entities

People

Van N. Huynh

Organizations

Japan Advanced Institute of Science and Technology
