Integration of Clustering with Semantics Learning for Massive Categorical and Mixed Data

Abstract

The PI performed very well under this grant. The main objective of the research was to conduct a systematic study of data-driven similarity measures based on information theory, and of kernel-based methods for representing cluster centers of categorical objects, in order to develop a k-means-like clustering methodology capable of handling missing data in categorical and mixed datasets. First, the PI proposed a new unsupervised similarity measure for categorical data based on an information-theoretic approach. Second, building on this similarity measure, the PI proposed a novel k-means-like clustering framework that uses kernel-based methods to represent cluster centers. Third, the PI developed the kCCM algorithm for clustering categorical data with missing values. Finally, the PI extended the proposed k-means-like clustering framework to mixed numeric and categorical datasets with missing data. The grant directly resulted in 3 journal papers and 7 conference/workshop papers, and supported one graduate student.
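The general idea of a k-means-like loop over categorical data with missing values can be illustrated with a minimal sketch. This is not the report's kCCM algorithm or its similarity measure, neither of which is specified in this abstract. The sketch assumes that each cluster center is a per-attribute smoothed frequency distribution, that dissimilarity is the sum of negative log-probabilities over observed attributes, and that missing values are marked with None and simply skipped. All function names and the smoothing parameter alpha are hypothetical.

```python
import math
import random
from collections import Counter

MISSING = None  # assumed marker for a missing categorical value


def fit_centers(rows, labels, k, n_attrs, domains, alpha=1.0):
    """Build one center per cluster: a smoothed frequency distribution per attribute."""
    centers = []
    for c in range(k):
        members = [r for r, lab in zip(rows, labels) if lab == c]
        center = []
        for j in range(n_attrs):
            # Count only observed values; missing entries do not contribute.
            counts = Counter(r[j] for r in members if r[j] is not MISSING)
            total = sum(counts.values()) + alpha * len(domains[j])
            center.append({v: (counts[v] + alpha) / total for v in domains[j]})
        centers.append(center)
    return centers


def dissimilarity(row, center):
    """Sum of -log p(value) over observed attributes (assumed measure, not the report's)."""
    return sum(-math.log(center[j][v]) for j, v in enumerate(row) if v is not MISSING)


def cluster(rows, k, n_iter=20, seed=0):
    """Alternate between refitting centers and reassigning rows until labels stop changing."""
    rng = random.Random(seed)
    n_attrs = len(rows[0])
    domains = [sorted({r[j] for r in rows if r[j] is not MISSING}) for j in range(n_attrs)]
    labels = [rng.randrange(k) for _ in rows]
    for _ in range(n_iter):
        centers = fit_centers(rows, labels, k, n_attrs, domains)
        new_labels = [min(range(k), key=lambda c: dissimilarity(r, centers[c])) for r in rows]
        if new_labels == labels:
            break
        labels = new_labels
    return labels
```

Representing a center as a distribution rather than a single mode keeps the update step analogous to the mean in k-means, and skipping unobserved attributes is one simple way to tolerate missing data without imputation.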

Document Details

Document Type
Technical Report
Publication Date
Nov 18, 2019
Accession Number
AD1096534

Entities

People

  • Van N. Huynh

Organizations

  • Japan Advanced Institute of Science and Technology

Tags

Communities of Interest

  • Autonomy
  • Biomedical
  • Human Systems

DTIC Thesaurus Topics

  • Air Force
  • Air Force Research Laboratories
  • Algorithms
  • Artificial Intelligence
  • Big Data
  • Computer Science
  • Data Mining
  • Data Sets
  • Earth Sciences
  • Electronic Mail
  • Information Science
  • Information Theory
  • Laboratory Procedures
  • Machine Learning
  • Social Sciences
  • Systems Science
  • Unsupervised Machine Learning

Fields of Study

  • Computer science

Readers

  • Neural Network Machine Learning
  • Research Science/Academic Research