Utilizing Fused Features to Mine Unknown Clusters in Training Data

Abstract

In this paper, a previously introduced data mining technique based on the Mean Field Bayesian Data Reduction Algorithm (BDRA) is extended to find unknown data clusters in a fused multidimensional feature space. The BDRA assumes that the discrete symbol probabilities of each class are a priori uniformly Dirichlet distributed, and its primary metric for selecting and discretizing all relevant features is an analytic formula for the probability of error conditioned on the training data. In this extension, the BDRA's built-in dimensionality reduction is exploited to isolate, automatically sort out, and mine all points contained in each unknown data cluster. In previous work, this approach was shown to perform comparably to the classifier that knows all cluster information when mining a single feature containing multiple unknown clusters. The primary contribution of the work presented here is therefore to demonstrate that the approach extends to cases where the features are fused and span more than one dimension. Performance is illustrated using simulated data containing multiple clusters, in which the fused feature space contains relevant classification information.
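The modeling assumption in the abstract can be made concrete with a small sketch. The Python below is not the BDRA and does not reproduce its analytic error formula. It only assumes a uniform Dirichlet prior over discrete symbols, computes the resulting posterior mean symbol probabilities for two classes, and substitutes a simple plug-in error estimate for equally likely classes. The function names `posterior_mean` and `plug_in_error` and the toy training data are hypothetical.

```python
from collections import Counter


def posterior_mean(symbols, num_symbols):
    """Posterior mean symbol probabilities under a uniform Dirichlet(1, ..., 1) prior.

    Assumption: symbols are integers in range(num_symbols).
    """
    counts = Counter(symbols)
    total = len(symbols)
    return [(counts.get(s, 0) + 1) / (total + num_symbols)
            for s in range(num_symbols)]


def plug_in_error(train_a, train_b, num_symbols):
    """Plug-in error estimate for two equally likely classes.

    This is an illustrative stand-in, not the analytic probability of
    error conditioned on the training data that the BDRA uses.
    """
    p_a = posterior_mean(train_a, num_symbols)
    p_b = posterior_mean(train_b, num_symbols)
    return 0.5 * sum(min(a, b) for a, b in zip(p_a, p_b))


# Hypothetical training data over three discrete symbols.
train_a = [0, 0, 0, 1]
train_b = [1, 2, 2, 2]
error_three_symbols = plug_in_error(train_a, train_b, 3)

# Merging symbols 0 and 1 into a single symbol gives a coarser discretization.
merged_a = [0, 0, 0, 0]
merged_b = [0, 1, 1, 1]
error_two_symbols = plug_in_error(merged_a, merged_b, 2)
```

Comparing `error_three_symbols` with `error_two_symbols` mimics, in a very loose way, the idea of scoring alternative discretizations by an error metric computed from the training data and preferring the one with the lower value.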

Document Details

Document Type
Technical Report
Publication Date
Jul 01, 2006
Accession Number
ADA521524

Entities

People

  • Peter Willett
  • Robert S. Lynch Jr.

Organizations

  • Naval Undersea Warfare Center

Tags

Communities of Interest

  • Space

DTIC Thesaurus Topics

  • Algorithms
  • Classification
  • Data Mining
  • Data Reduction
  • Data Sets
  • Dimensionality Reduction
  • Investments
  • Machine Learning
  • Observation
  • Probability
  • Signal Processing
  • Supervised Machine Learning
  • Test Sets
  • Training
  • Two Dimensional
  • Undersea Warfare

Fields of Study

  • Computer science

Readers

  • Parallel and Distributed Computing
  • Statistical inference
  • Theoretical Analysis

Technology Areas

  • AI & ML
  • AI & ML - Bayesian Inference
  • Space