Frequent Sub-Structure-Based Approaches for Classifying Chemical Compounds

Abstract

In this paper we study the problem of classifying chemical compound datasets. We present a sub-structure-based classification algorithm that decouples the sub-structure discovery process from the classification model construction and uses frequent subgraph discovery algorithms to find all topological and geometric sub-structures present in the dataset. The advantage of our approach is that, during classification model construction, all relevant sub-structures are available, allowing the classifier to intelligently select the most discriminating ones. Computational scalability is ensured by the use of highly efficient frequent subgraph discovery algorithms coupled with aggressive feature selection. Our experimental evaluation on eight different classification problems shows that our approach is computationally scalable and outperforms existing schemes by 10% to 35% on average.
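
The decoupling described in the abstract (first discover frequent sub-structures, then select the most discriminating ones for the classifier) can be illustrated with a small sketch. This is not the authors' algorithm: labeled simple paths stand in for general frequent subgraphs, a difference in class support stands in for the feature selection step, and the toy molecules, labels, and function names are all hypothetical.

```python
from collections import Counter

def path_features(atoms, bonds, max_edges=2):
    """Collect labeled simple paths with 1..max_edges bonds.
    ASSUMPTION: paths are a simplified stand-in for frequent subgraphs."""
    adj = {i: [] for i in range(len(atoms))}
    for i, j, b in bonds:
        adj[i].append((j, b))
        adj[j].append((i, b))
    found = set()

    def extend(path, labels):
        if len(path) > 1:
            # Canonical form: the smaller of the two reading directions.
            found.add(min("".join(labels), "".join(reversed(labels))))
        if len(path) - 1 == max_edges:
            return
        for nxt, b in adj[path[-1]]:
            if nxt not in path:
                extend(path + [nxt], labels + [b, atoms[nxt]])

    for start in range(len(atoms)):
        extend([start], [atoms[start]])
    return found

def frequent_features(feature_sets, min_support):
    """Keep features present in at least min_support (a fraction) of compounds."""
    counts = Counter(f for fs in feature_sets for f in fs)
    n = len(feature_sets)
    return {f for f, c in counts.items() if c / n >= min_support}

def most_discriminating(feature_sets, labels, candidates, top_k):
    """Rank candidates by |support in class 1 - support in class 0|.
    ASSUMPTION: a simple illustrative score, not the report's selection method."""
    pos = [fs for fs, y in zip(feature_sets, labels) if y == 1]
    neg = [fs for fs, y in zip(feature_sets, labels) if y == 0]

    def score(f):
        sp = sum(f in fs for fs in pos) / len(pos)
        sn = sum(f in fs for fs in neg) / len(neg)
        return abs(sp - sn)

    return sorted(candidates, key=lambda f: (-score(f), f))[:top_k]

# Hypothetical toy compounds: (atom labels, bonds as (i, j, bond symbol)).
molecules = [
    (["C", "C", "O"], [(0, 1, "-"), (1, 2, "=")]),  # class 1
    (["N", "C", "O"], [(0, 1, "-"), (1, 2, "=")]),  # class 1
    (["C", "C", "O"], [(0, 1, "-"), (1, 2, "-")]),  # class 0
    (["C", "N"], [(0, 1, "-")]),                    # class 0
]
labels = [1, 1, 0, 0]

# Step 1: sub-structure discovery, independent of the class labels.
feature_sets = [path_features(a, b) for a, b in molecules]
frequent = frequent_features(feature_sets, min_support=0.5)

# Step 2: label-aware selection over the full set of frequent sub-structures.
selected = most_discriminating(feature_sets, labels, frequent, top_k=1)
print(sorted(frequent), selected)
```

In this toy data the carbonyl-like path `C=O` occurs in both class 1 compounds and in neither class 0 compound, so it is ranked first. A real system would replace `path_features` with a frequent subgraph miner and pass the selected features to a classifier.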

Document Details

Document Type
Technical Report
Publication Date
Mar 10, 2003
Accession Number
ADA439580

Entities

People

  • George Karypis
  • Michihiro Kuramochi
  • Mukund Deshpande

Organizations

  • University of Minnesota

Tags

Communities of Interest

  • Energy and Power Technologies

DTIC Thesaurus Topics

  • Abstracts
  • Algorithms
  • Bioassay
  • Chemical Compounds
  • Classification
  • Computational Complexity
  • Computer Science
  • Computers
  • Dipole Moments
  • Engineering
  • Escherichia Coli
  • Feature Selection
  • Frequency
  • High Performance Computing
  • Information Operations
  • Minnesota
  • Test Sets

Fields of Study

  • Computer science

Readers

  • Applied Combinatorial Optimization and Logic Circuit Design
  • Distributed Systems and Data Platform Development
  • Psychometric Testing or Psychological Assessment