Subdue: Compression-Based Frequent Pattern Discovery in Graph Data

Abstract

A majority of the existing algorithms which mine graph datasets target complete, frequent sub-graph discovery. We describe the graph-based data mining system Subdue which focuses on the discovery of sub-graphs which are not only frequent but also compress the graph dataset, using a heuristic algorithm. The rationale behind the use of a compression-based methodology for frequent pattern discovery is to produce a fewer number of highly interesting patterns than to generate a large number of patterns from which interesting patterns need to be identified. We perform an experimental comparison of Subdue with the graph mining systems gSpan and FSG on the Chemical Toxicity and the Chemical Compounds datasets that are provided with gSpan. We present results on the performance on the Subdue system on the Mutagenesis and the KDD 2003 Citation Graph dataset. An analysis of the results indicates that Subdue can efficiently discover best-compressing frequent patterns which are fewer in number but can be of higher interest.

Open PDF

Document Details

Document Type
Technical Report
Publication Date
Jan 01, 2005
Accession Number
ADA459053

Entities

People

  • Diane Cook
  • Lawrence B. Holder
  • Nikhil S. Ketkar

Organizations

  • University of Texas at Arlington

Tags

Communities of Interest

  • Energy and Power Technologies

DTIC Thesaurus Topics

  • Abstracts
  • Air Force
  • Algorithms
  • Chemical Compounds
  • Compression
  • Computer Programs
  • Computer Science
  • Data Mining
  • Databases
  • Distortion
  • Graph Theory
  • Kernel Functions
  • Link Analysis
  • Molecular Structure
  • Toxicity
  • United States Government
  • Universities

Fields of Study

  • Computer science

Readers

  • Distributed Systems and Data Platform Development
  • Graph Algorithms and Convex Optimization.
  • Molecular and Cellular Biochemistry

Technology Areas

  • AI & ML