BAMBOO: Accelerating Closed Itemset Mining by Deeply Pushing the Length-Decreasing Support Constraint

Abstract

Previous studies have shown that mining frequent patterns under a length-decreasing support constraint helps remove uninteresting patterns. The approach rests on the observation that short patterns tend to be interesting only if they have high support, whereas long patterns can still be very interesting even if their support is relatively low. However, simply applying the length-decreasing support constraint still fails to filter out a large number of non-closed (i.e., redundant) patterns. A more desirable pattern discovery task is therefore to mine closed patterns under the length-decreasing support constraint. In this paper we study how to push the length-decreasing support constraint deeply into closed itemset mining, a particularly challenging problem because the downward-closure property cannot be used to prune the search space. We propose several pruning methods and optimization techniques to enhance the closed itemset mining algorithm, and develop an efficient algorithm, BAMBOO. An extensive performance study based on various length-decreasing support constraints and datasets with different characteristics shows that BAMBOO not only generates a more concise result set, but also runs orders of magnitude faster than several efficient pattern discovery algorithms, including CLOSET+, CFPtree and LPMiner. In addition, BAMBOO shows very good scalability with respect to database size.
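
The mining task described in the abstract can be illustrated with a small brute-force sketch. This is not the BAMBOO algorithm and uses none of its pruning methods; it only enumerates every itemset in a toy database, keeps those that are closed (no proper superset with the same support), and then applies a length-decreasing support threshold. The transaction data, the threshold function `min_support`, and all function names are assumptions made for illustration and do not come from the report.

```python
from itertools import combinations


def support(itemset, transactions):
    """Count the transactions that contain every item of `itemset`."""
    return sum(1 for t in transactions if itemset <= t)


def min_support(length):
    """Hypothetical length-decreasing threshold: 4, 3, 2, 2, ... for lengths 1, 2, 3, 4, ..."""
    return max(2, 5 - length)


def closed_itemsets_under_constraint(transactions, threshold):
    """Brute-force enumeration; exponential in the number of items, toy data only."""
    items = sorted(set().union(*transactions))

    # Support of every itemset that occurs in at least one transaction.
    supports = {}
    for k in range(1, len(items) + 1):
        for combo in combinations(items, k):
            candidate = frozenset(combo)
            count = support(candidate, transactions)
            if count > 0:
                supports[candidate] = count

    result = {}
    for itemset, count in supports.items():
        # Closed: no proper superset has the same support.
        is_closed = not any(
            itemset < other and count == other_count
            for other, other_count in supports.items()
        )
        # Length-decreasing constraint: the threshold depends on the itemset length.
        if is_closed and count >= threshold(len(itemset)):
            result[itemset] = count
    return result


# Assumed toy database of five transactions.
transactions = [
    frozenset("abc"),
    frozenset("abc"),
    frozenset("ab"),
    frozenset("ad"),
    frozenset("d"),
]

result = closed_itemsets_under_constraint(transactions, min_support)

# {a, b, c} satisfies the constraint while its subset {c} does not,
# so a failing subset cannot be used to prune its supersets.
c_support = support(frozenset("c"), transactions)
c_satisfies = c_support >= min_support(1)
```

On this toy data the surviving itemsets are {a} with support 4, {a, b} with support 3, and {a, b, c} with support 2. The itemset {c} has support 2, below the assumed length-1 threshold of 4, yet its superset {a, b, c} qualifies. This is the situation the abstract refers to when it says the downward-closure property cannot be used to prune the search space.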

Document Details

Document Type
Technical Report
Publication Date
Sep 29, 2003
Accession Number
ADA438929

Entities

People

  • George Karypis
  • Jianyong Wang

Organizations

  • University of Minnesota

Tags

DTIC Thesaurus Topics

  • Algorithms
  • Compression Ratio
  • Computer Science
  • Computers
  • Data Compression
  • Data Mining
  • Databases
  • Engineering
  • Fungi
  • High Performance Computing
  • Information Operations
  • Military Research
  • Minnesota
  • Optimization
  • Performance Tests
  • Procedures (Computers)
  • Scalability

Fields of Study

  • Computer science

Readers

  • Applied Combinatorial Optimization and Logic Circuit Design
  • Distributed Systems and Data Platform Development

Technology Areas

  • Space