Efficient Closed Pattern Mining in the Presence of Tough Block Constraints

Abstract

In recent years, a variety of constrained frequent pattern mining formulations and associated algorithms have been developed that let the user specify itemset-based constraints that better capture the requirements and characteristics of the underlying application. In this paper we introduce a new class of block constraints that determine the significance of an itemset pattern by considering the dense block formed by the pattern's items and its associated set of transactions. Block constraints provide a natural framework in which a number of important problems can be specified and solved on both binary and real-valued datasets. However, developing computationally efficient algorithms to find the patterns that satisfy these constraints is challenging: unlike the itemset-based constraints studied earlier, block constraints are tough, in that they are neither anti-monotone, monotone, nor convertible. To overcome this problem, we introduce a new class of pruning methods that significantly reduce the overall search space and make computationally efficient block constraint mining algorithms possible. We present an algorithm called CBMiner that takes advantage of these pruning methods to find the closed itemsets that satisfy the block constraints. Our extensive performance study shows that CBMiner generates a more concise result set and can be one or more orders of magnitude faster than traditional frequent closed itemset mining algorithms.
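To make the notion of a "tough" constraint concrete, the following is a minimal Python sketch, not the CBMiner algorithm. The abstract does not define a specific block measure, so the sketch assumes one for illustration: the block size of an itemset is the number of its items multiplied by the number of transactions that contain all of them. The function names, the toy dataset, and the threshold are all hypothetical, and the enumeration is brute force with none of the pruning methods the report describes.

```python
from itertools import combinations


def supporting_transactions(itemset, transactions):
    """Return indices of the transactions that contain every item in itemset."""
    items = set(itemset)
    return [i for i, t in enumerate(transactions) if items <= t]


def block_size(itemset, transactions):
    """Assumed block measure: (number of items) x (number of supporting transactions)."""
    return len(set(itemset)) * len(supporting_transactions(itemset, transactions))


def is_closed(itemset, transactions):
    """An itemset is closed if no proper superset has the same supporting transactions."""
    items = set(itemset)
    tids = supporting_transactions(items, transactions)
    if not tids:
        return False
    # The closure is the set of items shared by all supporting transactions.
    closure = set.intersection(*(transactions[i] for i in tids))
    return closure == items


def closed_blocks(transactions, min_block_size):
    """Brute-force enumeration (exponential in the number of items; illustration only)."""
    universe = sorted(set().union(*transactions))
    result = []
    for k in range(1, len(universe) + 1):
        for combo in combinations(universe, k):
            size = block_size(combo, transactions)
            if size >= min_block_size and is_closed(combo, transactions):
                result.append((combo, size))
    return result


# Hypothetical toy dataset: five transactions over items a, b, c, d.
transactions = [
    {"a", "b", "c"},
    {"a", "b", "c"},
    {"a", "d"},
    {"a", "d"},
    {"a", "d"},
]

# Along the chain {a} < {a,b} < {a,b,c} the block size is 5, 4, 6.
for itemset in (("a",), ("a", "b"), ("a", "b", "c")):
    print(itemset, block_size(itemset, transactions))

print(closed_blocks(transactions, min_block_size=5))
```

With a threshold of 5 under this assumed measure, {a} satisfies the constraint, its superset {a, b} does not, and the larger superset {a, b, c} satisfies it again. The constraint is therefore neither anti-monotone (a subset of a satisfying itemset can fail) nor monotone (a superset of a satisfying itemset can fail), so the usual subset-based or superset-based pruning cannot be applied directly.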

Document Details

Document Type
Technical Report
Publication Date
Nov 25, 2003
Accession Number
ADA439408

Entities

People

  • George Karypis
  • Jianyong Wang
  • Krishna Gade

Organizations

  • University of Minnesota

Tags

Communities of Interest

  • Autonomy
  • Energy and Power Technologies

DTIC Thesaurus Topics

  • Abstracts
  • Algorithms
  • Clustering
  • Composite Materials
  • Computer Science
  • Data Mining
  • Databases
  • Dimensionality Reduction
  • Engineering
  • Equations
  • Frequency
  • Hash Tables
  • Machine Learning
  • Numbers
  • Parallel Computing
  • Two Dimensional
  • Vector Spaces

Fields of Study

  • Computer science
  • Engineering

Readers

  • Applied Combinatorial Optimization and Logic Circuit Design
  • Distributed Systems and Data Platform Development
  • Operations Research

Technology Areas

  • Space