Exact pattern matching with feed-forward Bloom filters

Abstract

This article presents a new memory-efficient, cache-optimized algorithm for simultaneously searching for a large number of patterns in a very large corpus. The algorithm builds on the Rabin-Karp string search algorithm and incorporates a new type of Bloom filter that we call a feed-forward Bloom filter. While it retains the asymptotic time complexity of previous multiple-pattern matching algorithms, we show that this technique, together with a CPU architecture-aware design of the Bloom filter, can provide speed-ups between 2× and 30× and memory consumption reductions as large as 50× compared with grep. Our algorithm is also well suited to GPU implementations: a modern GPU can search for 3 million patterns at a rate of 580 MB/s, and for 100 million patterns (a prohibitive number for traditional algorithms) at a rate of 170 MB/s.
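
The abstract does not spell out how the feed-forward Bloom filter works, so the following is only a minimal sketch of one plausible reading, not the authors' implementation. It assumes that the filter keeps a second bit vector recording which bits were touched by successful corpus queries, and that this vector is then used to discard patterns that cannot occur before exact verification. The names `FeedForwardBloom`, `query_and_record`, `was_possibly_hit`, and `search` are hypothetical, the filter parameters are arbitrary, and the sketch hashes each window with `hashlib` instead of the rolling hash that Rabin-Karp would use.

```python
import hashlib


class FeedForwardBloom:
    """Illustrative Bloom filter with a second 'hit' vector (assumed design)."""

    def __init__(self, num_bits, num_hashes):
        self.m = num_bits
        self.k = num_hashes
        self.bits = bytearray(num_bits)      # one byte per bit, for clarity
        self.hit_bits = bytearray(num_bits)  # bits touched by successful queries

    def _positions(self, item):
        # Double hashing: derive k positions from two 64-bit halves of one digest.
        digest = hashlib.blake2b(item, digest_size=16).digest()
        h1 = int.from_bytes(digest[:8], "little")
        h2 = int.from_bytes(digest[8:], "little") | 1
        return [(h1 + i * h2) % self.m for i in range(self.k)]

    def add(self, item):
        for p in self._positions(item):
            self.bits[p] = 1

    def query_and_record(self, item):
        """Return True on a (possible) match and record its bits in hit_bits."""
        positions = self._positions(item)
        if all(self.bits[p] for p in positions):
            for p in positions:
                self.hit_bits[p] = 1
            return True
        return False

    def was_possibly_hit(self, item):
        """True if every bit of `item` was touched by some successful query."""
        return all(self.hit_bits[p] for p in self._positions(item))


def search(patterns, corpus, window):
    """Find (offset, pattern) pairs; every pattern must have length >= window."""
    ffbf = FeedForwardBloom(num_bits=1 << 16, num_hashes=4)
    for pat in patterns:
        ffbf.add(pat[:window])  # index each pattern by its fixed-length prefix

    # Phase 1: scan every corpus window through the filter.
    # (Non-rolling hash here for brevity; Rabin-Karp would update it per byte.)
    candidates = [i for i in range(len(corpus) - window + 1)
                  if ffbf.query_and_record(corpus[i:i + window])]

    # Phase 2: feed forward, keeping only patterns whose bits were all hit.
    surviving = [pat for pat in patterns if ffbf.was_possibly_hit(pat[:window])]

    # Phase 3: exact verification removes Bloom filter false positives.
    matches = {(i, pat) for i in candidates for pat in surviving
               if corpus.startswith(pat, i)}
    return sorted(matches)
```

For example, `search([b"needle", b"absent", b"haystk"], b"xxneedlexxhaystkxx", 6)` reports `b"needle"` at offset 2 and `b"haystk"` at offset 10. Under the stated assumption, the benefit of the second vector is that the exact-matching phase only has to consider the (usually much smaller) set of surviving patterns and candidate offsets.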

Document Details

Document Type
Pub Defense Publication
Publication Date
Jul 01, 2012
Source ID
10.1145/2133803.2330085

Entities

People

  • David G. Andersen
  • Iulian Moraru

Organizations

  • Army Research Office
  • Carnegie Mellon University
  • Division of Computing and Communication Foundations

Tags

Fields of Study

  • Computer science

Readers

  • Applied Combinatorial Optimization and Logic Circuit Design
  • Parallel and Distributed Computing