EmptyHeaded

Abstract

There are two types of high-performance graph processing engines: low- and high-level engines. Low-level engines (Galois, PowerGraph, Snap) provide optimized data structures and computation models but require users to write low-level imperative code, hence ensuring that efficiency is the burden of the user. In high-level engines, users write in query languages like datalog (SociaLite) or SQL (Grail). High-level engines are easier to use but are orders of magnitude slower than the low-level graph engines. We present EmptyHeaded, a high-level engine that supports a rich datalog-like query language and achieves performance comparable to that of low-level engines. At the core of EmptyHeaded’s design is a new class of join algorithms that satisfy strong theoretical guarantees, but have thus far not achieved performance comparable to that of specialized graph processing engines. To achieve high performance, EmptyHeaded introduces a new join engine architecture, including a novel query optimizer and execution engine that leverage single-instruction multiple data (SIMD) parallelism. With this architecture, EmptyHeaded outperforms high-level approaches by up to three orders of magnitude on graph pattern queries, PageRank, and Single-Source Shortest Paths (SSSP) and is an order of magnitude faster than many low-level baselines. We validate that EmptyHeaded competes with the best-of-breed low-level engine (Galois), achieving comparable performance on PageRank and at most 3× worse performance on SSSP. Finally, we show that the EmptyHeaded design can easily be extended to accommodate a standard resource description framework (RDF) workload, the LUBM benchmark. On the LUBM benchmark, we show that EmptyHeaded can compete with and sometimes outperform two high-level, but specialized RDF baselines (TripleBit and RDF-3X), while outperforming MonetDB by up to three orders of magnitude and LogicBlox by up to two orders of magnitude.

Document Details

Document Type
Pub Defense Publication
Publication Date
Oct 27, 2017
Source ID
10.1145/3129246

Entities

People

  • Andres Nötzli
  • Andrew Lamb
  • Christopher R. Aberger
  • Christopher Ré
  • Kunle Olukotun
  • Susan Tu

Organizations

  • Defense Advanced Research Projects Agency
  • Google
  • Gordon and Betty Moore Foundation
  • National Institute of Biomedical Imaging and Bioengineering
  • National Institutes of Health
  • National Science Foundation
  • Office of Naval Research
  • Stanford University
  • Toshiba

Tags

Fields of Study

  • Computer science

Readers

  • Distributed Systems and Data Platform Development
  • Graph Algorithms and Convex Optimization.
  • Parallel and Distributed Computing.