Improving Index Performance through Prefetching

Abstract

In recognition of the crucial role that cache hierarchies play in database performance, recent studies have revisited core database algorithms and data structures in an effort to reduce the number of cache misses. While these efforts to avoid cache misses are certainly helpful they are not a complete solution for two reasons. First, a large number of cache misses still remain that cannot be eliminated. Second, because modern processors support prefetching and other mechanisms to potentially overlap cache misses with computation and other misses, it is not the total number of cache misses that dictates performance, but rather the total amount of exposed miss latency. Hence an algorithm that is more amenable to prefetching can potentially out perform an algorithm with fewer cache misses. In this paper, we propose and evaluate Prefetching B+ Trees (pB+ Trees). Such trees are designed to exploit prefetching to accelerate two important operations on B+ Tree indices: searches and range scans. To accelerate searches, pB+ Trees use prefetching to effectively create wider nodes than the natural data transfer size: e.g., eight vs. one cache lines or disk pages. These wider nodes reduce the height of the B+ Tree, thereby decreasing the number of expensive misses when going from parent to child without significantly increasing the cost of fetching a given node. Our results show that this technique speeds up search, insertion, and deletion times by a factor of 1.2-1.5 for main-memory B+ Trees. In addition, it outperforms and is complimentary to Cache-Sensitive B+ Trees. To accelerate range scans, pB+ Trees provide arrays of pointers to their leaf nodes. These allow the pB+ Tree to prefetch arbitrarily far ahead, even for nonclustered indices, thereby hiding the normally expensive cache misses associated with traversing the leaves within the range. Our results show that this technique yields over a sixfold speedup on range scans of over 1000 keys.

Open PDF

Document Details

Document Type
Technical Report
Publication Date
Dec 01, 2000
Accession Number
ADA387165

Entities

People

  • Phillip B. Gibbons
  • Shimin Chen
  • Todd C. Mowry

Organizations

  • Carnegie Mellon University

Tags

DTIC Thesaurus Topics

  • Algorithms
  • Application Software
  • Bandwidth
  • Computations
  • Computer Programs
  • Computer Science
  • Computers
  • Data Transmission
  • Database Management Systems
  • Databases
  • Equations
  • Hierarchies
  • Lists (Data Structures)
  • Simulations
  • Simulators
  • Steady State
  • Trees (Data Structures)

Fields of Study

  • Computer science

Readers

  • Applied Combinatorial Optimization and Logic Circuit Design.
  • Parallel and Distributed Computing.