Cache-Oblivious Algorithms

Abstract

This article presents asymptotically optimal algorithms for rectangular matrix transpose, fast Fourier transform (FFT), and sorting on computers with multiple levels of caching. Unlike previous optimal algorithms, these algorithms are cache oblivious: no variables dependent on hardware parameters, such as cache size and cache-line length, need to be tuned to achieve optimality. Nevertheless, these algorithms use an optimal amount of work and move data optimally among multiple levels of cache. For a cache with size M and cache-line length B where M = Ω(B^2), the number of cache misses for an m × n matrix transpose is Θ(1 + mn/B). The number of cache misses for either an n-point FFT or the sorting of n numbers is Θ(1 + (n/B)(1 + log_M n)). We also give a Θ(mnp)-work algorithm to multiply an m × n matrix by an n × p matrix that incurs Θ(1 + (mn + np + mp)/B + mnp/(B√M)) cache faults.
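The abstract states the bounds but not the algorithms. As an illustration only, the following is a minimal Python sketch of the general divide-and-conquer idea commonly used for cache-oblivious transposition: split the larger dimension in half and recurse until the block is small. The function names and the `base` cutoff are assumptions introduced here and are not taken from the article. The cutoff only limits recursion overhead and is not tuned to any cache parameter. Python lists are not contiguous arrays, so the sketch shows the access pattern and makes no claim about measured cache behavior.

```python
def transpose_block(a, b, r0, r1, c0, c1, base=4):
    """Write the transpose of the block a[r0:r1][c0:c1] into b.

    Hypothetical helper: recursively halves the larger dimension, so the
    sub-blocks shrink without reference to any cache size or line length.
    """
    rows, cols = r1 - r0, c1 - c0
    if rows <= base and cols <= base:
        # Small block: transpose it directly.
        for i in range(r0, r1):
            for j in range(c0, c1):
                b[j][i] = a[i][j]
    elif rows >= cols:
        # Taller than wide: split the row range in half.
        mid = r0 + rows // 2
        transpose_block(a, b, r0, mid, c0, c1, base)
        transpose_block(a, b, mid, r1, c0, c1, base)
    else:
        # Wider than tall: split the column range in half.
        mid = c0 + cols // 2
        transpose_block(a, b, r0, r1, c0, mid, base)
        transpose_block(a, b, r0, r1, mid, c1, base)


def cache_oblivious_transpose(a):
    """Return the n x m transpose of an m x n matrix given as a list of rows."""
    m = len(a)
    n = len(a[0]) if m else 0
    b = [[None] * m for _ in range(n)]
    if m and n:
        transpose_block(a, b, 0, m, 0, n)
    return b
```

For example, `cache_oblivious_transpose([[1, 2, 3], [4, 5, 6]])` returns `[[1, 4], [2, 5], [3, 6]]`. The point of the design is that no parameter in the recursion depends on M or B.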

Document Details

Document Type
Pub Defense Publication
Publication Date
Jan 01, 2012
Source ID
10.1145/2071379.2071383

Entities

People

  • Charles E. Leiserson
  • Harald Prokop
  • Matteo Frigo
  • Sridhar Ramachandran

Organizations

  • Defense Advanced Research Projects Agency
  • Division of Computer and Network Systems
  • Division of Computing and Communication Foundations
  • MIT Computer Science and Artificial Intelligence Laboratory
  • National Science Foundation

Tags

Fields of Study

  • Computer science

Readers

  • Graph Algorithms and Convex Optimization
  • Parallel and Distributed Computing