Groute

Abstract

Nodes with multiple GPUs are becoming the platform of choice for high-performance computing. However, most applications are written using bulk-synchronous programming models, which may not be optimal for irregular algorithms that benefit from low-latency, asynchronous communication. This article proposes constructs for asynchronous multi-GPU programming and describes their implementation in a thin runtime environment called Groute. Groute also implements common collective operations and distributed work-lists, enabling the development of irregular applications without substantial programming effort. We demonstrate that this approach achieves state-of-the-art performance and exhibits strong scaling for a suite of irregular applications on eight-GPU and heterogeneous systems, yielding over 7× speedup for some algorithms.

Document Details

Document Type
Pub Defense Publication
Publication Date
Jun 21, 2020
Source ID
10.1145/3399730

Entities

People

  • Keshav Pingali
  • Michael A. Sutton
  • Sreepathi Pai
  • Tal Ben-Nun

Organizations

  • Defense Advanced Research Projects Agency
  • ETH Zurich
  • German Research Foundation
  • Hebrew University of Jerusalem
  • National Science Foundation
  • Swiss National Science Foundation
  • University of Rochester
  • University of Texas at Austin

Tags

Fields of Study

  • Computer science

Readers

  • Parallel and Distributed Computing