Groute
Abstract
Nodes with multiple GPUs are becoming the platform of choice for high-performance computing. However, most applications are written using bulk-synchronous programming models, which may not be optimal for irregular algorithms that benefit from low-latency, asynchronous communication. This article proposes constructs for asynchronous multi-GPU programming and describes their implementation in a thin runtime environment called Groute. Groute also implements common collective operations and distributed work-lists, enabling the development of irregular applications without substantial programming effort. We demonstrate that this approach achieves state-of-the-art performance and exhibits strong scaling for a suite of irregular applications on eight-GPU and heterogeneous systems, yielding over 7× speedup for some algorithms.
Document Details
- Document Type: Pub Defense Publication
- Publication Date: Jun 21, 2020
- Source ID: 10.1145/3399730
Entities
People
- Keshav Pingali
- Michael A. Sutton
- Sreepathi Pai
- Tal Ben-Nun
Organizations
- Defense Advanced Research Projects Agency
- ETH Zurich
- German Research Foundation
- Hebrew University of Jerusalem
- National Science Foundation
- Swiss National Science Foundation
- University of Rochester
- University of Texas at Austin