Groute
Abstract
Nodes with multiple GPUs are becoming the platform of choice for high-performance computing. However, most applications are written using bulk-synchronous programming models, which may not be optimal for irregular algorithms that benefit from low-latency, asynchronous communication. This paper proposes constructs for asynchronous multi-GPU programming, and describes their implementation in a thin runtime environment called Groute. Groute also implements common collective operations and distributed work-lists, enabling the development of irregular applications without substantial programming effort. We demonstrate that this approach achieves state-of-the-art performance and exhibits strong scaling for a suite of irregular applications on 8-GPU and heterogeneous systems, yielding over 7x speedup for some algorithms.
Document Details
- Document Type: Pub Defense Publication
- Publication Date: Jan 26, 2017
- Source ID: 10.1145/3155284.3018756
Entities
People
- Keshav Pingali
- Michael A. Sutton
- Sreepathi Pai
- Tal Ben-Nun
Organizations
- Defense Advanced Research Projects Agency
- German Research Foundation
- Hebrew University of Jerusalem
- National Science Foundation
- Nvidia
- University of Texas at Austin