Efficient algorithms for computing rank‐revealing factorizations on a GPU

Abstract

Standard rank‐revealing factorizations such as the singular value decomposition (SVD) and column pivoted QR factorization are challenging to implement efficiently on a GPU. A major difficulty in this regard is the inability of standard algorithms to cast most operations in terms of the Level‐3 BLAS. This article presents two alternative algorithms for computing a rank‐revealing factorization of the form , where and are orthogonal and is trapezoidal (or triangular if is square). Both algorithms use randomized projection techniques to cast most of the flops in terms of matrix‐matrix multiplication, which is exceptionally efficient on the GPU. Numerical experiments illustrate that these algorithms achieve significant acceleration over finely tuned GPU implementations of the SVD while providing low rank approximation errors close to that of the SVD.

Document Details

Document Type
Pub Defense Publication
Publication Date
Jun 02, 2023
Source ID
10.1002/nla.2515

Entities

People

  • Abinand Gopal
  • Chao
  • Nathan Heavner
  • Per‐gunnar Martinsson

Organizations

  • National Science Foundation of Sri Lanka
  • Office of Naval Research
  • University of Colorado Boulder
  • University of Texas at Austin
  • Yale University

Tags

Readers

  • Linear Algebra
  • Parallel and Distributed Computing.
  • Systems Analysis and Design