Algorithm 1022: Efficient Algorithms for Computing a Rank-Revealing UTV Factorization on Parallel Computing Architectures

Abstract

Randomized singular value decomposition (RSVD) is by now a well-established technique for efficiently computing an approximate singular value decomposition of a matrix. Building on the ideas that underpin RSVD, the recently proposed algorithm “randUTV” computes a full factorization of a given matrix that provides low-rank approximations with near-optimal error. Because the bulk of randUTV is cast in terms of communication-efficient operations such as matrix-matrix multiplication and unpivoted QR factorizations, it is faster than competing rank-revealing factorization methods such as column-pivoted QR in most high-performance computational settings. In this article, optimized randUTV implementations are presented for both shared-memory and distributed-memory computing environments. For shared memory, randUTV is redesigned in terms of an algorithm-by-blocks that, together with a runtime task scheduler, eliminates bottlenecks from data synchronization points to achieve acceleration over the standard blocked algorithm based on a purely fork-join approach. The distributed-memory implementation is based on the ScaLAPACK library. The performance of our new codes compares favorably with competing factorizations available on both shared-memory and distributed-memory architectures.

Document Details

Document Type
Pub Defense Publication
Publication Date
May 26, 2022
Source ID
10.1145/3507466

Entities

People

  • Francisco Igual
  • G. Quintana-ortí
  • N. Heavner
  • P. G. Martinsson

Organizations

  • Complutense University of Madrid
  • Jaume I University
  • National Science Foundation
  • Office of Naval Research
  • Texas Advanced Computing Center
  • University of Colorado Boulder
  • University of Texas at Austin

Tags

Fields of Study

  • Computer science

Readers

  • Linear Algebra
  • Parallel and Distributed Computing.