Hybrid QR Factorization Algorithm for High Performance Computing Architectures
Abstract
This paper describes a novel QR decomposition algorithm that utilizes both Givens rotations and Householder reflections to give optimal performance on high performance computing architectures. Computing the matrix factorization A = QR is a mathematical step frequently encountered in many signal processing applications including adaptive nulling. Efficient algorithms for computing the QR factorization are vital to satisfying strict latency and throughput constraints in real-time implementations of these signal processing algorithms. By effectively combining the use of Givens rotations and Householder reflections on a parallel computing platform it is possible to efficiently map the computational algorithm to the available computational hardware to best make use of the memory hierarchy and to optimize performance. Furthermore by introducing adjustable software parameters that control load partitioning among the processors cache use and the number of computations done in serial fashion on one processor versus the calculations done concurrently on several processors it is possible for the user to tune the algorithm to run on different machines, without rewriting any code and still maintaining optimum performance. Results given in this abstract show that properly tuning the Hybrid QR algorithm on the SGI O3000 provided better performance than the available SCALAPACK library routine PSGEQRF for computing the QR decomposition.
Document Details
- Document Type
- Technical Report
- Publication Date
- May 01, 2003
- Accession Number
- ADA419390
Entities
People
- Gerard G. L. Meyer
- Peter Vouras
Organizations
- United States Naval Research Laboratory