Productive High Performance Parallel Programming with Auto-tuned Domain-Specific Embedded Languages
Abstract
As machines and architectures have grown more complex, performance tuning has become more challenging, to the point that general-purpose compilers fail to generate the best possible optimized code. Expert performance programmers can often hand-write code that outperforms compiler-optimized low-level code by an order of magnitude. At the same time, programs themselves have grown more complex: modern programs are built on a variety of abstraction layers to manage that complexity, yet these layers hinder efforts at optimization. In fact, it is common to lose one or two additional orders of magnitude in performance when going from a low-level language such as Fortran or C to a high-level language like Python, Ruby, or MATLAB.
Document Details
- Document Type: Technical Report
- Publication Date: Jan 02, 2013
- Accession Number: ADA575484
Entities
People
- Shoaib A. Kamil
Organizations
- University of California, Berkeley