Software/Hardware Co-Design to Improve Productivity, Portability, and Performance of Loop-Task Parallel Applications

Abstract

Computer architects are increasingly turning to programmable accelerators tailored for narrower classes of applications in order to achieve high performance and energy efficiency. A continuing challenge with accelerators is enabling the programmer to easily extract maximum performance without intimate knowledge of the underlying microarchitecture. It is important to consider productivity and portability, in addition to performance, as first-class metrics when developing and evaluating modern computing platforms. Software-centric approaches to achieving 3P computing platforms are compelling, but sacrifice efficiency and flexibility by hiding parallel abstractions from hardware and limiting the scope of the application domain. This thesis proposes a new software/hardware co-design approach to achieving 3P platforms, called the loop-task accelerator (LTA) platform, that provides high productivity and portability without sacrificing performance or efficiency across a wide range of applications. The LTA platform addresses the weaknesses of existing approaches that are identified through detailed experimentation with and analysis of modern application development. Discussion of an early attempt at a hardware-centric approach to achieving 3P platforms provides insight into area-efficient accelerator designs and highlights the need for innovations in both software and hardware. The LTA platform focuses on exploiting loop-task parallelism by exposing loop-tasks as a common parallel abstraction at the programming API, runtime, ISA, and microarchitectural levels. The LTA programming API uses the parallel_for construct to express loop-tasks that can be exploited both across cores and within a core, the LTA runtime distributes loop-tasks across cores, and a new xpfor instruction explicitly encodes loop-tasks as functions applied to a range of loop iterations. This thesis introduces a novel task-coupling taxonomy that captures how tasks can be coupled in both space and time.

Open PDF

Document Details

Document Type
Technical Report
Publication Date
Jan 01, 2017
Accession Number
AD1143315

Entities

People

  • Ji Y. Kim

Organizations

  • Cornell University

Tags

Communities of Interest

  • Energy and Power Technologies

DTIC Thesaurus Topics

  • Application-Specific Integrated Circuits
  • Compilers
  • Computer Architecture
  • Computer Programming
  • Computer-Aided Design
  • Computers
  • Electrical Engineering
  • Energy Efficiency
  • Graphics Processing Unit
  • High Performance Computing
  • Information Science
  • Instruction Set Architecture
  • Integrated Circuits
  • Microarchitecture
  • Molecular Dynamics
  • Neural Networks
  • Object Oriented Programming
  • Operating Systems
  • Parallel Computing
  • Programming Languages
  • Very Large Scale Integration

Fields of Study

  • Computer science
  • Engineering

Readers

  • Distributed Systems and Data Platform Development
  • Parallel and Distributed Computing.
  • Software Engineering.

Technology Areas

  • Space