Initial Kernel Timing Using a Simple PIM Performance Model

Abstract

This presentation will describe some initial results of paper-and-pencil studies of 4 or 5 application kernels applied to a processor-in-memory (PIM) system roughly similar to the Cascade Lightweight Processor (LWP). The application kernels are: * Linked list traversal * Sun of leaf nodes on a tree * Bitonic sort * Vector sum * Gaussian elimination The intent of this work is to guide and validate work on the Cascade project in the areas of compilers, simulators, and languages. We will first discuss the generic PIM structure. Then, we will explain the concepts needed to program a parallel PIM system (locality, threads, parcels). Next, we will present a simple PIM performance model that will be used in the remainder of the presentation. For each kernel, we will then present a set of codes, including codes for a single PIM node, and codes for multiple PIM nodes that move data to threads and move threads to data. These codes are written at a fairly low level, between assembly and C, but much closer to C than to assembly. For each code, we will present some hand-drafted timing forecasts, based on the simple PIM performance model. Finally, we will conclude by discussing what we have learned from this work, including what programming styles seem to work best, from the point-of-view of both expressiveness and performance.

Open PDF

Document Details

Document Type: Technical Report
Publication Date: Feb 01, 2005
Accession Number: ADA433091

Entities

People

Daniel S. Katz
David Callahan
Gary L. Block
Jay B. Brockman
Paul L. Springer
Thomas Sterling

Organizations

Jet Propulsion Laboratory

Initial Kernel Timing Using a Simple PIM Performance Model

Abstract

Document Details

Entities

People

Organizations

Tags

DTIC Thesaurus Topics

Fields of Study

Readers