Initial Kernel Timing Using a Simple PIM Performance Model
Abstract
This presentation will describe some initial results of paper-and-pencil studies of 4 or 5 application kernels applied to a processor-in-memory (PIM) system roughly similar to the Cascade Lightweight Processor (LWP). The application kernels are: * Linked list traversal * Sun of leaf nodes on a tree * Bitonic sort * Vector sum * Gaussian elimination The intent of this work is to guide and validate work on the Cascade project in the areas of compilers, simulators, and languages. We will first discuss the generic PIM structure. Then, we will explain the concepts needed to program a parallel PIM system (locality, threads, parcels). Next, we will present a simple PIM performance model that will be used in the remainder of the presentation. For each kernel, we will then present a set of codes, including codes for a single PIM node, and codes for multiple PIM nodes that move data to threads and move threads to data. These codes are written at a fairly low level, between assembly and C, but much closer to C than to assembly. For each code, we will present some hand-drafted timing forecasts, based on the simple PIM performance model. Finally, we will conclude by discussing what we have learned from this work, including what programming styles seem to work best, from the point-of-view of both expressiveness and performance.
Document Details
- Document Type
- Technical Report
- Publication Date
- Feb 01, 2005
- Accession Number
- ADA433091
Entities
People
- Daniel S. Katz
- David Callahan
- Gary L. Block
- Jay B. Brockman
- Paul L. Springer
- Thomas Sterling
Organizations
- Jet Propulsion Laboratory