Multiprocessor Performance Debugging and Memory Bottlenecks
Abstract
Driven by the computational demands of scientists and engineers, computer architects are building increasingly complex multiprocessor systems. However, while the peak gigaflop ratings of such systems are often impressive, the actual performance of initial implementations of applications can be disappointing. To make the task of performance debugging manageable, tools are needed that can analyze program behavior and report sources of performance loss. This thesis offers techniques for building such tools for shared memory multiprocessors. Previous efforts to build performance debugging systems for shared memory multiprocessors had two shortcomings. First, though memory hierarchy performance is often critical to whole program performance, most tools cannot distinguish time when the CPU is computing from time when it is stalled waiting on the memory hierarchy. Second, many tools significantly perturb a program's execution. This dissertation addresses both of these problems. I describe software instrumentation that typically increases program execution time by less than 10%, while collecting a detailed profile of where processors are doing work, waiting for work, or stalled waiting on the memory hierarchy. A window-based user interface allows the user to interpret the profile, viewing compute, memory, and synchronization bottlenecks at increasing levels of detail, from the whole program level down to the level of individual procedures, loops, and synchronization objects. Several multiprocessor case studies are included to illustrate the features of the tool.
Document Details
- Document Type: Technical Report
- Publication Date: May 01, 1992
- Accession Number: ADA268387
Entities
People
- Aaron J. Goldberg
Organizations
- Stanford University