Compiler Assisted Recovery For Fault-Tolerant Highly Parallel Multiprocessor Architectures
Abstract
This research developed and implemented compiler-assisted strategies for recovery through multiple-instruction reexecution (rollback) in highly parallel computer architectures that use hierarchical shared memories. The goal was to enable very rapid recovery from high rates of transient and intermittent failures in SDI environments, with minimal impact on system performance and little hardware overhead, by exploiting hardware features already present in recently developed high-performance processor architectures. Our objective was to demonstrate that, with appropriate compilation techniques, these hardware features can be used to perform rapid recovery without significant architectural redesign. The effort concentrated on multiprocessor machines with hierarchical memory structures for two reasons: the architectural trend toward hierarchical-memory, shared-variable multiprocessor architectures, and the current lack of understanding of how rapid recovery can be accomplished in this class of machines.
Document Details
- Document Type: Technical Report
- Publication Date: Aug 01, 1992
- Accession Number: ADA256942
Entities
People
- W. Kent Fuchs
- Wen-mei Hwu
Organizations
- University of Illinois Urbana–Champaign