Compiler Assisted Recovery For Fault-Tolerant Highly Parallel Multiprocessor Architectures

Abstract

The purpose of this research was to develop and implement compiler-assisted strategies for recovery through multiple instruction reexecution (rollback) in highly parallel computer architectures that use hierarchical shared memories. The goal was to enable very rapid recovery from high rates of transient and intermittent failures in Strategic Defense Initiative (SDI) environments. We pursued this goal with minimal impact on system performance and little hardware overhead by exploiting hardware features already present in recently developed high-performance processor architectures. Our objective was to demonstrate that, with appropriate compilation techniques, these hardware features can be used to perform rapid recovery without significant architectural redesign. Our research concentrated on multiprocessors with hierarchical memory structures, both because of the architectural trend toward hierarchical-memory, shared-variable multiprocessor designs and because it is not yet well understood how rapid recovery can be accomplished in this class of machines.

Document Details

Document Type
Technical Report
Publication Date
Aug 01, 1992
Accession Number
ADA256942

Entities

People

  • W. Kent Fuchs
  • Wen-mei Hwu

Organizations

  • University of Illinois Urbana–Champaign

Tags

DTIC Thesaurus Topics

  • Application Software
  • Birds
  • Compilers
  • Computer Architecture
  • Computer Programming
  • Computer Programs
  • Computers
  • Computing System Architectures
  • Fault Tolerance
  • Fault Tolerant Computing
  • High Performance Computing
  • Multiprocessors
  • Parallel Computing
  • Parallel Processing
  • Recovery
  • Software Development
  • Systems Science

Fields of Study

  • Computer science

Readers

  • Parallel and Distributed Computing