Manual and Compiler Assisted Methods For Generating Fault-Tolerant Parallel Programs.

Abstract

We have developed an automated, compile-time approach to generating error-detecting parallel programs. The compiler identifies statements that implement affine transformations within the program and automatically inserts code for computing, manipulating, and comparing checksums in order to check the correctness of those statements. Statements that do not implement affine transformations are checked by duplication. Where possible, checksums are reused from one loop to the next rather than recomputed for every statement. A global dataflow analysis determines the points at which checksums need to be recomputed. We also use a novel method of specifying the data distributions of the check data, using directives provided by the High Performance Fortran (HPF) standard, so that the computations on the original data and the corresponding check computations are performed on different processors. Results are presented on an Intel Paragon distributed-memory multicomputer.
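The checksum idea in the abstract can be illustrated with a minimal sketch. It is not the report's compiler output, and the function names (`affine`, `predicted_checksum`, `check`) are hypothetical. It assumes a single affine statement y = A x + b and relies on the identity sum(y) = (column sums of A) · x + sum(b), so the checksum of the result can be predicted without repeating the full computation.

```python
def affine(A, x, b):
    """Compute y = A x + b using plain Python lists."""
    return [sum(a_ij * x_j for a_ij, x_j in zip(row, x)) + b_i
            for row, b_i in zip(A, b)]


def predicted_checksum(A, x, b):
    """Predict sum(y) from the column sums of A, without computing y."""
    col_sums = [sum(col) for col in zip(*A)]
    return sum(c * x_j for c, x_j in zip(col_sums, x)) + sum(b)


def check(y, expected, tol=1e-9):
    """Compare the actual checksum of y with the predicted one.

    The relative tolerance is an assumption; floating-point rounding
    means the two sums rarely match bit for bit in general.
    """
    return abs(sum(y) - expected) <= tol * max(1.0, abs(expected))


# Illustrative data (made up for this sketch).
A = [[1.0, 2.0], [3.0, 4.0]]
x = [5.0, 6.0]
b = [0.5, -0.5]

y = affine(A, x, b)
expected = predicted_checksum(A, x, b)
ok = check(y, expected)

# Simulate a fault by corrupting one element of the result.
y_faulty = list(y)
y_faulty[1] += 1.0
detected = not check(y_faulty, expected)
```

In this sketch the check costs one extra dot product per statement instead of a second full matrix-vector product, which is the usual motivation for preferring checksums over duplication when a statement is affine.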

Document Details

Document Type: Technical Report
Publication Date: Dec 01, 1995
Accession Number: ADA302812

Entities

People

  • Amber Roy-Chowdhury

Organizations

  • University of Illinois Urbana–Champaign

Tags

Communities of Interest

  • C4I
  • Energy and Power Technologies

DTIC Thesaurus Topics

  • Abstracts
  • Coding
  • Compilers
  • Computations
  • Computer Programming
  • Computers
  • Decoding
  • Differential Equations
  • Directives
  • Engineering
  • Fault Tolerance
  • Notation
  • Parallel Computing
  • Parallel Processing
  • Partial Differential Equations
  • Standards
  • Two Dimensional

Fields of Study

  • Computer science
  • Engineering

Readers

  • Computer Programming and Software Development
  • Computer Science
  • Parallel and Distributed Computing