Failure-Tolerant Parallel Programming and its Supporting System Architecture
Abstract
The state of the art in software validation, together with the continuing growth in the size and complexity of software subsystems, more than justifies the extra cost of software error tolerance. A program in which software redundancy is incorporated, i.e., a program in which procedures for runtime validation and recovery are explicitly specified, is generally called a failure-tolerant program. One problem in failure-tolerant programming, which could be particularly serious in real-time computing environments, is the increase in program execution time caused by the incorporation of validation and recovery procedures. This paper introduces an approach to solving this problem, called failure-tolerant parallel programming. The essence of this approach is to maximally overlap main-stream computation with the redundant computation devoted to validation and recovery. A model system architecture tailored for efficient execution of failure-tolerant parallel programs is then described. It is highly general and modular in nature and contains a novel memory subsystem named the duplex memory. Directions for further research on program structuring and expansion of the model architecture are also indicated. (Author)
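The overlap described in the abstract can be illustrated with a small sketch. It is not taken from the report: the function `run_overlapped`, the (primary, alternate, accept) stage layout, and the example stages are all hypothetical names chosen for illustration. The sketch assumes immutable states (so a checkpoint is just a saved value) and assumes the alternate procedure's result needs no further validation. Python threads only show the control structure here; they do not give true parallel speedup for CPU-bound work.

```python
from concurrent.futures import ThreadPoolExecutor


def run_overlapped(stages, state):
    """Run stages in order, overlapping each acceptance test with the next stage.

    stages: list of (primary, alternate, accept) triples (hypothetical layout).
      primary(state) and alternate(state) return a new state;
      accept(old_state, new_state) returns True if new_state is acceptable.
    Assumption: states are immutable, so saving the input is a valid checkpoint.
    Assumption: the alternate's output is trusted and not re-validated.
    """
    with ThreadPoolExecutor(max_workers=1) as validator:
        pending = None  # (future, checkpoint, alternate) for the previous stage
        for primary, alternate, accept in stages:
            # Main stream proceeds speculatively while the previous
            # stage's acceptance test is still running in the validator.
            out = primary(state)
            if pending is not None:
                verdict, checkpoint, prev_alternate = pending
                if not verdict.result():
                    # Previous output was rejected: roll back to its
                    # checkpoint, recover, and redo the speculative stage.
                    state = prev_alternate(checkpoint)
                    out = primary(state)
            # Start validating this stage's output in the background.
            pending = (validator.submit(accept, state, out), state, alternate)
            state = out
        # Settle the last outstanding acceptance test.
        if pending is not None:
            verdict, checkpoint, last_alternate = pending
            if not verdict.result():
                state = last_alternate(checkpoint)
    return state


# Hypothetical example stages: a faulty primary with a correct alternate.
def double_faulty(x):
    return 2 * x + 1  # deliberately wrong


def double_ok(x):
    return 2 * x


def accept_double(old, new):
    return new == 2 * old


def increment(x):
    return x + 1


def accept_increment(old, new):
    return new == old + 1


faulty_program = [
    (double_faulty, double_ok, accept_double),
    (increment, increment, accept_increment),
]
result = run_overlapped(faulty_program, 5)
```

In this sketch the second stage starts on the unvalidated output of the first; when the acceptance test rejects that output, the speculative work is discarded and the stage is rerun from the recovered state, so the cost of validation is paid only on the failure path.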
Document Details
- Document Type: Technical Report
- Publication Date: Jan 01, 1976
- Accession Number: ADA029926
Entities
People
- C. V. Ramamoorthy
- K. H. Kim
Organizations
- University of Southern California