Failure-Tolerant Parallel Programming and its Supporting System Architecture

Abstract

The state of the art in software validation, together with the continuing growth in the size and complexity of software subsystems, more than justifies the extra cost of software error tolerance. A program in which software redundancy is incorporated, i.e., a program in which procedures for runtime validation and recovery are explicitly specified, is generally called a failure-tolerant program. One problem in failure-tolerant programming, which could be particularly serious in real-time computing environments, is the increase in program execution time caused by incorporating validation and recovery procedures. This paper introduces an approach to the solution, called failure-tolerant parallel programming. The essence of this approach is to maximally overlap main-stream computation with the redundant computation devoted to validation and recovery. Subsequently, a model system architecture tailored for efficient execution of failure-tolerant parallel programs is described. It is highly general and modular in nature and contains a novel memory subsystem named the duplex memory. Directions for further research on program structuring and expansion of the model architecture are also indicated. (Author)
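
The overlap described in the abstract can be illustrated with a minimal sketch. It is not the report's notation, language constructs, or architecture, which the abstract does not specify. It assumes a stage with a primary routine, an alternate routine, and an acceptance test, and it assumes the following stage has no side effects, so its speculative result can simply be discarded. All names (run_overlapped, faulty_sqrt, sqrt_is_acceptable, add_one) are hypothetical. Python threads stand in for truly parallel hardware here; the sketch shows the control structure, not a speedup.

```python
import math
from concurrent.futures import ThreadPoolExecutor


def run_overlapped(primary, alternate, acceptance_test, next_stage, x):
    """Validate primary(x) while the next stage already runs on its result."""
    result = primary(x)
    with ThreadPoolExecutor(max_workers=2) as pool:
        # Redundant computation: check the result of the primary routine.
        verdict = pool.submit(acceptance_test, x, result)
        # Main-stream computation: continue speculatively on the unchecked result.
        speculative = pool.submit(next_stage, result)
        accepted = verdict.result()
    if accepted:
        return speculative.result()
    # Validation failed: drop the speculative work and recover with the alternate.
    recovered = alternate(x)
    if not acceptance_test(x, recovered):
        raise RuntimeError("primary and alternate both failed the acceptance test")
    return next_stage(recovered)


def faulty_sqrt(x):
    # Hypothetical primary routine with a deliberate software fault.
    return -1.0


def sqrt_is_acceptable(x, r):
    # Hypothetical acceptance test: r must be a non-negative root of x.
    return r >= 0 and abs(r * r - x) < 1e-9


def add_one(r):
    # Hypothetical side-effect-free next stage.
    return r + 1.0


if __name__ == "__main__":
    # The faulty primary is rejected, math.sqrt recovers, and the stage is redone.
    print(run_overlapped(faulty_sqrt, math.sqrt, sqrt_is_acceptable, add_one, 4.0))  # prints 3.0
```

When the acceptance test passes, the time spent on validation is hidden behind the next stage. When it fails, the cost is the discarded speculative work plus the recovery path.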

Document Details

Document Type
Technical Report
Publication Date
Jan 01, 1976
Accession Number
ADA029926

Entities

People

  • C. V. Ramamoorthy
  • K. H. Kim

Organizations

  • University of Southern California

Tags

Communities of Interest

  • Advanced Electronics
  • Energy and Power Technologies

DTIC Thesaurus Topics

  • Acceptance Tests
  • Access Time
  • Computations
  • Computer Programming
  • Computer Programs
  • Computers
  • Computing System Architectures
  • Content Addressable Memory
  • Damage Detection
  • Electronic Switching
  • Environment
  • Language
  • Programming Languages
  • Recovery
  • Redundancy
  • Reliability
  • Validation

Fields of Study

  • Computer science
  • Engineering

Readers

  • Parallel and Distributed Computing
  • Software Engineering