System Architecture for Fault-Tolerant Processes in Distributed Systems

Abstract

This report focuses on system architectures and protocols for building fault-tolerant distributed systems. It addresses algorithms and protocols in four areas: fault diagnosis, error recovery in replicated systems, error recovery based on self-stabilization, and the use of masking redundancy in replicated systems using agreement protocols. The report is a collection of six technical papers that present the results obtained in these areas. The first paper describes a system architecture for building resilient processes using replication and checkpointing, along with the protocols for process replication management. The second paper presents an agreement protocol that provides the same view of the computation state to each correctly functioning copy of the process. The third paper presents a protocol for self-stabilization in binary trees. This protocol is a generalization of one of Dijkstra's protocols and, for normal operations, is sufficient to guarantee recovery from any erroneous state. The fourth paper presents a protocol for detecting the termination of a set of cooperating, communicating processes. The last two papers address problems related to fault diagnosis in interconnected systems. The first of these surveys the various fault-diagnosis algorithms based on the model proposed by Preparata, Metze, and Chen (the PMC model). The second presents some results toward designing more efficient fault-diagnosis algorithms.

Keywords: Computer architecture; Fault tolerant computing.

Document Details

Document Type
Technical Report
Publication Date
Feb 01, 1989
Accession Number
ADA208233

Entities

People

  • Anand Tripathi
  • S. Azadegan
  • S. Ranka
  • Sheng Dong
  • Vijay Raghavan

Organizations

  • University of Minnesota

Tags

Communities of Interest

  • Materials and Manufacturing Processes

DTIC Thesaurus Topics

  • Algorithms
  • Availability
  • Command And Control
  • Communication Systems
  • Computational Complexity
  • Computer Science
  • Computers
  • Computing System Architectures
  • Construction
  • Detection
  • Distributed Computing
  • Information Processing
  • Minnesota
  • Probability
  • Security
  • Trees (Data Structures)
  • Universities

Fields of Study

  • Computer science
  • Engineering

Readers

  • Database Systems and Applications
  • Fault Tolerant Diagnosis of Black and White Balloon Isolation Tests Using ¥.
  • Mathematical Modeling and Probability Theory.