Fault-Tolerance in Distributed and Multiprocessor Real-Time Systems

Abstract

New fault tolerance schemes for multiprocessor and distributed systems have been developed in the following areas. We have investigated a number of fault tolerance schemes to evaluate trade-offs among performance, reliability, and availability. Fault tolerance schemes are being developed for various fault models (fail-stop, fail-slow, and arbitrary failure) and for two classes of application: those that provide results only at the end of computation, and long-running applications that must also provide results during computation. In the area of software-implemented fault tolerance, we are studying user-transparent mechanisms, with the goal of designing and implementing a software library that users can link with existing application software to achieve the desired level of fault tolerance. We are also developing a new tool, the Reliable Architecture Characterization Tool (REACT), for evaluating the reliability and availability of distributed multiprocessor systems that use various fault tolerance techniques. This tool will facilitate evaluation of the fault tolerance schemes that we develop.

Document Details

Document Type
Technical Report
Publication Date
Aug 31, 1993
Accession Number
ADA279896

Entities

People

  • Dhiraj K. Pradhan

Organizations

  • Texas Engineering Experiment Station

Tags

Communities of Interest

  • Materials and Manufacturing Processes
  • Space

DTIC Thesaurus Topics

  • Algorithms
  • Application Software
  • Computer Architecture
  • Computer Programming
  • Computer Science
  • Computers
  • Engineering
  • Failure Mode And Effect Analysis
  • Fault Tolerance
  • High Reliability
  • Measurement
  • Operating Systems
  • Parallel Computing
  • Parallel Processing
  • Simulations
  • Systems Engineering
  • Test And Evaluation

Fields of Study

  • Computer science
  • Engineering

Readers

  • Computational Modeling and Simulation
  • Fault Tolerant Diagnosis of Black and White Balloon Isolation Tests Using ¥.
  • Parallel and Distributed Computing.