Fault-Tolerance in Distributed and Multiprocessor Real-Time Systems
Abstract
New schemes for fault tolerance in multiprocessor and distributed systems have been developed in the following areas. We have investigated a number of fault-tolerance schemes to evaluate performance, reliability, and availability trade-offs. Fault-tolerance schemes are being developed for various fault models (the fail-stop model, the fail-slow model, and the arbitrary failure model) and application areas (applications that provide results only at the end of computation, and long-running applications that must also provide results during computation). In the area of software-implemented fault tolerance, we are studying approaches for providing user-transparent fault-tolerance mechanisms, with the aim of designing and implementing a software library to which the user can link existing application software to achieve the desired level of fault tolerance. We are also developing a new tool, the Reliable Architecture Characterization Tool (REACT), for evaluating the reliability and availability of distributed multiprocessor systems that use various fault-tolerance techniques. This tool will facilitate evaluation of the fault-tolerance schemes that we develop.
Document Details
- Document Type: Technical Report
- Publication Date: Aug 31, 1993
- Accession Number: ADA279896
Entities
People
- Dhiraj K. Pradhan
Organizations
- Texas Engineering Experiment Station