Fault-Tolerant Wait-Free Shared Objects
Abstract
A concurrent system consists of processes communicating via shared objects, such as shared variables, queues, etc. The concept of wait-freedom was introduced to cope with process failures: each process that accesses a wait-free object is guaranteed to get a response even if all the other processes crash. But what if these wait-free objects themselves fail? For example, if a wait-free object crashes , all the processes that access that object are prevented from making progress. In this paper, we introduce the concept of fault-tolerant wait-free objects, and study the problem of implementing them. We give a universal method to construct fault-tolerant wait-free objects, for all types of responsive failures (including one in which faulty objects may lie ). In sharp contrast, we prove that many common and interesting object types (such as queues, sets, and test and set) have no fault-tolerant wait-free implementations even under the most benign of the non-responsive types of failure. We also introduce several concepts and techniques that are central to the design of fault-tolerant concurrent systems: the concepts of self-implementation and graceful degradation, and techniques to automatically increase the fault-tolerance of implementations. We prove matching lower bounds on the resource complexity of most of our algorithms.
Document Details
- Document Type
- Technical Report
- Publication Date
- Apr 01, 1992
- Accession Number
- ADA250303
Entities
People
- Prasad Jayanti
- Sam Toueg
- Tushar D. Chandra
Organizations
- Cornell University