Abstractions for Fault Tolerance in Distributed Systems.

Abstract

This paper describes abstractions that are useful in implementing fault-tolerant and distributed systems. The same abstractions serve both fault tolerance and distribution, supporting our belief that the two concerns are not separable. The abstractions we present have a somewhat different flavor from abstractions prevalent in sequential and concurrent programming, which encapsulate state information and/or provide operations to manipulate that state (e.g., a stack or a monitor). Our abstractions are best thought of as properties of protocols or control. Section 2 describes abstractions of processors that can fail, and Section 3 reviews some fundamentals for coping with failures. Section 4 discusses the state machine approach, a general way to construct fault-tolerant distributed computing services. The state machine approach motivates two abstractions: Agreement and Order. Section 5 discusses a second approach for constructing a fault-tolerant computing service, and this motivates two more abstractions: Failure Detection and Stable Storage. Implementing abstractions by exploiting hardware is discussed in Section 6. Section 7 discusses related work.

Keywords: Fault tolerance; Distributed computing; State machines; Failure detection; Fail-stop processor.

Document Details

Document Type: Technical Report
Publication Date: Apr 01, 1986
Accession Number: ADA166549

Entities

People

Fred B. Schneider

Organizations

Cornell University
