Abstractions for Fault Tolerance in Distributed Systems

Abstract

This paper describes abstractions that are useful in implementing fault-tolerant and distributed systems. The same abstractions serve both fault tolerance and distribution, supporting our belief that the two concerns are not separable. The abstractions we present have a somewhat different flavor from those prevalent in sequential and concurrent programming, which encapsulate state information and/or provide operations to manipulate that state (e.g., a stack or a monitor). Our abstractions are best thought of as properties of protocols or of control. Section 2 describes abstractions of processors that can fail, and Section 3 reviews some fundamentals for coping with failures. Section 4 discusses the state machine approach, a general way to construct fault-tolerant distributed computing services; this approach motivates two abstractions: Agreement and Order. Section 5 discusses a second approach for constructing a fault-tolerant computing service, which motivates two more abstractions: Failure Detection and Stable Storage. Section 6 discusses implementing these abstractions by exploiting hardware, and Section 7 discusses related work.

Keywords: Fault tolerance; Distributed computing; State machines; Failure detection; Fail-stop processor.
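The abstract names the state machine approach and the Agreement and Order abstractions it motivates without showing them at work. The following is a minimal Python sketch, not taken from the report, under stated assumptions: the state machine is a hypothetical toy key-value store, and every replica is assumed to already receive the same requests in the same order (Agreement and Order are taken as given here, not implemented). Replica outputs are combined by majority vote, using the usual 2t + 1 replicas to mask up to t replicas whose outputs are arbitrarily wrong. The names `KeyValueMachine`, `majority`, and `run_replicas` are illustrative only.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class KeyValueMachine:
    """Hypothetical deterministic state machine: the same initial state plus the
    same sequence of requests always yields the same outputs and final state."""
    state: dict = field(default_factory=dict)

    def apply(self, request):
        op, key, *rest = request
        if op == "put":
            self.state[key] = rest[0]
            return "ok"
        if op == "get":
            return self.state.get(key)
        raise ValueError(f"unknown op {op!r}")


def majority(outputs):
    """Return the value produced by a strict majority of replicas.
    With 2t + 1 replicas and at most t faulty, the t + 1 correct ones win."""
    value, count = Counter(outputs).most_common(1)[0]
    if count <= len(outputs) // 2:
        raise RuntimeError("no value has a strict majority")
    return value


def run_replicas(log, n_replicas, faulty=frozenset()):
    """Apply one totally ordered request log to every replica (assumed
    Agreement + Order) and vote on each request's outputs. Replica indices in
    `faulty` simulate arbitrary failures by reporting corrupted outputs."""
    replicas = [KeyValueMachine() for _ in range(n_replicas)]
    voted = []
    for request in log:
        outputs = []
        for i, replica in enumerate(replicas):
            out = replica.apply(request)
            outputs.append("garbage" if i in faulty else out)
        voted.append(majority(outputs))
    return voted, replicas


if __name__ == "__main__":
    log = [("put", "x", 1), ("put", "y", 2), ("get", "x"), ("get", "y")]
    # t = 1 faulty replica masked by 2t + 1 = 3 replicas
    voted, replicas = run_replicas(log, n_replicas=3, faulty={2})
    print(voted)  # ['ok', 'ok', 1, 2]
```

Here the two correct replicas outvote the corrupted one on every request. Determinism is what makes the vote meaningful: if two replicas applied `("put", "x", 1)` and `("put", "x", 2)` in different orders, their states would diverge, which is why the approach needs Order as well as Agreement.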


Document Details

Document Type: Technical Report
Publication Date: Apr 01, 1986
Accession Number: ADA166549

Entities

People

  • Fred B. Schneider

Organizations

  • Cornell University

Tags

Communities of Interest

  • Energy and Power Technologies

DTIC Thesaurus Topics

  • Abstracts
  • Agreements
  • Classification
  • Computer Programming
  • Computer Science
  • Computers
  • Computing System Architectures
  • Damage Detection
  • Detection
  • Distributed Computing
  • Fault Tolerance
  • Language
  • Military Research
  • New York
  • Security
  • Specifications
  • Universities

Fields of Study

  • Computer science
  • Engineering

Readers

  • Applied Combinatorial Optimization and Logic Circuit Design
  • Parallel and Distributed Computing