A Survey of Rollback-Recovery Protocols in Message-Passing Systems

Abstract

The problem of rollback-recovery in message-passing systems has undergone extensive study. In this survey, we review rollback-recovery techniques that do not require special language constructs, and classify them into two primary categories. Checkpoint-based rollback-recovery relies solely on checkpointed states for system state restoration. Depending on when checkpoints are taken, existing approaches can be divided into uncoordinated checkpointing, coordinated checkpointing and communication-induced checkpointing. Log-based rollback-recovery uses checkpointing and message logging. The logs enable the recovery protocol to reconstruct the states that are not checkpointed. There are three different log-based approaches, namely, pessimistic logging, optimistic logging and causal logging. We identify a set of desirable properties of rollback-recovery protocols, and compare different approaches with respect to these properties. Log-based rollback-recovery protocols generally rely on the assumption of piecewise determinism and pay additional overhead to allow faster output commits and more localized recovery. We present research issues under each approach, and review existing solutions to address them. We also present implementation issues of checkpointing and message logging.
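
As an illustration of the log-based approach the abstract describes, the following is a minimal sketch, not taken from the report, of a single process that restores a checkpoint and replays logged messages. It assumes piecewise determinism (state changes only through message delivery), and the class and method names (`Process`, `deliver`, `take_checkpoint`, `recover`) are hypothetical. The in-memory `checkpoint` and `log` stand in for stable storage.

```python
# Hypothetical sketch: one process whose execution is deterministic between
# message deliveries (the piecewise-determinism assumption).
import copy


class Process:
    def __init__(self):
        self.state = {"total": 0, "delivered": 0}
        # Stand-ins for stable storage; a real system would write to disk.
        self.checkpoint = copy.deepcopy(self.state)
        self.log = []  # messages delivered since the last checkpoint

    def deliver(self, msg):
        # Record the message before applying it, so replay can reproduce it.
        self.log.append(msg)
        self._apply(msg)

    def _apply(self, msg):
        # Deterministic state transition driven only by the message.
        self.state["total"] += msg
        self.state["delivered"] += 1

    def take_checkpoint(self):
        self.checkpoint = copy.deepcopy(self.state)
        self.log.clear()  # messages before the checkpoint are not needed

    def recover(self):
        # Restore the checkpointed state, then replay the log in order to
        # reconstruct the states that were never checkpointed.
        self.state = copy.deepcopy(self.checkpoint)
        for msg in self.log:
            self._apply(msg)


p = Process()
p.deliver(5)
p.take_checkpoint()
p.deliver(7)
p.deliver(1)
before_crash = copy.deepcopy(p.state)
p.state = {"total": -1, "delivered": -1}  # simulate loss of volatile state
p.recover()
```

A purely checkpoint-based scheme would stop after restoring `checkpoint`, losing the effect of the two messages delivered afterwards; the replay loop is what lets the log-based scheme return to the pre-failure state.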

Document Details

Document Type
Technical Report
Publication Date
Oct 03, 1996
Accession Number
ADA317250

Entities

People

  • D. B. Johnson
  • E. N. Elnozahy
  • Y.-M. Wang

Organizations

  • Carnegie Mellon University

Tags

DTIC Thesaurus Topics

  • Algorithms
  • Application Software
  • Clocks
  • Communication Channels
  • Communication Systems
  • Compilers
  • Computations
  • Computer Science
  • Computers
  • Environment
  • Explosives Initiators
  • Failure Mode And Effect Analysis
  • Instructions
  • Language
  • Magnetic Disks
  • Operating Systems
  • Time Intervals

Fields of Study

  • Computer science

Readers

  • Parallel and Distributed Computing