A Survey of Rollback-Recovery Protocols in Message-Passing Systems

Abstract

This survey covers rollback-recovery techniques that do not require special language constructs. In the first part of the survey we classify rollback-recovery protocols into checkpoint-based and log-based. Checkpoint-based protocols rely solely on checkpointing for system state restoration. Checkpointing can be coordinated, uncoordinated, or communication-induced. Log-based protocols combine checkpointing with logging of nondeterministic events, encoded in tuples called determinants. Depending on how determinants are logged, log-based protocols can be pessimistic, optimistic, or causal. Throughout the survey, we highlight the research issues that are at the core of rollback recovery and present the solutions that currently address them. We also compare the performance of different rollback-recovery protocols with respect to a series of desirable properties and discuss the issues that arise in the practical implementations of these protocols.

Document Details

Document Type: Technical Report
Publication Date: Jun 01, 1999
Accession Number: ADA375892

Entities

People

  • David B. Johnson
  • Lorenzo Alvisi
  • Mootaz Elnozahy
  • Yi-min Wang

Organizations

  • Carnegie Mellon University

Tags

Communities of Interest

  • Materials and Manufacturing Processes

DTIC Thesaurus Topics

  • Application Software
  • Communication Channels
  • Communication Systems
  • Computer Networks
  • Computer Programming
  • Computer Science
  • Computers
  • Distributed Computing
  • Failure Mode And Effect Analysis
  • Fault Tolerance
  • Fault Tolerant Computing
  • High Performance Computing
  • Language
  • Operating Systems
  • Parallel Computing
  • Parallel Processing
  • Programming Languages

Fields of Study

  • Computer science

Readers

  • Artificial Intelligence
  • Database Systems and Applications
  • Organizational Process Management (OPM)