Lazy Checkpoint Coordination for Bounding Rollback Propagation

Abstract

Independent checkpointing allows maximum process autonomy but suffers from potential domino effects. Coordinated checkpointing eliminates the domino effect by sacrificing a certain degree of process autonomy. In this paper, we propose the technique of lazy checkpoint coordination, which preserves process autonomy while employing communication-induced checkpoint coordination to bound rollback propagation. The notion of laziness allows a flexible tradeoff between the cost of checkpoint coordination and the average rollback distance. A worst-case overhead analysis provides a means of estimating the extra checkpoint overhead. Communication-trace-driven simulation of several parallel programs is used to evaluate the benefits of the proposed scheme for real applications.

Keywords: Fault tolerance, Independent checkpointing, Checkpoint coordination, Rollback recovery.
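The abstract does not spell out the coordination rule. As a rough illustration only, the sketch below assumes one plausible form of communication-induced coordination: each message piggybacks the sender's checkpoint index, every Z-th index (Z being a hypothetical laziness parameter) marks a coordination line, and a receiver that sees the sender past a line it has not yet reached takes a forced checkpoint before delivering the message. The names `Process`, `take_checkpoint`, `send`, `receive`, and `laziness` are hypothetical and are not taken from the report.

```python
from dataclasses import dataclass, field
from typing import Any, List, Optional, Tuple


@dataclass
class Process:
    """Hypothetical process model for a lazy-coordination sketch (not from the report)."""
    pid: int
    laziness: int  # assumed parameter Z: a coordination line every Z checkpoint indices
    index: int = 0  # current checkpoint index
    checkpoints: List[Tuple[int, bool]] = field(default_factory=list)  # (index, forced?)

    def take_checkpoint(self, new_index: Optional[int] = None, forced: bool = False) -> None:
        # Independent checkpoints advance the index by one; forced ones may jump ahead.
        self.index = self.index + 1 if new_index is None else new_index
        self.checkpoints.append((self.index, forced))

    def send(self, payload: Any) -> Tuple[int, Any]:
        # Piggyback the sender's current checkpoint index on the message.
        return (self.index, payload)

    def receive(self, message: Tuple[int, Any]) -> Any:
        sender_index, payload = message
        sender_line = sender_index // self.laziness
        own_line = self.index // self.laziness
        if sender_line > own_line:
            # The sender has crossed a coordination line this process has not reached:
            # checkpoint at the sender's latest line before delivering the message.
            self.take_checkpoint(new_index=sender_line * self.laziness, forced=True)
        return payload


if __name__ == "__main__":
    p0, p1 = Process(0, laziness=2), Process(1, laziness=2)
    p0.take_checkpoint()
    p0.take_checkpoint()  # p0 reaches index 2, the first coordination line
    p1.receive(p0.send("m"))
    print(p1.checkpoints)  # [(2, True)]
```

Under these assumptions, Z = 1 lets every index advance at a sender force a checkpoint at the receiver, while larger Z values coordinate less often, trading fewer forced checkpoints for presumably longer rollback distances; this mirrors the kind of tradeoff the abstract attributes to laziness.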


Document Details

Document Type
Technical Report
Publication Date
May 28, 1993
Accession Number
ADA259257

Entities

People

  • W. Kent Fuchs
  • Yi-min Wang

Organizations

  • University of Illinois Urbana–Champaign

Tags

Communities of Interest

  • Materials and Manufacturing Processes

DTIC Thesaurus Topics

  • Algorithms
  • Computers
  • Consistency
  • Contrast
  • Cooperation
  • Demographic Cohorts
  • Distributed Computing
  • Electronic Mail
  • Fault Tolerance
  • Guarantees
  • High Performance Computing
  • Illinois
  • Intervals
  • Language
  • Recovery
  • Simulations

Fields of Study

  • Computer science

Readers

  • Parallel and Distributed Computing