Lazy Checkpoint Coordination for Bounding Rollback Propagation

Abstract

In this paper, we propose lazy checkpoint coordination, a technique that preserves process autonomy while employing communication-induced checkpoint coordination to bound rollback propagation. The notion of laziness is introduced to control the coordination frequency and to allow a flexible trade-off between the cost of checkpoint coordination and the average rollback distance. Worst-case overhead analysis provides a means of estimating the extra checkpoint overhead. Communication trace-driven simulation of several parallel programs is used to evaluate the benefits of the proposed scheme.

Document Details

Document Type
Technical Report
Publication Date
Jan 01, 1993
Accession Number
ADA267135

Entities

People

  • W. Kent Fuchs
  • Yi-min Wang

Organizations

  • University of Illinois Urbana–Champaign

Tags

Communities of Interest

  • Materials and Manufacturing Processes
  • Space

DTIC Thesaurus Topics

  • Algorithms
  • Autonomy
  • Computers
  • Demographic Cohorts
  • Distributed Computing
  • Engineering
  • Fault Tolerance
  • Fault Tolerant Computing
  • Illinois
  • Information Processing
  • Intervals
  • Message Systems
  • Operating Systems
  • Parallel Computing
  • Recovery
  • Simulations
  • Software Development

Fields of Study

  • Computer science
  • Engineering

Readers

  • Computer Vision
  • Parallel and Distributed Computing