Lazy Checkpoint Coordination for Bounding Rollback Propagation
Abstract
Independent checkpointing allows maximum process autonomy but suffers from potential domino effects. Coordinated checkpointing eliminates the domino effect by sacrificing a certain degree of process autonomy. In this paper, we propose the technique of lazy checkpoint coordination, which preserves process autonomy while employing communication-induced checkpoint coordination for bounding rollback propagation. The introduction of the notion of laziness allows a flexible tradeoff between the cost of checkpoint coordination and the average rollback distance. Worst-case overhead analysis provides a means for estimating the extra checkpoint overhead. Communication trace-driven simulation for several parallel programs is used to evaluate the benefits of the proposed scheme for real applications....
Keywords
Fault tolerance, Independent checkpointing, Checkpoint coordination, Rollback recovery.
Document Details
- Document Type: Technical Report
- Publication Date: May 28, 1993
- Accession Number: ADA259257
Entities
People
- W. Kent Fuchs
- Yi-min Wang
Organizations
- University of Illinois Urbana–Champaign