An Evolutionary Approach to Concurrent Checkpointing
Abstract
This paper describes an evolutionary approach to establishing a consistent global recovery line for concurrent processes. Unlike globally synchronized schemes, our approach uses no agreement protocols and thus no rounds of messages to decide upon a recovery line. Unlike logging-based schemes, our approach neither stores the messages exchanged between concurrent processes nor constructs message dependence graphs to determine a recovery line. In contrast to communication-synchronized schemes, our technique reduces overhead by not always synchronizing computation with checkpointing and by temporarily allowing a potentially inconsistent recovery line. Evolutionary concurrent checkpointing periodically starts a checkpointing session by checkpointing each process locally. As the checkpointing session progresses, the initial checkpoints are updated according to the communication between the concurrent processes. This local checkpoint updating guarantees that the recovery line evolves into a consistent line. Evolutionary concurrent checkpointing can be applied to message-based multicomputer systems, shared virtual memory systems, and shared-memory multiprocessors. We evaluate the performance of our approach using execution traces from a hypercube multicomputer and a shared-memory multiprocessor.
Keywords: fault-tolerant computing, checkpointing, rollback error recovery.
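The abstract does not spell out the update rule, so the following Python sketch is only an illustration of what it means for a recovery line to "evolve into a consistent line". It uses the standard definition that a recovery line is consistent when it contains no orphan message, that is, no message whose receipt is recorded in the receiver's checkpoint while its send is missing from the sender's checkpoint. The names `Message`, `orphan_messages`, `is_consistent`, and `evolve`, the event-index representation of checkpoints, and the rule of advancing the sender's checkpoint are all assumptions made for this example, not the report's algorithm.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Message:
    sender: str
    send_time: int   # sender's local event index at which the message was sent
    receiver: str
    recv_time: int   # receiver's local event index at which it was received


def orphan_messages(line, messages):
    """Messages received inside the receiver's checkpoint but sent after
    the sender's checkpoint. `line` maps process -> last event index
    covered by that process's checkpoint."""
    return [
        m for m in messages
        if m.recv_time <= line[m.receiver] and m.send_time > line[m.sender]
    ]


def is_consistent(line, messages):
    return not orphan_messages(line, messages)


def evolve(line, messages):
    """Hypothetical update rule (an assumption for illustration): advance
    the sender's checkpoint to cover the send of every orphan message,
    and repeat until no orphan remains."""
    line = dict(line)  # work on a copy
    while True:
        orphans = orphan_messages(line, messages)
        if not orphans:
            return line
        for m in orphans:
            line[m.sender] = max(line[m.sender], m.send_time)


# Toy session: every process checkpoints locally at event index 4.
initial_line = {"P0": 4, "P1": 4, "P2": 4}
trace = [
    Message("P0", 6, "P1", 3),  # orphan with respect to the initial line
    Message("P2", 7, "P0", 5),  # becomes an orphan once P0's checkpoint advances
]

final_line = evolve(initial_line, trace)
print(is_consistent(initial_line, trace), is_consistent(final_line, trace))
print(final_line)
```

In this toy trace the initial line is inconsistent, and updating one checkpoint exposes a second dependency that must also be repaired, which is why the sketch loops until no orphan message is left. The loop terminates because each pass strictly advances at least one checkpoint index, and indices are bounded by the largest send time in the trace.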
Document Details
- Document Type: Technical Report
- Publication Date: Jan 01, 1992
- Accession Number: ADA251926
Entities
People
- Bob Janssens
- Junsheng Long
- W. Kent Fuchs
Organizations
- University of Illinois Urbana–Champaign