An Evolutionary Approach to Concurrent Checkpointing

Abstract

This paper describes an evolutionary approach to establishing a consistent global recovery line for concurrent processes. Unlike globally synchronized schemes, our approach uses no agreement protocols and thus no rounds of messages to decide upon a recovery line. Unlike logging-based schemes, our approach neither stores the messages exchanged between concurrent processes nor constructs message dependence graphs to determine a recovery line. In contrast to communication-synchronized schemes, our technique reduces overhead by not always synchronizing computation with checkpointing and by temporarily allowing a potentially inconsistent recovery line. Evolutionary concurrent checkpointing periodically starts a checkpointing session by checkpointing each process locally. As the checkpointing session progresses, the initial checkpoints are updated according to the communication between the concurrent processes. This local checkpoint updating guarantees that the recovery line evolves into a consistent line. Evolutionary concurrent checkpointing can be applied to message-based multicomputer systems, shared virtual memory systems, and shared-memory multiprocessors. We evaluate the performance of our approach using execution traces from a hypercube multicomputer and a shared-memory multiprocessor.

Keywords: fault-tolerant computing, checkpointing, rollback error recovery.
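The idea of a recovery line that starts out possibly inconsistent and evolves into a consistent one can be illustrated with a small sketch. The abstract does not give the report's actual update rule, so everything below is an assumption made for illustration: the names `Message`, `orphans`, and `evolve` are hypothetical, a checkpoint is modeled as a local event index, and the update rule simply advances the sender's checkpoint whenever a message is recorded as received but not as sent. Only the consistency criterion (no such orphan messages) is the standard one.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Message:
    sender: int
    send_time: int    # sender's local event index when the message was sent
    receiver: int
    recv_time: int    # receiver's local event index when it was received


def orphans(line, messages):
    """Return messages that make the recovery line inconsistent.

    line maps process id -> checkpoint index; a checkpoint at index c is
    taken to cover all local events with index <= c. A message is an
    orphan if its receipt is covered by the receiver's checkpoint while
    its send is not covered by the sender's checkpoint.
    """
    return [m for m in messages
            if m.recv_time <= line[m.receiver] and m.send_time > line[m.sender]]


def evolve(line, messages):
    """Assumed update rule (not taken from the report): advance the
    sender's checkpoint past each orphan's send, and repeat until the
    line has no orphans."""
    line = dict(line)
    while True:
        bad = orphans(line, messages)
        if not bad:
            return line
        for m in bad:
            line[m.sender] = max(line[m.sender], m.send_time)


# Three processes with unsynchronized initial local checkpoints.
initial = {0: 2, 1: 5, 2: 1}
trace = [
    Message(sender=0, send_time=4, receiver=1, recv_time=5),
    Message(sender=2, send_time=3, receiver=0, recv_time=3),
]
final = evolve(initial, trace)
print(final)
```

In this trace the first message is an orphan under the initial line, and repairing it exposes the second one, so two update steps are needed before the line is consistent. Checkpoints only ever move forward and are bounded by the last send in the trace, which is why the loop terminates.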

Document Details

Document Type: Technical Report
Publication Date: Jan 01, 1992
Accession Number: ADA251926

Entities

People

  • Bob Janssens
  • Junsheng Long
  • W. Kent Fuchs

Organizations

  • University of Illinois Urbana–Champaign

Tags

Communities of Interest

  • Materials and Manufacturing Processes

DTIC Thesaurus Topics

  • Access Time
  • Algorithms
  • Broadcasting
  • Clocks
  • Communication Channels
  • Computations
  • Computer Programming
  • Computer Science
  • Computers
  • Convergence
  • Evolutionary Algorithms
  • Fault Tolerance
  • Fault Tolerant Computing
  • Frequency
  • Parallel Computing
  • Parallel Processing
  • Simulators

Fields of Study

  • Computer science
  • Engineering

Readers

  • Parallel and Distributed Computing