Checkpoint Space Reclamation for Independent Checkpointing in Message-Passing Systems

Abstract

The main disadvantages of independent checkpointing in message-passing systems are the possible domino effect and the storage space overhead of maintaining multiple checkpoints. Most previous research on checkpointing and recovery has assumed that only the checkpoints older than the global recovery line can be discarded. In this paper, we generalize the notion of a recovery line to that of a potential recovery line. Only checkpoints that belong to at least one potential recovery line cannot be discarded. Using the model of maximum-sized antichains on a partially ordered set, we develop an efficient algorithm for finding all non-discardable checkpoints and derive an upper bound on the number of non-discardable checkpoints. Communication trace-driven simulation of several parallel programs is used to show the benefits of the proposed algorithm for real applications.
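
To make the antichain model concrete, the following is a minimal brute-force sketch in Python, not the efficient algorithm developed in the report. It assumes a generic partially ordered set given as a list of elements and a list of "strictly before" pairs; how checkpoints are mapped onto such a poset is defined in the report itself and is not reproduced here. All function names are hypothetical. The sketch enumerates every maximum-sized antichain and returns the union of their members, which plays the role of the non-discardable set in the abstract's description.

```python
from itertools import combinations


def transitive_closure(elements, edges):
    """Return every (a, b) pair such that a is strictly before b."""
    before = set(edges)
    # Warshall-style closure; the intermediate element k is the outer loop.
    for k in elements:
        for a in elements:
            for b in elements:
                if (a, k) in before and (k, b) in before:
                    before.add((a, b))
    return before


def maximum_antichains(elements, edges):
    """Brute force: list all antichains of the largest possible size."""
    before = transitive_closure(elements, edges)

    def is_antichain(subset):
        # An antichain contains no two comparable elements.
        return all((a, b) not in before and (b, a) not in before
                   for a, b in combinations(subset, 2))

    # Try sizes from largest to smallest and stop at the first size that works.
    for size in range(len(elements), 0, -1):
        found = [set(s) for s in combinations(elements, size) if is_antichain(s)]
        if found:
            return found
    return []


def elements_in_some_maximum_antichain(elements, edges):
    """Union of all maximum-sized antichains (assumed stand-in for the
    non-discardable set; the exact mapping is an assumption)."""
    kept = set()
    for antichain in maximum_antichains(elements, edges):
        kept |= antichain
    return kept


# Hypothetical toy poset: two chains a1 < a2 and b1 < b2, plus a1 < b1.
toy_elements = ["a1", "a2", "b1", "b2"]
toy_edges = [("a1", "a2"), ("b1", "b2"), ("a1", "b1")]
print(sorted(elements_in_some_maximum_antichain(toy_elements, toy_edges)))
# → ['a2', 'b1', 'b2']
```

In the toy poset, the maximum-sized antichains are {a2, b1} and {a2, b2}, so a1 belongs to none of them. The enumeration is exponential in the number of elements and is meant only to illustrate the definition on small inputs.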

Document Details

Document Type
Technical Report
Publication Date
Jan 01, 1992
Accession Number
ADA251923

Entities

People

  • In-jen Lin
  • Pi-yu Chung
  • W. Kent Fuchs
  • Yi-min Wang

Organizations

  • University of Illinois Urbana–Champaign

Tags

Communities of Interest

  • Materials and Manufacturing Processes

DTIC Thesaurus Topics

  • Algorithms
  • Combinatorial Analysis
  • Computations
  • Computer Programming
  • Computers
  • Consistency
  • Electronic Mail
  • Fault Tolerance
  • Fault Tolerant Computing
  • High Performance Computing
  • Operating Systems
  • Parallel Computing
  • Parallel Processing
  • Reclamation
  • Recovery
  • Simulations
  • Software Development

Readers

  • Parallel and Distributed Computing
  • Regression Analysis

Technology Areas

  • Space