Progressive Retry for Software Failure Recovery in Message-Passing Applications

Abstract

In this paper, we describe a method of execution retry for bypassing software faults in message-passing applications. Building on checkpointing and message logging, we demonstrate the use of message replaying and message reordering as two mechanisms for achieving fast, localized recovery. Our approach gradually increases the rollback distance and the number of affected processes each time a retry fails, and is therefore named progressive retry. We describe an implementation as reusable modules that provide low-cost, application-level software fault tolerance. Examples from our experience with telecommunications software systems illustrate the benefits of the scheme.
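
The escalation idea in the abstract can be illustrated with a minimal Python sketch. It is not the report's implementation: the `RetryStep` type, the `progressive_retry` function, the step names, and the numeric values are all hypothetical, and the retry itself is simulated rather than performed on real checkpoints and message logs. The sketch only shows the control flow in which each failed retry leads to a wider one.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass(frozen=True)
class RetryStep:
    """One escalation level (all fields are hypothetical, for illustration)."""
    name: str
    rollback_distance: int   # how many checkpoint intervals to roll back
    affected_processes: int  # how many processes take part in the retry


def progressive_retry(steps: List[RetryStep],
                      attempt: Callable[[RetryStep], bool]) -> Optional[RetryStep]:
    """Try each step in order; return the first that succeeds, or None."""
    for step in steps:
        if attempt(step):
            return step
    return None


# Hypothetical escalation schedule: later steps never narrow the scope.
STEPS = [
    RetryStep("replay logged messages", 1, 1),
    RetryStep("reorder logged messages", 1, 1),
    RetryStep("roll back senders too", 2, 3),
    RetryStep("roll back all processes", 4, 8),
]


def simulated_attempt(step: RetryStep) -> bool:
    # Stand-in for a real retry: pretend the fault is bypassed only once
    # more than one process is involved in the rollback.
    return step.affected_processes > 1


recovered_by = progressive_retry(STEPS, simulated_attempt)
```

Under the simulated fault, the two single-process steps fail and recovery stops at the third step, so the widest rollback is never needed. Ordering the steps from cheapest to most disruptive is what keeps recovery localized when an early retry suffices.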

Document Details

Document Type
Technical Report
Publication Date
Oct 01, 1993
Accession Number
ADA274289

Entities

People

  • Chandra Kintala
  • W. Kent Fuchs
  • Yennun Huang
  • Yi-min Wang

Organizations

  • University of Illinois Urbana–Champaign

Tags

Communities of Interest

  • Materials and Manufacturing Processes

DTIC Thesaurus Topics

  • Algorithms
  • Application Software
  • Availability
  • Boundaries
  • Channel Allocation
  • Communication Channels
  • Communication Systems
  • Computer Programming
  • Computers
  • Debugging
  • Environment
  • Fault Tolerance
  • Intervals
  • Overload
  • Recovery
  • Sequences
  • Software Testing

Fields of Study

  • Computer science
  • Engineering

Readers

  • Applied Combinatorial Optimization and Logic Circuit Design
  • Parallel and Distributed Computing