Progressive Retry for Software Failure Recovery in Message-Passing Applications
Abstract
In this paper, we describe a method of execution retry for bypassing software faults in message-passing applications. Building on checkpointing and message logging, we demonstrate the use of message replaying and message reordering as two mechanisms for achieving fast, localized recovery. Our approach gradually increases the rollback distance and the number of affected processes each time a retry fails, and is therefore named progressive retry. We describe an implementation as reusable modules that provides low-cost, application-level software fault tolerance. Examples from our experience with telecommunications software systems illustrate the benefits of the scheme.
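The escalation idea in the abstract can be illustrated with a small sketch. This is not the report's implementation: the names `RetryStep`, `progressive_retry`, and `order_sensitive_run` are hypothetical, the model covers a single process with an in-memory message log, and "reordering" is simplified to reversing the rolled-back messages. It only shows the control flow of trying a small-scope retry first and widening the rollback distance when that retry fails.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Sequence


@dataclass
class RetryStep:
    """One retry attempt (hypothetical structure, not from the report)."""
    name: str
    rollback_distance: int  # number of logged messages undone before retrying
    reorder: bool           # False: replay in logged order; True: change the order


def progressive_retry(
    log: Sequence[str],
    steps: Sequence[RetryStep],
    run: Callable[[List[str]], bool],
) -> Optional[str]:
    """Try each step in order; return the name of the first that succeeds."""
    for step in steps:
        # Roll back: keep the prefix, re-deliver the last `rollback_distance` messages.
        split = max(0, len(log) - step.rollback_distance)
        kept, tail = list(log[:split]), list(log[split:])
        if step.reorder:
            # Simplifying assumption: reordering is modeled as reversal.
            tail.reverse()
        if run(kept + tail):
            return step.name
    return None  # every step failed; a caller would escalate further


def order_sensitive_run(messages: List[str]) -> bool:
    """Toy application: fails when "b" is delivered immediately before "c"."""
    return all(pair != ("b", "c") for pair in zip(messages, messages[1:]))


# Steps are listed from smallest to largest scope.
steps = [
    RetryStep("replay, distance 2", rollback_distance=2, reorder=False),
    RetryStep("reorder, distance 2", rollback_distance=2, reorder=True),
    RetryStep("reorder, distance 3", rollback_distance=3, reorder=True),
]

result = progressive_retry(["a", "b", "c"], steps, order_sensitive_run)
print(result)
```

In this toy, plain replay reproduces the same delivery order and fails again because the fault is deterministic and order-dependent, so the loop moves on to the reordering step. A real system would also coordinate rollback across several processes, which this sketch does not model.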
Document Details
- Document Type: Technical Report
- Publication Date: Oct 01, 1993
- Accession Number: ADA274289
Entities
People
- Chandra Kintala
- W. Kent Fuchs
- Yennun Huang
- Yi-min Wang
Organizations
- University of Illinois Urbana–Champaign