Progressive Retry for Software Error Recovery in Distributed Systems
Abstract
In this paper, the authors describe a method of execution retry that bypasses software errors through checkpointing, rollback, message reordering, and message replay. They demonstrate that rollback techniques previously developed for recovery from transient hardware failures can also recover from software faults when message reordering is used to steer execution around the error. Their approach intentionally increases the degree of nondeterminism and the scope of rollback each time a retry fails. Examples from their experience with telecommunications software systems illustrate the benefits of the scheme.
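The escalation idea in the abstract can be illustrated with a minimal single-process sketch. It is not the authors' algorithm: the names `progressive_retry`, `replay`, `SoftwareFault`, and `max_orders`, the toy account handler, and the assumption that every logged message may be legally delivered in any order are all hypothetical and chosen only for illustration. The sketch first replays the logged messages from the newest checkpoint in their original order, then tries reordered deliveries, and only then rolls back to an older checkpoint.

```python
import copy
from itertools import permutations


class SoftwareFault(Exception):
    """Hypothetical stand-in for a software error detected at run time."""


def replay(state, messages, handler):
    """Restore a private copy of the checkpointed state and deliver messages."""
    state = copy.deepcopy(state)
    for msg in messages:
        handler(state, msg)
    return state


def progressive_retry(checkpoints, log, handler, max_orders=24):
    """Retry with growing nondeterminism and growing rollback scope.

    checkpoints: list of (state, log_index) pairs, oldest first.
    log: messages received so far, in original delivery order.
    """
    for state, index in reversed(checkpoints):  # newest checkpoint first
        pending = log[index:]
        # The first permutation is the original order (plain replay);
        # later permutations are the reordered retries.
        for attempt, order in enumerate(permutations(pending)):
            if attempt >= max_orders:  # bound the factorial search
                break
            try:
                return replay(state, order, handler), list(order)
            except SoftwareFault:
                continue
    raise SoftwareFault("all retries exhausted")


def handler(state, msg):
    """Toy handler whose fault depends on message order (an assumption)."""
    kind, amount = msg
    if kind == "withdraw":
        if state["balance"] < amount:
            raise SoftwareFault("insufficient balance")
        state["balance"] -= amount
    else:
        state["balance"] += amount


checkpoints = [({"balance": 0}, 0)]
log = [("withdraw", 5), ("deposit", 10)]
final_state, order_used = progressive_retry(checkpoints, log, handler)
```

Here the original order fails because the withdrawal arrives first, while the reordered delivery succeeds. In a real distributed system only messages with no causal dependency between them could be reordered, which this sketch does not model.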
Document Details
- Document Type: Technical Report
- Publication Date: Jan 01, 1993
- Accession Number: ADA260075
Entities
People
- W. Kent Fuchs
- Yennun Huang
- Yi-min Wang
Organizations
- University of Illinois Urbana–Champaign