Progressive Retry for Software Error Recovery in Distributed Systems
Abstract
In this paper, we describe a method of execution retry for bypassing software faults based on checkpointing, rollback, message reordering, and replaying. We demonstrate how rollback techniques, previously developed for transient hardware failure recovery, can also be used to recover from software errors by exploiting message reordering to bypass software faults. Our approach intentionally increases the degree of nondeterminism and the scope of rollback when a previous retry fails. Examples from our experience with telecommunications software systems illustrate the benefits of the scheme.
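The retry idea in the abstract can be illustrated with a toy, single-process sketch. It is not the report's algorithm: the handler `apply_messages`, the fault it raises, and the message names are all hypothetical, and the sketch assumes every logged message may be freely reordered, which a real distributed system would only allow for messages with no ordering dependency. It shows a rollback to a checkpoint followed by a replay in the original order and then, when that fails, replays with reordered messages.

```python
import copy
from itertools import permutations


class SoftwareFault(Exception):
    """Raised by the toy handler when a message order triggers the bug."""


def apply_messages(state, messages):
    """Hypothetical handler: fails if 'release' arrives while nothing is held."""
    state = copy.deepcopy(state)  # work on a copy so the checkpoint stays intact
    for msg in messages:
        if msg == "release" and not state["held"]:
            raise SoftwareFault("release before acquire")
        if msg == "acquire":
            state["held"] = True
        elif msg == "release":
            state["held"] = False
        state["processed"].append(msg)
    return state


def progressive_retry(checkpoint, logged):
    """Restart from the checkpoint, widening nondeterminism after each failure."""
    original = tuple(logged)
    # Attempt 1 replays the log as recorded; later attempts reorder the messages.
    candidates = [original] + [p for p in permutations(logged) if p != original]
    for attempt, order in enumerate(candidates, start=1):
        try:
            return apply_messages(checkpoint, order), attempt
        except SoftwareFault:
            continue  # rollback is implicit: the checkpoint was never modified
    raise SoftwareFault("all retries failed; a wider rollback scope would be needed")


checkpoint = {"held": False, "processed": []}
logged = ["release", "acquire", "log"]
state, attempts = progressive_retry(checkpoint, logged)
print(attempts, state["processed"])
```

Because the toy handler is deterministic, replaying the original order always fails here; in a real system a plain replay can succeed when the fault depends on timing or other environmental conditions. Widening the rollback scope to other processes, which the abstract also mentions, is left out of this sketch.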
Document Details
- Document Type: Technical Report
- Publication Date: Jul 20, 1993
- Accession Number: ADA266859
Entities
People
- W. Kent Fuchs
- Yennun Huang
- Yi-min Wang
Organizations
- University of Illinois Urbana–Champaign