Progressive Retry for Software Error Recovery in Distributed Systems

Abstract

In this paper, we describe a method of execution retry for bypassing software faults, based on checkpointing, rollback, message reordering, and message replaying. We demonstrate how rollback techniques, previously developed for transient hardware failure recovery, can also be used to recover from software errors by exploiting message reordering to bypass software faults. Our approach intentionally increases the degree of nondeterminism and the scope of rollback when a previous retry fails. Examples from our experience with telecommunications software systems illustrate the benefits of the scheme.
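
The abstract does not spell out the algorithm, so the following is only a minimal single-process sketch of the general idea, not the authors' implementation. It assumes a toy handler with an injected ordering-dependent bug, assumes that any permutation of the logged messages is an admissible delivery order, and models "scope of rollback" simply as falling back to an older checkpoint. All names (handle, replay, progressive_retry, SoftwareFault) are hypothetical.

```python
from itertools import permutations


class SoftwareFault(Exception):
    """Signals that the toy handler hit its ordering-dependent bug."""


def handle(state, msg):
    """Hypothetical handler; state is (messages_processed, last_message).

    Injected bug (an assumption for illustration): the handler fails
    only when "B" is processed immediately after "A".
    """
    count, last = state
    if last == "A" and msg == "B":
        raise SoftwareFault(f"{msg!r} processed right after {last!r}")
    return (count + 1, msg)


def replay(state, messages):
    """Re-execute a message sequence starting from a restored checkpoint."""
    for msg in messages:
        state = handle(state, msg)
    return state


def progressive_retry(checkpoints, max_orders=24):
    """Escalate retries until one replay succeeds.

    checkpoints: list of (state, logged_messages), newest checkpoint first.
    For each checkpoint, first replay the log in its original order, then
    try reordered logs (more nondeterminism). If every order fails, move
    to the next, older checkpoint (wider rollback scope).
    """
    for depth, (state, log) in enumerate(checkpoints):
        # permutations() yields the original order first.
        for attempt, order in enumerate(permutations(log)):
            if attempt >= max_orders:
                break
            try:
                final = replay(state, order)
            except SoftwareFault:
                continue  # this retry failed; escalate to the next one
            kind = "replay" if attempt == 0 else "reorder"
            return final, f"{kind} from checkpoint {depth}: {list(order)}"
    raise SoftwareFault("all retry steps failed")


# The newest checkpoint was taken after "C" and "A" were processed, so
# replaying its log always hits the bug; the older checkpoint leaves
# room to reorder around it.
checkpoints = [
    ((2, "A"), ["B"]),
    ((0, None), ["C", "A", "B"]),
]
final_state, how = progressive_retry(checkpoints)
print(final_state, how)  # (3, 'A') reorder from checkpoint 1: ['C', 'B', 'A']
```

In a real distributed system the admissible reorderings are constrained (for example, by per-sender ordering), and widening the rollback scope typically involves other processes as well; the sketch ignores both.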

Document Details

Document Type: Technical Report
Publication Date: Jul 20, 1993
Accession Number: ADA266859

Entities

People

  • W. Kent Fuchs
  • Yennun Huang
  • Yi-min Wang

Organizations

  • University of Illinois Urbana–Champaign

Tags

DTIC Thesaurus Topics

  • Availability
  • Boundaries
  • Channel Allocation
  • Communication Systems
  • Computer Programming
  • Computers
  • Debugging
  • Engineering
  • Fault Tolerance
  • Fault Tolerant Computing
  • Intervals
  • Long Life
  • Message Systems
  • Operating Systems
  • Recovery
  • Software Development
  • Software Testing

Fields of Study

  • Computer science
  • Engineering

Readers

  • Computer Science/Computer Engineering/Data Science/Digital Signal Processing
  • Parallel and Distributed Computing
  • Systems Analysis and Design