Progressive Retry for Software Error Recovery in Distributed Systems

Abstract

In this paper, the authors describe an execution-retry method for bypassing software errors, based on checkpointing, rollback, message reordering, and message replay. They demonstrate how rollback techniques previously developed for recovery from transient hardware failures can also be used to recover from software faults, by exploiting message reordering. When a retry fails, their approach intentionally increases both the degree of nondeterminism and the scope of rollback for the next attempt. Examples from their experience with telecommunications software systems illustrate the benefits of the scheme.
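
As a rough illustration of the idea in the abstract, the following Python sketch retries a failed computation by replaying logged messages from a checkpoint, then by reordering them, and finally by rolling back to an older checkpoint. It is not the authors' algorithm: the names `progressive_retry`, `replay`, `session_handler`, and `SoftwareError`, the integer state, and the "open"/"close"/"audit" messages are all hypothetical, and the sketch assumes a single process whose logged messages may legally be delivered in any order.

```python
from itertools import permutations
from typing import Callable, List, Sequence, Tuple

State = int
Message = str
Handler = Callable[[State, Message], State]


class SoftwareError(Exception):
    """Hypothetical stand-in for a software fault hit during message handling."""


def replay(checkpoint: State, messages: Sequence[Message], handler: Handler) -> State:
    """Re-execute the handler over the given messages, starting from a checkpoint."""
    state = checkpoint
    for message in messages:
        state = handler(state, message)
    return state


def progressive_retry(
    checkpoints: List[Tuple[int, State]],  # (log index, saved state), oldest first
    log: Sequence[Message],
    handler: Handler,
) -> Tuple[State, List[Message]]:
    """Return the final state and the message order that avoided the error."""
    # Outer loop: widen the rollback scope, newest checkpoint first.
    for index, state in reversed(checkpoints):
        pending = log[index:]
        # permutations() yields the original order first, so the first attempt
        # is a plain replay; later attempts add nondeterminism by reordering.
        for order in permutations(pending):
            try:
                return replay(state, order, handler), list(order)
            except SoftwareError:
                continue
    raise SoftwareError("all retries failed")


def session_handler(open_sessions: State, message: Message) -> State:
    """Toy handler with an assumed bug: 'audit' fails when no session is open."""
    if message == "open":
        return open_sessions + 1
    if message == "close":
        return open_sessions - 1
    if message == "audit" and open_sessions == 0:
        raise SoftwareError("audit with no open sessions")
    return open_sessions


# Reordering within the newest checkpoint interval is enough here.
result = progressive_retry(
    [(0, 0), (2, 0)], ["open", "close", "audit", "open"], session_handler
)
# result == (1, ["open", "audit"])
```

Trying every permutation is only practical for very short logs; the sketch uses it to keep the escalation order (replay, then reorder, then roll back further) easy to see.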

Document Details

Document Type: Technical Report
Publication Date: Jan 01, 1993
Accession Number: ADA260075

Entities

People

  • W. Kent Fuchs
  • Yennun Huang
  • Yi-min Wang

Organizations

  • University of Illinois Urbana–Champaign

Tags

Communities of Interest

  • Materials and Manufacturing Processes

DTIC Thesaurus Topics

  • Access Time
  • Application Software
  • Boundaries
  • Channel Allocation
  • Communication Systems
  • Computer Programming
  • Debugging
  • Electronic Mail
  • Fault Tolerance
  • High Performance Computing
  • Intervals
  • Long Life
  • Message Processing
  • Recovery
  • Sequences
  • Software Development
  • Software Testing

Fields of Study

  • Computer science
  • Engineering

Readers

  • Applied Combinatorial Optimization and Logic Circuit Design
  • Parallel and Distributed Computing