Dealing with Failures During Failure Recovery of Distributed Systems

Abstract

One of the characteristics of autonomic systems is self-recovery from failures. Self-recovery can be achieved by sensing failures, planning for recovery, and executing the recovery plan to bring the system back to a normal state. For various reasons, however, additional failures can occur while the system is recovering from the initial failure. Handling such secondary failures is important because they can cause the original recovery plan to fail and can leave the system in a more complicated state that is worse than the one before recovery began. This paper identifies techniques for preserving consistency when failures occur during failure recovery.

Document Details

Document Type: Technical Report
Publication Date: May 01, 2006
Accession Number: ADA448479

Entities

People

  • Alexander L. Wolf
  • Dennis M. Heimbigner
  • Naveed Arshad

Organizations

  • University of Colorado Boulder

Tags

Communities of Interest

  • Biomedical
  • Sensors

DTIC Thesaurus Topics

  • Abstracts
  • Acceptance Tests
  • Colorado
  • Computer Science
  • Computers
  • Damage Detection
  • Databases
  • Failed States
  • Fault Tolerance
  • Information Operations
  • Language
  • Monitoring
  • Packet Loss
  • Recovery
  • Reliability
  • Universities

Fields of Study

  • Computer science
  • Engineering

Readers

  • Emergency Management and Homeland Security
  • Inertial Navigation Systems
  • Systems Analysis and Design

Technology Areas

  • Autonomy
  • Autonomy - Autonomous System Control