Reconfiguration in Robust Distributed Real-Time Systems Based on Global Checkpoints
Abstract
Fast, ultra-reliable, real-time computing is fundamental to today's weapons systems. Increased system throughput and reliability can be achieved with distributed systems in which a single application program executes on multiple processors connected by a network. The distributed nature of such systems makes it possible to tolerate failures and react to overloads without application-level performance degrading unacceptably. Fault tolerance in these systems typically involves fault detection and recovery. Repair following a failure involves smooth integration of the repaired processor and subsequent reconfiguration. These actions must take place transparently, that is, without the application program noticing them. Therefore, sufficient information must be maintained through checkpointing to describe the state of the system at any time and to ensure correct operation after failure and repair. This thesis investigates a possible framework for a fault-tolerant, real-time distributed system that provides transparent function-to-function message passing, monitors status using periodic health messages, and maintains a globally consistent system state by carrying out independent checkpointing procedures. The proposed scheme is simulated using concurrent Ada processing for a four-mode, twelve-function distributed system.
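The status monitoring mentioned in the abstract can be illustrated with a minimal sketch. It is written in Python for brevity rather than the Ada used in the thesis, and it is not the thesis's implementation. The class name `HealthMonitor`, the `timeout` parameter, the processor names `P1` to `P4`, and the rule that a processor is suspected once its last health message is older than the timeout are all assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class HealthMonitor:
    """Tracks the last health message seen from each processor (hypothetical helper)."""
    timeout: float  # assumed rule: silence longer than this marks a processor as suspect
    last_seen: dict = field(default_factory=dict)

    def record_health(self, node: str, now: float) -> None:
        # A periodic health message from `node` arrived at time `now`.
        self.last_seen[node] = now

    def failed_nodes(self, now: float) -> list:
        # Processors whose most recent health message is older than the timeout.
        return sorted(n for n, t in self.last_seen.items() if now - t > self.timeout)


monitor = HealthMonitor(timeout=3.0)

# All four (hypothetical) processors report at time 0.
for node in ("P1", "P2", "P3", "P4"):
    monitor.record_health(node, now=0.0)

# P1..P3 keep reporting at time 2; P4 goes silent.
for node in ("P1", "P2", "P3"):
    monitor.record_health(node, now=2.0)

suspects = monitor.failed_nodes(now=4.0)
print(suspects)  # ['P4']
```

Time is passed in explicitly so the sketch stays deterministic. A real system would read a clock and would hand the suspect list to whatever recovery and reconfiguration logic it uses.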
Document Details
- Document Type: Technical Report
- Publication Date: Dec 01, 1991
- Accession Number: ADA245615
Entities
People
- Ronnie D. Puett
Organizations
- Naval Postgraduate School