Reconfiguration in Robust Distributed Real-Time Systems Based on Global Checkpoints

Abstract

Fast, ultra-reliable, real-time computing is fundamental to today's weapons systems. Increased system throughput and reliability can be achieved with distributed systems in which a single application program executes on multiple processors connected by a network. The distributed nature of such systems makes it possible to tolerate failures and react to overloads without application-level performance degrading unacceptably. Fault tolerance in these systems typically involves fault detection and recovery. Repair following a failure involves smooth integration of the repaired processor and subsequent reconfiguration. These actions must take place transparently, that is, without the application program noticing them. Therefore, sufficient information must be maintained through checkpointing to describe the state of the system at any time and to ensure correct operation after failure and repair. This thesis investigates a possible framework for a fault-tolerant, real-time distributed system that provides transparent function-to-function message passing, monitors status using periodic health messages, and maintains a globally consistent system state by carrying out independent checkpointing procedures. The proposed scheme is simulated using concurrent Ada processing for a four-mode, twelve-function distributed system.
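
The status monitoring mentioned in the abstract can be illustrated with a minimal sketch. This is not the thesis's Ada simulation: the class name `HealthMonitor`, the processor labels, and the 3-second timeout are all hypothetical, chosen only to show how periodic health messages might be used to suspect a failed processor.

```python
from dataclasses import dataclass, field


@dataclass
class HealthMonitor:
    """Tracks the time of the last health message from each processor.

    Hypothetical illustration; the timeout value is an assumption.
    """
    timeout: float  # seconds of silence before a processor is suspected
    last_seen: dict = field(default_factory=dict)

    def record_health_message(self, processor_id: str, now: float) -> None:
        # Remember when this processor last reported in.
        self.last_seen[processor_id] = now

    def suspected_failures(self, now: float) -> list:
        # A processor is suspected once it has been silent longer than timeout.
        return sorted(
            p for p, t in self.last_seen.items() if now - t > self.timeout
        )


# Timestamps are passed explicitly so the example is deterministic.
monitor = HealthMonitor(timeout=3.0)
monitor.record_health_message("P1", now=0.0)
monitor.record_health_message("P2", now=0.0)
monitor.record_health_message("P1", now=2.0)  # P2 stays silent
failed = monitor.suspected_failures(now=4.0)
print(failed)
```

In a scheme like the one the abstract describes, a suspected failure would presumably trigger recovery from the most recent checkpoint and reconfiguration; that step is outside this sketch.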

Document Details

Document Type
Technical Report
Publication Date
Dec 01, 1991
Accession Number
ADA245615

Entities

People

  • Ronnie D. Puett

Organizations

  • Naval Postgraduate School

Tags

Communities of Interest

  • Biomedical
  • Energy and Power Technologies
  • Materials and Manufacturing Processes

DTIC Thesaurus Topics

  • Algorithms
  • Application Software
  • Computers
  • Detection
  • Fault Tolerance
  • Front End Processors
  • Identification
  • Message Processing
  • Monitoring
  • Operating Systems
  • Overload
  • Recovery
  • Resource Management
  • Simulations
  • Time Intervals
  • United States
  • United States Government

Fields of Study

  • Computer science
  • Engineering

Readers

  • Parallel and Distributed Computing
  • Structural Health Monitoring of Composite Structures