Horizontal Fault Tolerance in a Fully Distributed Loosely Coupled Environment
Abstract
The increasing use of local area networks to divide up the processing power once allocated to a single central processor has a side benefit: levels of fault tolerance can be implemented at minimal cost. With a central processor, hardware replication is mandatory to continue processing in the face of hardware failures. Otherwise, processing must generally halt for a period of hardware repair. A local area network already contains replicated hardware, along with software to support communications over the links connecting the individual nodes. The existence of duplicated hardware, and of independent processing ability within each node, allows for a concentration on software support for fault tolerance. Hardware replication can be limited to such areas as network topologies that employ multiple links among the nodes. This research concentrates on software approaches to fault tolerance in a loosely coupled network environment. Current approaches are studied. These turn out to emphasize special-purpose languages and operating systems designed to allow for transparent distribution of tasks among the nodes. The failure scenarios under which faults will be masked vary widely. Specifically, this research develops a set of language and operating system protocols to implement a level of fault tolerance. This Fault Tolerant Monitor (FTM) system is layered above the operating system.
Document Details
- Document Type: Technical Report
- Publication Date: Aug 01, 1990
- Accession Number: ADA227972
Entities
People
- Peter Schiavi
Organizations
- Air Force Institute of Technology