Fault Tolerance in Critical Information Systems

Abstract

Critical infrastructure applications provide services upon which society depends heavily; such applications require constant, dependable operation in the face of various failures, natural disasters, and other disruptive events that might cause a loss of service. These applications are themselves dependent on distributed information systems for all aspects of their operation, so survivability of these critical information systems is an important issue. Survivability is the ability of a system to continue to provide service, though possibly alternate or degraded, in the face of various types of failure and disruption. A fundamental mechanism by which survivability can be achieved in critical information systems is fault tolerance. Much of the literature on fault-tolerant distributed systems focuses on tolerance of local faults by detecting and masking the effects of those faults. I describe a direction for fault tolerance in the face of non-local faults: faults whose effects have significant non-local impact, sometimes widespread and sometimes catastrophic, and whose effects often cannot be masked using available resources. The goal is to recognize these non-local faults through detection and analysis, then to provide continued service (possibly alternate or degraded) by reconfiguring the system in response to these faults.
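
The detect, analyze, and reconfigure loop named in the abstract can be illustrated with a minimal sketch. The abstract does not give the report's actual mechanisms, so everything here is an assumption made for illustration: the `FaultReport` type, the per-region threshold rule for calling a fault non-local, and the two service modes are all hypothetical.

```python
from dataclasses import dataclass
from enum import Enum


class Mode(Enum):
    FULL = "full"
    DEGRADED = "degraded"  # stands in for "alternate or degraded" service


@dataclass(frozen=True)
class FaultReport:
    """A locally detected fault (hypothetical record layout)."""
    node: str
    region: str


def non_local_regions(reports, nodes_per_region, threshold=0.5):
    """Analysis step: flag regions where the fraction of distinct
    faulty nodes reaches the threshold (an assumed rule)."""
    faulty = {}
    for report in reports:
        faulty.setdefault(report.region, set()).add(report.node)
    return {
        region
        for region, nodes in faulty.items()
        if len(nodes) / nodes_per_region[region] >= threshold
    }


def reconfigure(reports, nodes_per_region, threshold=0.5):
    """Response step: pick a service mode for every region."""
    affected = non_local_regions(reports, nodes_per_region, threshold)
    return {
        region: Mode.DEGRADED if region in affected else Mode.FULL
        for region in nodes_per_region
    }


if __name__ == "__main__":
    sizes = {"east": 4, "west": 4}
    seen = [
        FaultReport("e1", "east"),
        FaultReport("e2", "east"),
        FaultReport("w1", "west"),
    ]
    print(reconfigure(seen, sizes))
```

In this sketch a single failed node is treated as a local fault and leaves its region in full service, while correlated failures across half a region trigger a switch to degraded service rather than an attempt to mask the fault.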

Document Details

Document Type
Technical Report
Publication Date
May 01, 2001
Accession Number
ADA479834

Entities

People

  • Matthew C. Elder

Organizations

  • University of Virginia

Tags

Communities of Interest

  • Cyber
  • Energy and Power Technologies
  • Engineered Resilient Systems

DTIC Thesaurus Topics

  • Application Software
  • Communication Systems
  • Complex Systems
  • Computer Programming
  • Computer Science
  • Computers
  • Control Systems
  • Data Centers
  • Databases
  • Fault Tolerance
  • Geographic Regions
  • Information Systems
  • National Security
  • North America
  • Software Development
  • Systems Engineering
  • United States

Fields of Study

  • Computer science

Readers

  • Distributed Systems and Data Platform Development
  • Plasma Physics / Magnetohydrodynamics
  • Structural Health Monitoring of Composite Structures