Fault-Tolerant Computing: An Overview

Abstract

The purpose of this report is to outline the major concepts and developments in the area of fault-tolerant computing. Both hardware and software fault tolerance issues are addressed. The topics covered include module function and system-level fault detection methods, redundancy and reconfiguration strategies, valid fault models, and coding and checking in computer systems. Software fault tolerance methods such as recovery blocks, design diversity, and checkpointing and recovery are also discussed. Major issues in modeling and evaluation of fault-tolerant systems are outlined. The design of two successful commercial systems is discussed.

Document Details

Document Type: Technical Report
Publication Date: Jun 01, 1991
Accession Number: ADA238266

Entities

People

  • J. H. Patel
  • P. Banerjee
  • R. Horst
  • Rishabh Iyer
  • W. Kent Fuchs

Organizations

  • University of Illinois Urbana–Champaign

Tags

Communities of Interest

  • Energy and Power Technologies
  • Engineered Resilient Systems
  • Materials and Manufacturing Processes

DTIC Thesaurus Topics

  • Application Software
  • Computer Programming
  • Computers
  • Detection
  • Fault Tolerance
  • Fault Tolerant Computing
  • Manufacturing
  • Markov Processes
  • Measurement
  • Operating Systems
  • Operations Research
  • Random Variables
  • Reliability
  • Semiconductor Manufacturing
  • Simulations
  • Simulators
  • Software Development

Fields of Study

  • Computer science
  • Engineering

Readers

  • Parallel and Distributed Computing
  • Systems Analysis and Design