System Level Fault Tolerance in Parallel and Distributed Computing Systems

Abstract

The major thrust of our effort was the theory and practice of responsive (fault-tolerant, real-time) computing in parallel and distributed processing environments. New, efficient methods of system testing have been developed that shorten multiprocessor testing time by orders of magnitude and can therefore be used at system boot (previous techniques were prohibitively long). A new design framework for responsive computing was developed and is being implemented for validation. This framework is based on consensus, which can be used to provide synchronization, reliable communication, fault diagnosis, checkpointing, and even scheduling in multiprocessor environments. We have formalized and quantified the space-time tradeoff for efficient fault recovery. The system model is a graph, and we were especially successful in the analysis of meshes and hypercubes. We developed a new method, called naturally redundant algorithms, which allows efficient implementation of application-specific techniques

Document Details

Document Type
Technical Report
Publication Date
Dec 31, 1993
Accession Number
ADA279664

Entities

Organizations

  • University of Texas at Austin

Tags

Communities of Interest

  • Energy and Power Technologies
  • Engineered Resilient Systems

DTIC Thesaurus Topics

  • Algorithms
  • Communication Systems
  • Computer Networks
  • Computer Science
  • Computers
  • Distributed Computing
  • Electrical Engineering
  • Engineering
  • Fault Tolerance
  • Fault Tolerant Computing
  • High Performance Computing
  • Military Research
  • New York
  • Operations Research
  • Parallel Computing
  • Systems Engineering
  • United States

Fields of Study

  • Computer science
  • Engineering

Readers

  • Distributed Systems and Data Platform Development
  • Parallel and Distributed Computing

Technology Areas

  • Space