Exploiting Data-Flow for Fault-Tolerance in a Wide-Area Parallel System

Abstract

Wide-area parallel processing systems will soon be available to researchers to solve a range of problems. In these systems, it is certain that host failures and other faults will be a common occurrence. Unfortunately, most parallel processing systems have not been designed with fault-tolerance in mind. Mentat is a high-performance object-oriented parallel processing system that is based on an extension of the data-flow model. The functional nature of data-flow enables both parallelism and fault-tolerance. In this paper, we exploit the data-flow underpinning of Mentat to provide easy-to-use and transparent fault-tolerance. We present results on both a small-scale network and a wide-area heterogeneous environment that consists of three sites: the National Center for Supercomputing Applications, the University of Virginia and the NASA Langley Research Center.

Document Details

Document Type
Technical Report
Publication Date
Jan 01, 1996
Accession Number
ADA466238

Entities

People

  • Andrew S. Grimshaw
  • Anh Nguyen-Tuong
  • Mark Hyett

Organizations

  • University of Virginia

Tags

Communities of Interest

  • Autonomy
  • Biomedical
  • Energy and Power Technologies
  • Engineered Resilient Systems

DTIC Thesaurus Topics

  • Algorithms
  • Computational Complexity
  • Computations
  • Computer Programming
  • Computer Programs
  • Computer Science
  • Computers
  • Distributed Computing
  • Fail Safe
  • Fault Tolerance
  • Language
  • Models
  • Object Oriented Programming
  • Parallel Computing
  • Parallel Processing
  • Programming Languages
  • Software Development

Fields of Study

  • Computer science

Readers

  • Defense Technology Research and Development
  • Parallel and Distributed Computing