Recoverable Distributed Shared Memory Under Sequential and Relaxed Consistency

Abstract

Distributed shared memory (DSM) implemented on a cluster of workstations is an increasingly attractive platform for executing parallel scientific applications. Checkpointing and rollback techniques can be used in such a system to allow the computation to progress in spite of the temporary failure of one or more processing nodes. The complexity and overhead inherent in traditional message-passing checkpointing techniques can be reduced by taking advantage of specific properties of DSM. In this paper we show that, if designed correctly, a DSM system only needs to consider a subset of message-passing dependencies for correct rollback. A passive server model of DSM computation is described that loosens dependency restrictions by treating the events involved in interactions between nodes as atomic. An ownership timestamp scheme is used to eliminate many of the dependencies related to keeping directories consistent. The schemes can be implemented in DSM hardware simply by redesigning the directory at the network interface. Finally, we show that by relaxing the memory consistency model and using lazy release consistency, dependency restrictions can be relaxed further.
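
The abstract does not spell out the dependency rules, so the following is only a generic sketch of checkpoint-dependency tracking, not the report's algorithm. It assumes one reading of the passive server model: a node that services a page request acquires no dependency on the requester, whereas a plain message-passing view records dependencies in both directions (the request and the reply are both messages). The names `System`, `fetch_page`, and `rollback_set` are hypothetical, and the ownership timestamp scheme and lazy release consistency are not modeled.

```python
from collections import defaultdict


class System:
    """Hypothetical rollback-dependency tracker (illustrative sketch only)."""

    def __init__(self, nodes, passive_server=True):
        # Current checkpoint interval of each node.
        self.interval = {n: 0 for n in nodes}
        # node -> list of (node_interval, other_node, other_interval) dependencies.
        self.deps = defaultdict(list)
        self.passive_server = passive_server

    def checkpoint(self, node):
        # Taking a checkpoint starts a new interval for this node.
        self.interval[node] += 1

    def _depend(self, a, b):
        # Record that a's current interval used state from b's current interval.
        self.deps[a].append((self.interval[a], b, self.interval[b]))

    def fetch_page(self, requester, owner):
        # The requester obtains data produced by the owner, so it depends on the owner.
        self._depend(requester, owner)
        if not self.passive_server:
            # Message-passing view (assumption): receiving the request also
            # makes the owner depend on the requester.
            self._depend(owner, requester)

    def rollback_set(self, failed):
        """Return {node: interval to restart} if `failed` rolls back to its last checkpoint.

        A node restarting at interval k discards all of its intervals >= k.
        State is not modified; only the set of affected nodes is computed.
        """
        target = {failed: self.interval[failed]}
        changed = True
        while changed:  # propagate rollbacks until a fixed point is reached
            changed = False
            for node, deps in self.deps.items():
                for my_int, other, other_int in deps:
                    # `node` used state of `other` that is being discarded.
                    if other in target and other_int >= target[other]:
                        if node not in target or my_int < target[node]:
                            target[node] = my_int
                            changed = True
        return target
```

For example, after both nodes checkpoint once and node A fetches a page owned by node B, a failure of B forces A to roll back too, but under the passive server assumption a failure of A does not affect B. With `passive_server=False`, a failure of A also drags B back, which is the extra dependency the passive server model is meant to avoid.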


Document Details

Document Type
Technical Report
Publication Date
May 01, 1995
Accession Number
ADA293966

Entities

People

  • Bob Janssens
  • W. Kent Fuchs

Organizations

  • University of Illinois Urbana–Champaign

Tags

Communities of Interest

  • Space

DTIC Thesaurus Topics

  • Abstracts
  • Algorithms
  • Application Software
  • Availability
  • Classification
  • Computations
  • Consistency
  • Data Transmission
  • Directories
  • Frequency
  • Guarantees
  • High Performance Computing
  • High Reliability
  • Illinois
  • Military Research
  • Multiprocessors
  • Simulators

Fields of Study

  • Computer science
  • Engineering

Readers

  • Parallel and Distributed Computing