Using Peer Support to Reduce Fault-Tolerant Overhead in Distributed Shared Memories

Abstract

We present a peer logging system for reducing performance overhead in fault-tolerant distributed shared memory (DSM) systems. Our system provides fault-tolerant shared memory using individual checkpointing and rollback. Peer logging logs DSM modification messages to remote nodes instead of to local disks. We present results for implementations of our fault-tolerant technique using simulations of both TreadMarks, a software-only DSM, and Cashmere, a DSM using memory-mapped hardware. We compare simulations with no fault tolerance to simulations with local disk logging and peer logging. We present results showing that fault-tolerant TreadMarks can be achieved with an average of 17 percent overhead for peer logging. We also present results showing that while almost any DSM protocol can be made fault tolerant, systems with localized DSM page meta-data have much lower overheads.
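
The abstract does not spell out the protocol, so the following is only a minimal sketch of the general idea it names: a node logs incoming modification messages to a peer rather than to its own disk, and after a failure it rolls back to its own checkpoint and replays the peer-held log. The names Peer, Node, apply_modification, take_checkpoint, and recover are hypothetical and do not come from the report, and the sketch ignores message ordering across nodes, network transport, and the case where a node and its peer fail together.

```python
from dataclasses import dataclass, field


@dataclass
class Peer:
    """Hypothetical remote node that keeps another node's log in memory."""
    log: list = field(default_factory=list)

    def append(self, entry):
        self.log.append(entry)


@dataclass
class Node:
    """Hypothetical DSM node with a private checkpoint and a peer-held log."""
    peer: Peer
    pages: dict = field(default_factory=dict)
    checkpoint: dict = field(default_factory=dict)

    def take_checkpoint(self):
        # Individual checkpoint: save local page state, then truncate the
        # peer log (assumption: older entries are not needed for replay).
        self.checkpoint = dict(self.pages)
        self.peer.log.clear()

    def apply_modification(self, page, value):
        # Log the modification message to the peer before applying it locally.
        self.peer.append((page, value))
        self.pages[page] = value

    def crash(self):
        # Simulate losing all volatile page state.
        self.pages = {}

    def recover(self):
        # Roll back to the last checkpoint, then replay the peer-held log.
        self.pages = dict(self.checkpoint)
        for page, value in self.peer.log:
            self.pages[page] = value


peer = Peer()
node = Node(peer=peer)
node.apply_modification(0, "a")
node.take_checkpoint()
node.apply_modification(1, "b")
node.apply_modification(0, "c")
before = dict(node.pages)

node.crash()
node.recover()
print(node.pages == before)
```

In this toy model the only difference from local disk logging is where append writes. The sketch makes no claim about the overheads measured in the report.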

Document Details

Document Type
Technical Report
Publication Date
Jun 01, 1996
Accession Number
ADA329899

Entities

People

  • G. C. Hunt
  • M. L. Scott

Organizations

  • University of Rochester

Tags

DTIC Thesaurus Topics

  • Bandwidth
  • Computations
  • Computer Science
  • Computers
  • Computing System Architectures
  • Consistency
  • Differential Equations
  • Directories
  • Equations Of Motion
  • Ethernet
  • Experimental Data
  • Fault Tolerance
  • Magnetic Fields
  • Networks
  • Parallel Computing
  • Parallel Processing
  • Simulations

Fields of Study

  • Computer science

Readers

  • Parallel and Distributed Computing