Scalable Concurrency Control and Recovery for Shared Storage Arrays

Abstract

Shared storage arrays enable thousands of storage devices to be shared and directly accessed by end hosts over switched system area networks, promising databases and file systems highly scalable, reliable storage. In such systems, however, concurrent host I/Os can span multiple shared devices and access overlapping ranges, potentially leading to inconsistencies for redundancy codes and for data read by end hosts. In order to enable existing applications to run unmodified and to simplify the development of future ones, we desire a shared storage array to provide the illusion of a single controller without the scalability bottleneck and single point of failure of an actual single controller. In this paper, we show how rapidly increasing storage device intelligence, coupled with storage's special characteristics, can be successfully exploited to arrive at a high-performance solution to this storage management problem. In particular, we examine four concurrency control schemes and specialize them to shared storage arrays: two centralized ones (simple server locking, and server locking with leased callbacks) and two distributed ones based on device participation (distributed locking using storage-device-embedded lock servers, and timestamp ordering using loosely synchronized clocks). Simulation results show that both centralized locking schemes suffer from scalability limitations. Moreover, callback locking is particularly suspect if applications do not have much inherent locality and if the storage system introduces false sharing. Distributed concurrency control with device support is attractive because it scales control capacity with storage and performance capacity and offers the opportunity to piggyback lock/ordering messages on operation requests, eliminating message latency costs. Simulations show that both storage-optimized, device-based protocols exhibit close-to-ideal scaling, achieving 90-95% of the throughput possible under totally unprotected operation.
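
As an illustration of the device-based timestamp ordering idea mentioned in the abstract, the following is a minimal sketch of the textbook basic timestamp-ordering check applied per block at a single device. It is not the report's protocol: the names `Block`, `Device`, `rts`, and `wts` are hypothetical, timestamps are assumed to be integers supplied by hosts with loosely synchronized clocks, and atomicity of an I/O that spans several devices is not handled here.

```python
from dataclasses import dataclass, field


@dataclass
class Block:
    data: bytes = b""
    rts: int = 0  # largest timestamp of any accepted read (hypothetical field)
    wts: int = 0  # timestamp of the last accepted write (hypothetical field)


@dataclass
class Device:
    """Hypothetical storage device that orders requests by host timestamp."""
    blocks: dict = field(default_factory=dict)

    def read(self, addr, ts):
        blk = self.blocks.setdefault(addr, Block())
        if ts < blk.wts:
            return None  # rejected: a write with a later timestamp was already applied
        blk.rts = max(blk.rts, ts)
        return blk.data

    def write(self, addr, ts, data):
        blk = self.blocks.setdefault(addr, Block())
        if ts < blk.rts or ts < blk.wts:
            return False  # rejected: would invalidate a later read or write
        blk.data = data
        blk.wts = ts
        return True
```

In this sketch the ordering check rides on the operation request itself, so an accepted request needs no separate lock message; a rejected request would be retried by the host with a fresh timestamp.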

Document Details

Document Type: Technical Report
Publication Date: Feb 01, 1999
Accession Number: ADA363551

Entities

People

  • Garth Gibson
  • Khalil Amiri
  • Richard Golding

Organizations

  • Carnegie Mellon University

Tags

Communities of Interest

  • Energy and Power Technologies
  • Materials and Manufacturing Processes

DTIC Thesaurus Topics

  • Acquisition
  • Clocks
  • Computations
  • Computer Programming
  • Computer Science
  • Computers
  • Computing System Architectures
  • Consistency
  • Databases
  • Device Drivers
  • Electronic Mail
  • Operating Systems
  • Probability
  • Reliability
  • Servers (Computer Hardware)
  • Simulations
  • Transport Protocols

Fields of Study

  • Computer science
  • Engineering

Readers

  • Applied Combinatorial Optimization and Logic Circuit Design.
  • Parallel and Distributed Computing.