Efficient Byzantine Fault Tolerance for Scalable Storage and Services

Abstract

As distributed systems grow in size and importance, they experience, and should tolerate, faults beyond simple component crashes. Unfortunately, tolerating arbitrary faults, also known as Byzantine faults, poses several challenges to system designers, often limiting performance, requiring additional hardware, or both. This dissertation presents new protocols that provide substantially better performance than previously demonstrated. The Byzantine fault-tolerant erasure-coded block storage protocol proposed in this thesis provides 40% higher write throughput than the best prior approach. The Byzantine fault-tolerant replicated state machine provides 2.2-2.9 times higher throughput than the best prior approach. Furthermore, the protocols presented in this dissertation require 25-33% fewer responsive servers than the nearest competitors. To enable these results, this dissertation introduces several new techniques, including homomorphic fingerprinting, partial encoding, and Byzantine Locking, that provide unprecedented scalability, higher throughput, lower latency, and lower computational overhead. This dissertation also considers new methods for analyzing the correctness of distributed systems in the presence of faulty clients. Distributed services and storage systems built using these techniques can provide Byzantine fault tolerance more efficiently, with higher performance, and at greater scale than previously thought possible.

Document Details

Document Type
Technical Report
Publication Date
Jul 01, 2009
Accession Number
ADA515890

Entities

People

  • Gregory R. Ganger
  • James Hendricks
  • Michael Reiter
  • Miguel Castro
  • Priya Narasimhan

Organizations

  • Carnegie Mellon University

Tags

Communities of Interest

  • Ground and Sea Platforms

DTIC Thesaurus Topics

  • Algorithms
  • Authentication
  • Coding
  • Computations
  • Computer Programming
  • Computer Science
  • Computers
  • Data Storage Systems
  • Decoding
  • Failure Mode And Effect Analysis
  • Fault Tolerance
  • Law
  • Measurement
  • Operating Systems
  • Reliability
  • Test And Evaluation
  • Workload

Fields of Study

  • Computer science

Readers

  • Computer Networking
  • Database Systems and Applications
  • Mathematics or Statistics