Reliable, Memory Speed Storage for Cluster Computing Frameworks

Abstract

Tachyon is a distributed file system enabling reliable data sharing at memory speed across cluster computing frameworks. While caching today improves read workloads, writes are either network or disk bound, as replication is used for fault-tolerance. Tachyon eliminates this bottleneck by pushing lineage, a well-known technique borrowed from application frameworks, into the storage layer. The key challenge in making a long-lived lineage-based storage system is timely data recovery in case of failures. Tachyon addresses this issue by introducing a checkpointing algorithm that guarantees bounded recovery cost and resource allocation strategies for recomputation under common resource schedulers. Our evaluation shows that Tachyon outperforms in-memory HDFS by 110x for writes. It also improves the end-to-end latency of a realistic workflow by 4x. Tachyon is open source and is deployed at multiple companies.
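
As an illustration of the lineage idea the abstract describes, the following is a minimal sketch and not Tachyon's actual implementation or API. The class `LineageStore` and all of its methods are hypothetical names invented for this example. It assumes that derived data is produced by deterministic functions, so a lost item can be rebuilt by re-running the function recorded in its lineage, and that a checkpoint (a plain dictionary standing in for durable storage) cuts off how far back recomputation has to go.

```python
from typing import Callable, Dict, List, Tuple


class LineageStore:
    """Toy store (hypothetical): recovers lost data by recomputation."""

    def __init__(self) -> None:
        self.memory: Dict[str, list] = {}       # volatile, fast copies
        self.checkpoints: Dict[str, list] = {}  # stands in for durable storage
        # name -> (function that produced it, names of its inputs)
        self.lineage: Dict[str, Tuple[Callable[..., list], List[str]]] = {}
        self.recomputations = 0                 # recovery work done so far

    def put_source(self, name: str, data: list) -> None:
        # Source data has no lineage, so it is assumed to be durable.
        self.memory[name] = list(data)
        self.checkpoints[name] = list(data)

    def derive(self, name: str, fn: Callable[..., list], inputs: List[str]) -> None:
        # Record how the item was made instead of replicating its bytes.
        self.lineage[name] = (fn, list(inputs))
        self.memory[name] = fn(*[self.read(i) for i in inputs])

    def checkpoint(self, name: str) -> None:
        # Persist one item so later recovery never needs to go past it.
        self.checkpoints[name] = list(self.read(name))

    def lose_memory(self) -> None:
        # Simulate a failure that wipes every in-memory copy.
        self.memory.clear()

    def read(self, name: str) -> list:
        if name in self.memory:
            return self.memory[name]
        if name in self.checkpoints:
            self.memory[name] = list(self.checkpoints[name])
            return self.memory[name]
        # Not in memory and not checkpointed: recompute from lineage.
        fn, inputs = self.lineage[name]
        self.recomputations += 1
        self.memory[name] = fn(*[self.read(i) for i in inputs])
        return self.memory[name]


if __name__ == "__main__":
    store = LineageStore()
    store.put_source("raw", [1, 2, 3])
    store.derive("doubled", lambda xs: [2 * x for x in xs], ["raw"])
    store.derive("total", lambda xs: [sum(xs)], ["doubled"])
    store.lose_memory()
    print(store.read("total"), store.recomputations)
```

In this sketch, writes touch only memory, and the cost of a failure is the number of lineage steps that must be replayed. Checkpointing an intermediate item bounds that cost, which is the trade-off the abstract attributes to Tachyon's checkpointing algorithm.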

Document Details

Document Type
Technical Report
Publication Date
Jun 16, 2014
Accession Number
ADA611854

Entities

People

  • Ali Ghodsi
  • Haoyuan Li
  • Ion Stoica
  • Matei Zaharia
  • Scott Shenker

Organizations

  • University of California, Berkeley

Tags

Communities of Interest

  • Energy and Power Technologies

DTIC Thesaurus Topics

  • Algorithms
  • Application Software
  • Big Data
  • Computations
  • Computer Programming
  • Computer Science
  • Computers
  • Data Sets
  • Electrical Engineering
  • Fault Tolerance
  • Models
  • Networks
  • Parallel Computing
  • Parallel Processing
  • Recovery
  • Test And Evaluation
  • Workload

Fields of Study

  • Computer science

Readers

  • International Journalism and Media Studies
  • Parallel and Distributed Computing
  • Systems Analysis and Design