High Performance Computing Multicast

Abstract

This investigation of High Performance Computing (HPC) Multicast for High-Speed Publication-Subscription (Pub-Sub) sought to deliver both insight into and implementations of high-performance multicast solutions that enable better utilization of cloud resources. These solutions combine improved scalability with increased consistency, ensuring that expected and necessary system conditions are met for the many critical national-asset applications that are likely to move to the cloud in the next decade. In the context of this effort, the applicability of the oft-invoked Consistency, Availability and Partition tolerance (CAP) theorem was explored within specific environments of commonly deployed clouds, and novel insights were developed into the tradeoffs behind CAP's conclusion that a replicated service can possess just two of the three properties. It was determined that there are replicated services for which the applicability of CAP is unclear: specifically, the scalable soft-state services that run in the first tier of a single cloud-computing data center. The challenge is that such services live in a single data center and run on redundant networks. Partitioning events involve single machines or small groups and are treated as node failures; thus, the CAP proof does not apply in a formal sense, as it is proven by forcing a replicated service to respond to conflicting requests during a partitioning failure, triggering inconsistency. Nonetheless, most developers believe in a generalized CAP folk theorem, holding that scalability and elasticity are incompatible with strong forms of consistency. We designed, implemented, and benchmarked the Isis2 platform: a first-tier consistency alternative that replicates data, combines agreement on update ordering with amnesia freedom, and supports both good scalability and fast response. A team of students was led in the application of Isis2 to build a large-scale distributed computer-vision landmark-recognition system,

Document Details

Document Type
Technical Report
Publication Date
Feb 01, 2012
Accession Number
ADA557017

Entities

People

  • Daniel Freedman
  • Hakim Weatherspoon
  • Ken Birman
  • Robert Van Renesse
  • Tudor Marian

Organizations

  • Cornell University

Tags

Communities of Interest

  • C4I
  • Energy and Power Technologies
  • Human Systems

DTIC Thesaurus Topics

  • Air Force
  • Air Force Research Laboratories
  • Cloud Computing
  • Cloud Storage
  • Command And Control
  • Computer Networks
  • Computer Programming
  • Computers
  • Data Centers
  • Databases
  • Failure Mode And Effect Analysis
  • High Performance Computing
  • Infrastructure
  • Intellectual Property
  • Network Computing
  • Network Protocols
  • Operating Systems

Fields of Study

  • Computer science

Readers

  • Computational Modeling and Simulation
  • Computer Networking
  • Distributed Systems and Data Platform Development

Technology Areas

  • AI & ML