Modeling and Simulation of Extreme-Scale Fat-Tree Networks for HPC Systems and Data Centers

Abstract

As parallel and distributed systems evolve toward extreme scale, they pose many new challenges for designing and deploying next-generation interconnection networks: high-performance computing (HPC) systems involve millions of cores and billion-way parallelism, and high-capacity storage systems require efficient access to petabytes or exabytes of data. Fat-tree networks have been widely used in both data centers and HPC systems in the past decades and are promising candidates for next-generation extreme-scale networks. In this article, we present FatTreeSim, a simulation framework that supports modeling and simulation of extreme-scale fat-tree networks, with the goal of understanding the design constraints of next-generation HPC and distributed systems and aiding the design and performance optimization of the applications running on these systems. We have systematically evaluated FatTreeSim on Emulab and Blue Gene/Q and analyzed its scalability and fidelity under various network configurations. On the Blue Gene/Q Mira, FatTreeSim achieves a peak performance of 305 million events per second using 16,384 cores. Finally, we have applied FatTreeSim to simulate several large-scale Hadoop YARN applications to demonstrate its usability.

Document Details

Document Type
Pub Defense Publication
Publication Date
Apr 30, 2017
Source ID
10.1145/2988231

Entities

People

  • Adnan Haider
  • Dong Jin
  • Ning Liu
  • Xian-he Sun

Organizations

  • Air Force Office of Scientific Research
  • Cleversafe
  • Illinois Institute of Technology
  • Office of Science

Tags

Fields of Study

  • Computer science

Readers

  • Computer Networking
  • Distributed Systems and Data Platform Development
  • Parallel and Distributed Computing