Simba

Abstract

Package-level integration using multi-chip modules (MCMs) is a promising approach for building large-scale systems. Compared to a large monolithic die, an MCM combines many smaller chiplets into a larger system, substantially reducing fabrication and design costs. Current MCMs typically contain only a handful of coarse-grained large chiplets due to the high area, performance, and energy overheads associated with inter-chiplet communication. This work investigates and quantifies the costs and benefits of using MCMs with fine-grained chiplets for deep-learning inference, an application domain with large compute and on-chip storage requirements. To evaluate the approach, we architected, implemented, fabricated, and tested Simba, a 36-chiplet prototype MCM system for deep-learning inference. Each chiplet achieves 4 TOPS peak performance, and the 36-chiplet MCM package achieves up to 128 TOPS and up to 6.1 TOPS/W. The MCM is configurable to support a flexible mapping of DNN layers to the distributed compute and storage units. To mitigate inter-chiplet communication overheads, we introduce three tiling optimizations that improve data locality. These optimizations achieve up to 16% speedup compared to the baseline layer mapping. Our evaluation shows that Simba can process 1988 images/s running ResNet-50 with a batch size of one, delivering an inference latency of 0.50 ms.
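The throughput and latency figures quoted in the abstract can be cross-checked with a short calculation. The sketch below assumes that, at a batch size of one, images are processed strictly one after another with no overlap between consecutive inferences, so latency is simply the reciprocal of throughput. That assumption and the variable names are illustrative and are not taken from the paper.

```python
# Figures quoted in the abstract.
throughput_images_per_s = 1988.0  # ResNet-50 throughput
batch_size = 1                    # one image per inference

# Assumption (not stated in the abstract): inferences do not overlap,
# so the time per batch is batch_size / throughput.
latency_ms = batch_size / throughput_images_per_s * 1000.0

print(f"Estimated latency: {latency_ms:.2f} ms")  # prints "Estimated latency: 0.50 ms"
```

Under that assumption, the two reported numbers agree with each other to the precision given.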

Document Details

Document Type
Pub Defense Publication
Publication Date
May 24, 2021
Source ID
10.1145/3460227

Entities

People

  • Alicia Klinefelter
  • Ben Keller
  • Bill Dally
  • Brian Zimmer
  • Brucek Khailany
  • C. Thomas Gray
  • Jason Clemons
  • Joel Emer
  • Matthew Fojtik
  • Nan Jiang
  • Nathaniel Pinckney
  • Priyanka Raina
  • Rangharajan Venkatesan
  • Stephen G. Tell
  • Stephen W. Keckler
  • Yakun Sophia Shao
  • Yanqing Zhang

Organizations

  • Defense Advanced Research Projects Agency
  • Massachusetts Institute of Technology
  • Nvidia
  • Stanford University
  • University of California, Berkeley

Tags

Fields of Study

  • Computer science

Readers

  • Integrated Circuit Design and Technology
  • Neural Network Machine Learning
  • Parallel and Distributed Computing

Technology Areas

  • AI & ML
  • AI & ML - Neural Networks