Simba

Abstract

Package-level integration using multi-chip-modules (MCMs) is a promising approach for building large-scale systems. Compared to a large monolithic die, an MCM combines many smaller chiplets into a larger system, substantially reducing fabrication and design costs. Current MCMs typically only contain a handful of coarse-grained large chiplets due to the high area, performance, and energy overheads associated with inter-chiplet communication. This work investigates and quantifies the costs and benefits of using MCMs with fine-grained chiplets for deep learning inference, an application domain with large compute and on-chip storage requirements. To evaluate the approach, we architected, implemented, fabricated, and tested Simba, a 36-chiplet prototype MCM system for deep-learning inference. Each chiplet achieves 4 TOPS peak performance, and the 36-chiplet MCM package achieves up to 128 TOPS and up to 6.1 TOPS/W. The MCM is configurable to support a flexible mapping of DNN layers to the distributed compute and storage units. To mitigate inter-chiplet communication overheads, we introduce three tiling optimizations that improve data locality. These optimizations achieve up to 16% speedup compared to the baseline layer mapping. Our evaluation shows that Simba can process 1988 images/s running ResNet-50 with a batch size of one, delivering an inference latency of 0.50 ms.
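
The relationship between the reported throughput and latency can be checked with a short calculation. The sketch below assumes that, at a batch size of one, images are processed one at a time, so per-image latency is simply the reciprocal of throughput. The function name `latency_ms` is hypothetical and is not taken from the paper.

```python
def latency_ms(images_per_second: float, batch_size: int = 1) -> float:
    """Per-batch latency in milliseconds, assuming batches are processed
    strictly one after another (no pipelining across batches)."""
    if images_per_second <= 0:
        raise ValueError("throughput must be positive")
    return 1000.0 * batch_size / images_per_second


# Throughput figure from the abstract: 1988 images/s at batch size one.
print(f"{latency_ms(1988):.2f} ms")
```

Under this assumption, 1988 images/s corresponds to roughly 0.50 ms per image, which matches the latency stated in the abstract.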

Document Details

Document Type: Pub Defense Publication
Publication Date: May 24, 2021
Source ID: 10.1145/3460227

Entities

People

Alicia Klinefelter
Ben Keller
Bill Dally
Brian Zimmer
Brucek Khailany
C. Thomas Gray
Jason Clemons
Joel Emer
Matthew Fojtik
Nan Jiang
Nathaniel Pinckney
Priyanka Raina
Rangharajan Venkatesan
Stephen G. Tell
Stephen W. Keckler
Yakun Sophia Shao
Yanqing Zhang

Organizations

Defense Advanced Research Projects Agency
Massachusetts Institute of Technology
Nvidia
Stanford University
University of California, Berkeley
