Enabling Low Latency AI/ML Workloads via Wafer Scale Acceleration
Abstract
Artificial Intelligence/Machine Learning (AI/ML) is playing a central role in pushing the frontiers of technology and transforming our society. Over the last decade, AI/ML has driven significant progress in several domains, including computer vision, machine translation, social networks, robotics, and games. However, to obtain high accuracy, models have been increasing both in size (the number of model parameters) and in complexity (sparse-tensor-based irregular computations induced by techniques such as neighborhood aggregation, dropout, and attention mechanisms). Larger model sizes have not only increased training times to the order of several weeks, but have also significantly increased the latency of individual queries when the models are deployed in real-world applications. Similarly, sparse tensor operations have significantly reduced the effective utilization of traditional AI accelerators, and the resulting low ROI has severely slowed advances in AI/ML research capabilities. To push the frontiers of AI/ML research by employing irregular, sparse networks deployed in real-world latency-critical scenarios, we plan to acquire a Cerebras CS-2 AI/ML accelerator. The CS-2 is the industry's fastest AI accelerator. It reduces training times from months to minutes, and inference latencies from milliseconds to microseconds. The CS-2 is especially suited to emerging AI/ML models that require fast processing of irregular computations, which it performs on its Sparse Linear Algebra Compute (SLAC) cores. The Cerebras software platform integrates with popular machine learning frameworks such as TensorFlow and PyTorch, so researchers can use familiar tools and rapidly bring their models to the CS-2. Researchers do not need extensive knowledge of parallel programming techniques or experience in configuring complicated multi-node setups.
The CS-2 abstracts away the complexities of highly parallel execution, allowing researchers to focus on deep learning and/or HPC rather than on systems engineering problems. The proposed equipment will be used by several researchers at USC, USC/ISI and USC/ICT. Example U.S. Army funded projects that can benefit from this equipment grant include: (a) Graph Theoretic Methods for Cybersecurity in CPS, (b) Scalable Accelerated Deep Graph Learning for the Future Battlespace, (c) Strategy Optimization in Deep Multi-Agent Reinforcement Learning for Military Training Simulations, (d) Leveraging Neural Combinatorial Optimization Heuristics in Deep Reinforcement Learning, (e) Learning Transferable Hierarchical Policies in Multi-Agent Reinforcement Learning, (f) Deep Learning Based Priors in Quantifying Uncertainty, (g) Closed-loop Multisensory Brain-Computer Interface for Enhanced Decision Accuracy, and (h) Adaptive Joint Cognitive Systems for Complex and Strategic Decision Making: Building Trust in Human-machine Teams Through Brain-Computer-Interface Augmentation, Social Interaction and Mutual Learning. We expect the CS-2 to reduce the latency of emerging AI/ML models by at least an order of magnitude. This will enable new research capabilities in a number of application domains that rely on processing sparse, irregular neural network models. U.S. Army and other DoD sponsored projects typically require techniques that either employ AI/ML models in the loop of physics-based models, a learning paradigm known as "Informed ML in the Loop", or process large amounts of streaming data in real time. By achieving extremely low latencies on these workloads, the CS-2 will enable a plethora of new applications in Cybersecurity, Internet-of-Battlefield-Things (IoBT), Enhanced Situational Awareness in Battlefields, and Brain-Computer Interfaces, to name a few.
Document Details
- Document Type: DoD Grant Award
- Publication Date: Jul 28, 2023
- Source ID: W911NF2310237
Entities
People
- Viktor K. Prasanna
Organizations
- Army Contracting Command
- United States Army
- University of Southern California