Floating-Point Photonic Accelerators - W911NF-17-S-0002-Topic Optoelectronics
Abstract
Thanks to increases in computing power and the development of new training techniques, artificial neural networks (ANNs), especially deep neural networks (DNNs), have achieved greater model flexibility and orders-of-magnitude performance improvements. The success of modern speech recognition ushered in the current trend in DNN model training: replacing pre-training with large amounts of training data and straightforward backpropagation. Large amounts of data translate into a high computational load and a high training cost. For example, training AlphaStar, which defeated the best human players in the real-time strategy PC game StarCraft II, involved playing over 60,000 years of the game, which would cost nearly $26 million at the current hourly rates of Google's tensor processing units (TPUs). The same training challenges permeate many other applications, ranging from basic scientific discovery to autonomous vehicles and defense applications such as turbulence mitigation for directed-energy weapons.

Compared with mainstream digital computing paradigms, analog implementations of parallel computing promise at least an order-of-magnitude improvement in computational efficiency. The emergence of DNNs rekindled interest in analog implementations of neural networks. Notable examples include array-based computing using nonvolatile memory and silicon photonic circuits, which promise a 2 to 3 orders-of-magnitude improvement in computational efficiency over current graphics processing units (GPUs). Despite this promising efficiency, analog computing lacks the superior dynamic range that digital computing offers. The dynamic range of an analog signal is fundamentally limited by quantization errors and noise, which together correspond to an equivalent fixed-point precision. Although fixed-point representation has been deployed for inference, many DNNs trained using only fixed-point numbers have shown inferior performance.
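The dynamic-range limitation described above can be illustrated numerically. The sketch below is a hypothetical comparison, not a model of any specific analog hardware: it rounds values spanning seven orders of magnitude to an assumed 16-bit fixed-point grid (8 fractional bits) and to an assumed floating-point format with 11 significant bits (roughly IEEE 754 half precision, with the exponent range left unbounded for simplicity). The helper names are illustrative.

```python
import math


def quantize_fixed_point(x: float, frac_bits: int, total_bits: int) -> float:
    """Round x to a signed fixed-point grid with `frac_bits` fractional bits.

    Assumed format: two's-complement codes of `total_bits` bits, saturating at
    the largest representable magnitude instead of wrapping around.
    """
    step = 2.0 ** -frac_bits                  # uniform spacing of the grid
    max_code = 2 ** (total_bits - 1) - 1
    code = max(-max_code - 1, min(max_code, round(x / step)))
    return code * step


def quantize_float(x: float, mantissa_bits: int) -> float:
    """Round x to a binary floating-point value with `mantissa_bits` significant bits.

    Simplifying assumption: the exponent range is unbounded, so only the
    mantissa is rounded.
    """
    if x == 0.0:
        return 0.0
    m, e = math.frexp(x)                      # x = m * 2**e with 0.5 <= |m| < 1
    scale = 2 ** mantissa_bits
    return math.ldexp(round(m * scale) / scale, e)


def relative_error(approx: float, exact: float) -> float:
    return abs(approx - exact) / abs(exact)


if __name__ == "__main__":
    for x in (1e-4, 1e-2, 1.0, 1e3):
        fx = quantize_fixed_point(x, frac_bits=8, total_bits=16)
        fl = quantize_float(x, mantissa_bits=11)
        print(f"{x:>8g}  fixed-point rel. error = {relative_error(fx, x):.3g}"
              f"  floating-point rel. error = {relative_error(fl, x):.3g}")
```

With these assumed parameters, the fixed-point grid rounds 1e-4 to zero and saturates 1e3 at about 128, while the floating-point format keeps the relative error of every value at or below 2^-11 (about 4.9e-4), because its resolution scales with the magnitude of the number.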
An analog electrical signal can achieve floating-point-like precision through pre-amplification. Yet a computation involving multiple operations would require amplification at every operation, which is not scalable. An analog optical system that performs parallel floating-point computation could fundamentally resolve the dynamic-range disadvantage of analog neural networks. We propose to design and study what is, to the best of our knowledge, the first analog tensor accelerator capable of performing floating-point operations. This objective distinguishes the project from current efforts in electronic and photonic analog accelerators, which are limited to fixed-point calculations. Exploiting the multi-dimensional nature of light, we map floating-point encoding and computation onto appropriate physical attributes of light. In combination, these encodings naturally produce floating-point multiply-accumulate operations after balanced coherent detection. In addition to demonstrating photonic floating-point tensor accelerators, we will demonstrate their applications in neural network training, solving differential equations, and iterative algorithms for image processing. A scalable and power-efficient floating-point analog tensor accelerator is a foundational and vertical technology applicable across the spectrum of artificial intelligence applications, both commercial and defense.
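The floating-point multiply-accumulate described above can be sketched numerically, with the caveat that the abstract does not say which physical attributes of light carry the mantissa and the exponent. The sketch below assumes, purely for illustration, that each mantissa rides on a real optical field amplitude and that exponents are handled as a separate additive quantity. It models an ideal balanced detector behind a 50/50 coupler, whose difference photocurrent 2·Re(E_s·E_lo*) is proportional to the product of the two fields, and combines that analog mantissa product with an exponent sum. The function names are hypothetical and this is not the project's actual encoding.

```python
import math
from typing import Sequence


def balanced_detection(e_sig: complex, e_lo: complex) -> float:
    """Ideal balanced detector behind a 50/50 coupler (unit responsivity).

    The coupler outputs (E_s + E_lo)/sqrt(2) and (E_s - E_lo)/sqrt(2).
    Subtracting the two photocurrents cancels |E_s|^2 and |E_lo|^2 and
    leaves 2 * Re(E_s * conj(E_lo)).
    """
    plus = abs((e_sig + e_lo) / math.sqrt(2)) ** 2
    minus = abs((e_sig - e_lo) / math.sqrt(2)) ** 2
    return plus - minus


def photonic_float_dot(xs: Sequence[float], ws: Sequence[float]) -> float:
    """Hypothetical floating-point dot product.

    Assumption: each operand is split into a mantissa (carried by a real
    field amplitude) and an exponent (handled as a separate additive value).
    """
    total = 0.0
    for x, w in zip(xs, ws):
        mx, ex = math.frexp(x)   # x = mx * 2**ex, 0.5 <= |mx| < 1 (or 0)
        mw, ew = math.frexp(w)
        # Analog step: for real amplitudes the detector returns 2 * mx * mw.
        mantissa_product = balanced_detection(complex(mx), complex(mw)) / 2.0
        # Exponent step: exponents add, then the mantissa product is rescaled.
        total += math.ldexp(mantissa_product, ex + ew)
    return total


if __name__ == "__main__":
    xs = [1e-6, 3.0, -2.5e4]
    ws = [2e5, -0.125, 1e-3]
    print(photonic_float_dot(xs, ws), sum(x * w for x, w in zip(xs, ws)))
```

Because `math.frexp` normalizes every mantissa to a magnitude in [0.5, 1), the analog product in this sketch always stays between 0.25 and 1 in magnitude, even though the example operands span more than ten orders of magnitude; the wide dynamic range is carried entirely by the exponent addition. The result matches the direct dot product of about -25.175 to within floating-point rounding.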
Document Details
- Document Type: DoD Grant Award
- Publication Date: Oct 07, 2021
- Source ID: W911NF2110321
Entities
People
- Guifang Li
Organizations
- Army Contracting Command
- United States Army
- University of Central Florida