Deploying Machine Learning in Low Power, Portable, and Low Latency Systems
Abstract
Background: Machine learning and AI systems, in particular deep neural networks (NNs), have recently enabled computers to compete with or surpass human performance on many computer vision, natural language processing, and decision-making tasks. Most machine learning tools have been developed for powerful servers or graphics processing units (GPUs), for both training and deployment. These platforms suffer from a number of problems that make them impractical for real-world systems: (i) They are large and heavy, making them unfit for embedded/portable applications. (ii) They are power hungry and inefficient, making them unusable for battery/solar-powered systems. (iii) They have high latency, making them unfit for real-time decision making.

The Problem: As machine learning moves out of the lab and into real-world devices, industry experts need embedded hardware such as field-programmable gate arrays (FPGAs) and application-specific integrated circuits (ASICs). These hardware platforms are the workhorses behind portable computing systems because they are small, cheap (vs. clusters of CPUs/GPUs), low power (100X more energy efficient than CPUs/GPUs), re-programmable (in the case of FPGAs), and have nanosecond latency. Unfortunately, industry-standard machine learning toolkits (TensorFlow and PyTorch) rely on high-level programming languages (Python). These toolkits enable rapid model development, but they are computationally inefficient and do not run on embedded hardware systems, which require hardware description languages such as VHDL and Verilog.

The Goal: Develop a toolkit that simultaneously enables both (1) rapid development of machine learning systems using industry-standard toolkits, and (2) efficient deployment on portable systems. To do this, we will create a software stack that takes machine learning models developed/specified using TensorFlow or PyTorch and compiles these models down into the VHDL and Verilog hardware description languages. Engineers can then immediately load/deploy models onto FPGAs, or fabricate custom ASICs. Furthermore, our software stack will compress/quantize models so they can be executed using simple bit-wise operations that are more efficient than conventional floating-point implementations. By leveraging embedded computing platforms and model compression, our software stack will offer machine learning systems with roughly 1000X the energy efficiency and responsiveness of a GPU.
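The abstract does not say which quantization scheme the software stack will use. As one illustration of how a quantized model can run on bit-wise operations alone, the sketch below assumes 1-bit (sign) quantization, where every weight and activation is reduced to +1 or -1. Under that assumption a dot product needs only an XNOR and a bit count, which map directly onto FPGA logic. The function names (`binarize`, `pack_bits`, `xnor_popcount_dot`) are hypothetical and are not taken from the project.

```python
import random


def binarize(values):
    """Quantize real values to +1 / -1 by sign (zero maps to +1)."""
    return [1 if v >= 0 else -1 for v in values]


def pack_bits(signs):
    """Pack a +1/-1 vector into an integer: bit i is set where signs[i] == +1."""
    word = 0
    for i, s in enumerate(signs):
        if s == 1:
            word |= 1 << i
    return word


def xnor_popcount_dot(a_word, b_word, n):
    """Dot product of two packed +1/-1 vectors of length n.

    XNOR marks the positions where the signs agree. Each agreement
    contributes +1 and each disagreement -1, so dot = 2 * agreements - n.
    """
    mask = (1 << n) - 1
    agree = ~(a_word ^ b_word) & mask
    return 2 * bin(agree).count("1") - n


# Compare the bit-wise result with an ordinary integer dot product.
random.seed(0)
n = 64
weights = binarize([random.gauss(0, 1) for _ in range(n)])
inputs = binarize([random.gauss(0, 1) for _ in range(n)])

reference = sum(w * x for w, x in zip(weights, inputs))
fast = xnor_popcount_dot(pack_bits(weights), pack_bits(inputs), n)
print(reference == fast)
```

Sign quantization is the most aggressive choice and usually costs some accuracy; multi-bit fixed-point schemes trade part of the hardware savings for better fidelity.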
Document Details
- Document Type: DoD Grant Award
- Publication Date: Feb 14, 2019
- Source ID: W911NF1810095
Entities
People
- Thomas Goldstein
Organizations
- Army Contracting Command
- Office of the Secretary of Defense
- University of Maryland