Embedded Deep Learning and Advanced Computation
Abstract
In this project, several novel techniques were developed for accelerating deep learning computation on embedded devices with highly constrained computing resources. These techniques include: (1) using variable-precision block floating point with stochastic rounding, (2) employing term quantization, which quantizes floating-point numbers into power-of-two terms, (3) extending pre-trained language models with domain-specific vocabulary, (4) minimizing memory access with schedules that use constant-bandwidth blocks, (5) applying full-stack optimization in the co-design of algorithms, models, and architectures, (6) splitting neural networks for wearable computing, (7) designing algorithms for detecting out-of-distribution inputs to DNNs, (8) packing sparse DNNs for efficient systolic array implementations, (9) designing memory-on-logic architectures and systolic building blocks for 3D-IC implementations of DNNs, and (10) leveraging bit-level sparsity in in-memory computing. These methods complement each other and are applicable to all resource-constrained deep learning accelerators.
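As a rough illustration of item (2), the sketch below greedily approximates a single floating-point value with a limited number of signed power-of-two terms. It is a minimal sketch written for this record, not the report's algorithm: the function names `power_of_two_terms` and `reconstruct`, the per-value term budget `max_terms`, and the exponent floor `min_exp` are all hypothetical choices made for the example.

```python
import math


def power_of_two_terms(x, max_terms=3, min_exp=-8):
    """Greedily pick up to max_terms signed powers of two approximating x.

    Hypothetical helper: max_terms and min_exp are assumptions for this
    example, not parameters taken from the report.
    """
    terms = []
    residual = x
    for _ in range(max_terms):
        if residual == 0:
            break
        sign = 1 if residual > 0 else -1
        # frexp returns (m, e) with abs(residual) == m * 2**e and 0.5 <= m < 1,
        # so the largest power of two not exceeding abs(residual) is 2**(e - 1).
        _, e = math.frexp(abs(residual))
        exp = e - 1
        if exp < min_exp:
            break  # remaining error is below the smallest allowed term
        terms.append((sign, exp))
        residual -= sign * 2.0 ** exp
    return terms


def reconstruct(terms):
    """Sum the signed power-of-two terms back into a float."""
    return sum(sign * 2.0 ** exp for sign, exp in terms)


if __name__ == "__main__":
    value = 0.8125  # 2**-1 + 2**-2 + 2**-4
    for budget in (1, 2, 3):
        t = power_of_two_terms(value, max_terms=budget)
        print(budget, t, reconstruct(t))
```

With a smaller term budget the value is represented more coarsely, and because each term is a power of two, multiplying by it reduces to a shift. That is the general appeal of power-of-two representations on constrained hardware; how the report allocates terms is not described in this abstract.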
Document Details
- Document Type: Technical Report
- Publication Date: Feb 01, 2023
- Accession Number: AD1193214
Entities
People
- H. T. Kung
Organizations
- Harvard University