Gradient Decomposition Methods for Training Neural Networks With Non-ideal Synaptic Devices

Abstract

While promising for high-capacity machine learning accelerators, memristor devices have non-idealities that prevent software-equivalent accuracy when they are used for online training. This work combines mini-batch gradient descent (MBGD) to average gradients, stochastic rounding to avoid vanishing weight updates, and decomposition methods to keep the memory overhead low during mini-batch training. Since the weight update has to be transferred to the memristor matrices efficiently, we also investigate the impact of reconstructing the gradient matrices both inside (rank-seq) and outside (rank-sum) the memristor array. Our results show that streaming batch principal component analysis (streaming batch PCA) and non-negative matrix factorization (NMF) decomposition algorithms can achieve near-MBGD accuracy in a memristor-based multi-layer perceptron trained on the MNIST (Modified National Institute of Standards and Technology) database using only 3 to 10 ranks, with significant memory savings. Moreover, NMF rank-seq outperforms streaming batch PCA rank-seq at low ranks, making it more suitable for hardware implementation in future memristor-based accelerators.
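
The following Python sketch illustrates, under stated assumptions, the three ingredients named in the abstract: stochastic rounding, a low-rank decomposition of a mini-batch gradient, and the two ways of reconstructing the update. It is not the authors' implementation. A truncated SVD stands in for streaming batch PCA and NMF, and the function names, the fixed device step size `step`, and the reading of rank-sum as "reconstruct first, then round once" versus rank-seq as "round and apply each rank-1 term in turn" are assumptions made for illustration.

```python
import numpy as np


def stochastic_round(x, step, rng):
    """Round x to a multiple of `step`, rounding up with probability equal
    to the fractional remainder, so the result is unbiased on average."""
    scaled = x / step
    lower = np.floor(scaled)
    frac = scaled - lower
    round_up = rng.random(x.shape) < frac
    return (lower + round_up) * step


def low_rank_factors(grad, rank):
    """Return (left, right) with left @ right approximating grad.
    Assumption: truncated SVD is used here as a simple stand-in for the
    streaming batch PCA and NMF decompositions discussed in the abstract."""
    u, s, vt = np.linalg.svd(grad, full_matrices=False)
    return u[:, :rank] * s[:rank], vt[:rank]


def rank_sum_update(left, right, step, rng):
    """Assumed rank-sum: rebuild the full gradient outside the array,
    then quantize it once to the device step size."""
    return stochastic_round(left @ right, step, rng)


def rank_seq_update(left, right, step, rng):
    """Assumed rank-seq: quantize and accumulate each rank-1 outer
    product one after another, as if applied directly to the array."""
    total = np.zeros((left.shape[0], right.shape[1]))
    for k in range(left.shape[1]):
        total += stochastic_round(np.outer(left[:, k], right[k]), step, rng)
    return total
```

In this sketch the factors occupy `rank * (rows + cols)` values instead of the `rows * cols` needed for the full gradient, which is where the memory saving of a low-rank update comes from.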

Document Details

Document Type: Pub Defense Publication
Publication Date: Nov 22, 2021
Source ID: 10.3389/fnins.2021.749811

Entities

People

Brian D. Hoskins
Gina C. Adam
Junyun Zhao
Osama Yousuf
Siyuan Huang
Yutong Gao

Organizations

George Washington University
National Institute of Standards and Technology
Office of Naval Research
