Fast, Distributed Algorithms in Deep Networks

Abstract

In this project we demonstrate two approaches to speeding up the training of neural nets. First, before any training takes place, we demonstrate an informed way of initializing parameters closer to their final, trained values. Second, we introduce a new training algorithm that scales linearly when parallelized, allowing for substantially decreased training times on large datasets. Neural nets are famously unintuitive, and as such, parameters are typically assigned randomly and then adjusted during training. However, by using a cosine activation function, a layer of neurons can be made to approximate the implicit feature space of a kernel. Intuition about kernel selection can therefore guide initial parameter assignments before any data are observed. We implement this approach and show that it can greatly speed up training, often approaching the final accuracy after only one training iteration. Our second contribution is the application of the alternating direction method of multipliers (ADMM) to neural nets. Conventional gradient-based optimization methods for neural nets scale poorly, a problem that is difficult to avoid with extremely large datasets. The proposed method avoids many of the conditions that typically make gradient-based methods slow, allowing for efficient computation without specialized hardware. Our implementation demonstrates strong scalability, with linear speedups even up to thousands of cores. We show that for large problems, our approach can converge faster than GPU-based implementations of standard algorithms.
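The idea that a cosine-activated layer can approximate a kernel's implicit feature space can be illustrated with random Fourier features. The sketch below is not taken from the report: it assumes a Gaussian (RBF) kernel, and the function names, the `gamma` parameter, and the layer width are hypothetical choices made for illustration. It draws the layer's weights from a Gaussian and its biases uniformly from [0, 2π), then checks that the inner product of two layer outputs is close to the exact kernel value.

```python
import math
import random


def init_cosine_layer(in_dim, n_units, gamma, rng):
    """Draw weights and biases (hypothetical helper, not from the report).

    Assumes the target kernel is k(x, y) = exp(-gamma * ||x - y||^2),
    whose random Fourier features use weights ~ N(0, 2 * gamma).
    """
    std = math.sqrt(2.0 * gamma)
    weights = [[rng.gauss(0.0, std) for _ in range(in_dim)]
               for _ in range(n_units)]
    biases = [rng.uniform(0.0, 2.0 * math.pi) for _ in range(n_units)]
    return weights, biases


def cosine_layer(x, weights, biases):
    """Apply the layer: sqrt(2 / D) * cos(w . x + b) for each of D units."""
    scale = math.sqrt(2.0 / len(weights))
    return [scale * math.cos(sum(wi * xi for wi, xi in zip(w, x)) + b)
            for w, b in zip(weights, biases)]


def rbf_kernel(x, y, gamma):
    """Exact Gaussian kernel value, used only as the comparison target."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, y)))


rng = random.Random(0)  # fixed seed so the run is repeatable
gamma = 0.5             # assumed kernel width, chosen arbitrarily
weights, biases = init_cosine_layer(in_dim=3, n_units=4000,
                                    gamma=gamma, rng=rng)

x = [0.2, -0.1, 0.4]
y = [0.0, 0.3, 0.1]
zx = cosine_layer(x, weights, biases)
zy = cosine_layer(y, weights, biases)

approx = sum(a * b for a, b in zip(zx, zy))  # inner product of layer outputs
exact = rbf_kernel(x, y, gamma)
print("approximate kernel:", approx)
print("exact kernel:", exact)
```

Under these assumptions, choosing a kernel and its width fixes the distribution the initial weights are drawn from, which is one way that intuition about kernels could inform initialization before any data are seen. The approximation error shrinks as the number of units grows.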

Document Details

Document Type: Technical Report
Publication Date: May 11, 2016
Accession Number: AD1013468

Entities

People

Ryan J. Burmeister

Organizations

United States Naval Academy
