Partial Differential Equations, Nonconvex Optimization and Deep Neural Nets
Abstract
We propose to develop links between partial differential equations (PDEs) and deep neural nets (DNNs), with the aim of producing new deep learning techniques that yield conceptually simple, practical, highly accurate and theoretically provable algorithms. This builds on our earlier work in [COO+17], where we used ideas from Hamilton-Jacobi (HJ) equations, control and differential games to reduce training time and to modify and improve the training algorithm. In [WLL+18, ZYB+18] we introduced new interpolation techniques to fill in missing scientific and image data. In [PYR+18] we developed HJ-based nonconvex optimization techniques that yield state-of-the-art phase retrieval algorithms. We extended this idea to greatly improve the generalization accuracy of DNNs [WLL+18]. This was done by unifying deep learning and kernel methods with PDE-based control problems. These new optimization techniques also led to very efficient, accurate and fast weight quantization of DNNs [YZL+18]. This work was again related to HJ initial value problems and allowed us to outperform the state-of-the-art Binary Connect [CBD+15]. We generalized our ideas in [COO+17] to solve different HJ equations via the Lax formula, which led us to generalized proximal (G-prox) algorithms. These simple alternatives to stochastic gradient descent so far seem to give superior results in generalization accuracy. We have found an equivalence between deep ResNets and PDE-based control problems. We used the labeled data to obtain an accurate interpolation and trained the net via data-dependent implicit activation functions. This, remarkably, boosts accuracy by roughly 20-30 percent relative to standard DNNs on the Cifar 10 and Cifar 100 datasets and substantially reduces the lack of training issue. We expect to improve our algorithm with an even better interpolation scheme, a better back propagation method and the use of DenseNet, which can be interpreted in our framework as a multistep ODE method.
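The ODE reading of residual networks mentioned above can be illustrated with a minimal sketch. It assumes the commonly used interpretation in which a residual block x_{k+1} = x_k + h f(x_k) is one forward Euler step for dx/dt = f(x). The scalar layer function, the step size h and the helper names `residual_stack` and `toy_layer` are hypothetical choices for illustration; they are not the networks or the training method of this proposal.

```python
import math


def residual_stack(x0, layer_fn, depth, total_time=1.0):
    """Apply `depth` residual blocks x <- x + h * layer_fn(x), with h = total_time / depth."""
    h = total_time / depth
    x = x0
    for _ in range(depth):
        # Skip connection plus scaled residual branch: one forward Euler step.
        x = x + h * layer_fn(x)
    return x


def toy_layer(x):
    # Hypothetical stand-in for a learned layer.
    # The ODE dx/dt = -x has the exact solution x0 * exp(-t).
    return -x


exact = math.exp(-1.0)

# Error of the residual stack against the exact ODE solution at t = 1,
# for increasing depth (and therefore decreasing step size h).
errors = {
    depth: abs(residual_stack(1.0, toy_layer, depth) - exact)
    for depth in (10, 100, 1000)
}
```

In this toy setting the gap to the exact ODE solution shrinks as the stack gets deeper, which is the sense in which a deep residual network can be viewed as a discretized continuous-time system. A multistep scheme, the interpretation suggested for DenseNet, would instead combine several earlier states in each update.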
Our applications will include transfer learning, improved face recognition, 3D shape recognition and spatio-temporal data modeling. We will generalize our low-bit weight network work by investigating larger model capacity at wider bit widths, such as 4-bit networks. Additionally, we will develop quantized activations. We will also apply our quantization technique to accelerate inference and enable fast object detection on CPU only. For our G-prox work, we propose to integrate it into popular learning engines and to incorporate other existing acceleration techniques, including adaptive learning, Nesterov momentum and others. We will also examine more general HJ-based improvements of the proximal method to see whether they can be as successful as they were for our phase retrieval work in [PYR+18].
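The connection between the Lax formula and proximal methods can be made concrete with a small sketch. It relies on the standard Hopf-Lax formula u(x, t) = min_y { f(y) + (x - y)^2 / (2t) }, which solves u_t + (1/2) u_x^2 = 0 with u(x, 0) = f(x) for convex f, and whose minimizer is the proximal point of f at x. The example uses f(y) = |y|, for which the proximal point is given by soft thresholding. The function names, grid bounds and sample values are hypothetical, and this is not the G-prox algorithm itself.

```python
def soft_threshold(x, t):
    """Closed-form proximal point of f(y) = |y| with step t."""
    if x > t:
        return x - t
    if x < -t:
        return x + t
    return 0.0


def hopf_lax_bruteforce(f, x, t, lo=-5.0, hi=5.0, n=20001):
    """Minimize f(y) + (x - y)^2 / (2 t) over a uniform grid.

    Returns (grid minimizer, minimum value); the minimizer approximates the
    proximal point and the minimum approximates the Hopf-Lax value u(x, t).
    """
    best_y, best_val = lo, float("inf")
    for i in range(n):
        y = lo + (hi - lo) * i / (n - 1)
        val = f(y) + (x - y) ** 2 / (2.0 * t)
        if val < best_val:
            best_y, best_val = y, val
    return best_y, best_val


x, t = 1.7, 0.5  # illustrative sample point and time
y_star, u_xt = hopf_lax_bruteforce(abs, x, t)
prox_closed_form = soft_threshold(x, t)
```

Here the grid minimizer `y_star` agrees with `prox_closed_form` up to the grid spacing, so evaluating the Lax formula and taking a proximal step are the same computation in this simple case.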
Document Details
- Document Type: DoD Grant Award
- Publication Date: Jul 26, 2018
- Source ID: N000141812527
Entities
People
- Stanley Osher
Organizations
- Office of Naval Research
- United States Navy
- University of California, Los Angeles