High-dimensional In-Network Analytics: Designs, Guarantees, and Trade-offs

Abstract

Unlocking the potential offered by the Internet of Battle Things poses several challenging to modern optimization and statistics. First, decisions and analytic on networked data must be performed at the edge of the network, which calls for the design of communication-constrained distributed optimization algorithms. Second, analytics result in so-called high-dimensional statistical models: large data sizes and parameter dimensions (generally exceeding the data size) make these problems ill-conditioned. Furthermore, in high-dimensional regimes, communicating the entire set of decision variables at each iteration can overshadow the other benefits afforded by distributed schemes. The goal is to exploit latent low-dimensional structures in the networkdata (e.g., sparsity, low-rank) to achieve statistical consistency in a resource-efficient manner. While statistical-computational guarantees of centralized high-dimensional learning are well studied, our understanding in the network setting is unsatisfactory, even for simple learning tasks (e.g., LASSO): (i) distributed schemes, designed and performing well in the low dimensional regime, can break down dramatically in the high-dimensional case; (ii) existing convergence studies fail to provide useful predictions; they are in fact confuted by experiments. It urges new analyses and (possibly) algorithmic designs blending computational thinking with statistical reasoning while revealing fundamental statistical-computation-communication tradeoffs and optimal network designs. Merging distributed optimization and statistical machine learning at more fundamental level, this project aim at filling this gap by proposing a novel distributed algorithmic design for highdimensional M-(nonconvex) estimation problems over meshed networks. The crux of the proposed approach is a novel statistically informed in-network decomposition that combines the use of local surrogates, aiming at estimating the centralized (unknown) loss, with in-network diffusion protocols aiming at estimating locally global information needed to achieve optimal statistical and optimization rates. This combination unlocks unprecedented guarantees: (i) when first-order approximation surrogates are employed, convergence at linear rates and minimax statistical errors are certified; (ii) faster (locally superlinear) rates are proved under higher order surrogates capturing statistical similarity of the data and without exchanging any Hessian matrixover the network! (iii) Tradeoffs in the statistical error, computational complexity, andcommunication costs, are characterized, revealing intriguing interplays between data size and optimization/network parameters; (iv) sample and network scalability in the typical highdimensional regimes is also studied, determining optimal scaling designs. To our knowledge, this is the first time that such a general statistical-optimization framework is proposed/studied for high-dimensional in-network learning problems.

Document Details

Document Type
DoD Grant Award
Publication Date
Aug 05, 2021
Source ID
N000142112673

Entities

People

  • Gesualdo Scutari

Organizations

  • Office of Naval Research
  • Purdue University
  • United States Navy

Tags

Fields of Study

  • Computer science

Readers

  • Distributed Systems and Data Platform Development
  • Neural Network Machine Learning.
  • Systems Analysis and Design

Technology Areas

  • AI & ML
  • AI & ML - Machine Learning Algorithms