Algorithmic Tractability and Computational Limits in High-Dimensional Linear Regression
Abstract
The modern day ability to collect an unprecedented amount of data, led to fundamentally new opportunities in the areaof statistical"" inference. The blessing of Big Data, however, comes at the cost of Curse of Dimensionality, as manyinference problems are based on" models exhibiting very high dimensions. The explosion of dimensionality severelylimits the computational tools available for conducting inference. Thus the contemporary inference problems requirethe design of scalable computational tools capable of dealing with high dimensional models. While the majority of theexisting literature is devoted to designing fast algorithms which are capable of" dealing with inference problemsinvolving high dimensions, and in particular devoted to identifying the assumptionsadmitting the e""xistence of scalable algorithms, the goal of the present proposal is to address not only the tractabilitybut primarily the fundamen"tal limitations for designing such fast computational tools.One of the main obstacles to be addressed in this proposal is the fact" that the primary existing framework for dealingwith computational hardness issues, namely the algorithmic complexity classes such"" as NP and #P, are poorly suitedfor the problems of high dimensional statistics since these complexity classes deal with worst case"" instances, namelythe challenge of designing fast (polynomial time) algorithms capable of solving any instance of the problem withi""n acertain class of problems. The problems in high dimensional statistics, however usually involve random inputs arisingfrom data" following a particular probability distribution. As such the main algorithmic challenge is designing fastalgorithms for typical (n"amely generated at random), rather that the worst-case instances.The goal of the present work is to introduce two novel ways of ana"lysis of algorithmic tractability as well as limits arisingin high dimensional statistics for models involving random inputs. We will consider the two approaches in the context ofhigh-dimensional regression problem with sparsity. It is described as a problem of" inferring the vector of regressioncoefficients corresponding to the model Y = X + W, given the matrix of independent variables X" and the vector ofdependent variables Y . The dimension of the vector (i.e. the number of independent variables) is assumed toex"ceed significantly the dimension of Y , which is the sample size, but the vector will be assumed to be sparse,meaning that only a h"andful of coefficients of differ from zero. Our first proposed method will be the one inspired byrecent developments in the field o"f statistical physics, specifically the theory of spin glasses and its ramification for thefield of randomly generated constraint s""atisfaction problems. In particular, the proposed approach will be based onstudying an intricate geometry ofsolutions space approp"riately defined. The second approach proposed within the scope of this project will based on average case hardness of some computati"onal problems arising in the field of cryptography, in particular the problemof finding the shortest vector in a lattice generated"" by a random collection of vectors, and the related so-called Subset Sum problem involving random weights.
Document Details
- Document Type
- DoD Grant Award
- Publication Date
- Sep 01, 2017
- Source ID
- N000141712790
Entities
People
- David Gamarnik
Organizations
- Massachusetts Institute of Technology
- Office of Naval Research
- United States Navy