Dimensionality Reduction and Randomized Linear Algebra: Foundations and New Directions

Abstract

In 1996 Alon, Matias, and Szegedy proposed the first linear sketching algorithms for processing massive datasets, specifically for `p norm estimation of a streamed vector. The idea is for a high-dimensional x 2 Rn receives linearupdates, maintain Sx in memory for some S 2 Rm??n,requiring m ? n memory, which allows querying x given only access to Sx. Since their work, linear sketching has played a role broadly in many areas, including streaming, graphalgorithms, distributed computing, compressed sensing, and large-scale linear algebra computations (see e.g. the book by PI Woodruff) and machine learning in hundreds of published works. Applied to large-scale matrix problems,the sketching paradigm is roughly as follows: one is given say a large matrix A and would like to quickly find a matrix S with many fewer rows so that algorithms wishing to query information about A (e.g., low rank approximations, solutionto regression problems, etc.) can (approximately) do so given access only to SA (the ???sketch-and-solve??? paradigm), or can do fast computation on SA to find good preconditioners for solving problems in A. Since SA is a much smallermatrix than A, algorithms operating on it are then typically more memory and time efficient. Constructions of such S generally fall under two categories: oblivious constructions which draw S from a distribution independent of the input A itself (the so-called ???random projection method???), and row selection methods (so that each row of S has a 1 in exactly one location and zeroes elsewhere) constructed by either algorithmically finding a subset of rows of A that well-enough approximates it for some application, or finding those rows via sampling from a distribution that depends on A itself (e.g., leverage score sampling). Depending on the application, such S have been developed implementing a variety of primitives, including approximate matrix multiplication and the so-called ???subspace embedding??? primitive in the `2 and other norms.Such sketching methods have thus far been successfully applied to a number of randomized linear algebra applications, including linear and nonlinear regression, low-rank approximation in the `2 and other `p norms and theentry-weighted case, k-means clustering and other constrained low-rank approximation problems, CUR decomposition, streaming spectral sparsifiers, low-rank tensor regression, robust subspace approximation and robust regression , and many more, several of which were developed by one or both of the PIs. This proposal aims to both further expand the range of applications of sketching methods in large-scale linear algebra, as well as develop improvements to the core primitives used across these applications, such as subspace embeddings. Specifically, the PIs plan to make significantcontributions in the following two related areas: ? New applications of sketching to linear algebra. The PIs plan to expand the range of sketching methods to new problems, including low-rank approximation under various error measures (e.g., Huber and `0 error), sketching-based preconditioning for `p regression, and various regression problems such as Tukey regression. Several of these problems are NP-hard, and the PIs are looking at ways of coping with this through bicriteria algorithms, parameterized complexity, and natural distributional assumptions. ? Improved constructions of subspace embeddings. The PIs plan to investigate fundamentally new ways to construct subspace embeddings which leverage non-obliviousness to construct embeddings into fewer rows more quickly than currently known, as well as study related questions such as vector dimensionality reduction and metric sketching.

Document Details

Document Type
DoD Grant Award
Publication Date
Jul 26, 2018
Source ID
N000141812562

Entities

People

  • David Woodruff

Organizations

  • Massachusetts Institute of Technology
  • Office of Naval Research
  • United States Navy

Tags

Fields of Study

  • Computer science

Readers

  • Distributed Systems and Data Platform Development
  • Linear Algebra
  • Neural Network Machine Learning.

Technology Areas

  • AI & ML
  • AI & ML - Machine Learning Algorithms