SubLiME (Sub-Linear Machine learning Environment): Hashing, Sketching, and Compressed Sensing over Parallel Hardware for Large-Scale Learning

Abstract

Over the last decade, information has been created at a rate exponentially faster than the growth of our storage and computational capabilities. The size and dimensionality of current datasets have made machine learning models significantly larger and more complex. Resource constraints restrict the ability of existing machine learning and data mining algorithms to process, analyze, store, and understand current big datasets.

Recently, increased attention has been given to the study of randomized algorithms as a means to circumvent this widening gap between the explosion of data and our computing capabilities. Randomized algorithms based on sampling and random projections have been used extensively in the ML literature for reducing the dimensionality of datasets and for fast linear algebraic subroutines. In an ongoing ONR BRC project, we have found fundamental connections between several important machine learning problems and core randomized algorithmic tools, such as hashing, sketching, and compressed sensing. As a result of these connections, we have developed rigorous, exponentially cheaper alternatives for popular subroutines in machine learning, ranging from feature selection to training deep learning models.

This project aims to bridge the gap between randomized algorithm theory and machine learning practice. To convince practitioners to adopt the novel randomized alternatives developed as part of our ONR BRC project, we will go beyond proving theorems and developing generic prototypes. Through a program of large-scale validation, we will provide practitioners with strong evidence of the benefits of randomized algorithms on their specific workloads. To fully utilize the power of modern computing hardware, we will tailor our algorithms to multi-core CPUs and GPUs so that they can take full advantage of the available parallelism.

We are requesting a set of state-of-the-art GPU workstations to experiment with and validate new randomized algorithms for deep learning in a range of tasks, from extreme classification to video compressed sensing. The Lambda Hyperplane Max is currently the machine of choice for large-scale deep learning workloads. We also request a smaller GPU machine for code testing. This equipment will enable us to significantly expand the impact of our ONR BRC project in both practical and theoretical directions.

Document Details

Document Type
DoD Grant Award
Publication Date
Jun 17, 2020
Source ID
N000142012499

Entities

People

  • Anshumali Shrivastava

Organizations

  • Office of Naval Research
  • Rice University
  • United States Navy

Tags

Fields of Study

  • Computer science

Readers

  • Distributed Systems and Data Platform Development
  • Neural Network Machine Learning
  • Parallel and Distributed Computing

Technology Areas

  • AI & ML
  • AI & ML - Machine Learning Algorithms
  • AI & ML - Neural Networks