Scalable Inference on High-Dimensional Data via Task- and Domain-Specific Embeddings (Research Area 5.2)

Abstract

This project aims to establish a framework for performing inference on large image collections using dimensionality reduction techniques. The objective is to design low-dimensional embeddings that enable inference with little or no loss in performance, are robust to nuisance parameters, and scale to high-dimensional datasets with millions of data points. The goal is a framework of linear and nonlinear dimensionality reduction techniques that can be used to detect, track, and classify objects. To achieve this, we focus on three specific problems that occur in the context of large image collections. First, while carefully curated collections of images often have simple underlying low-dimensional geometric structures, the same is not true for image collections acquired in unsupervised settings. Image collections obtained from internet repositories are often riddled with undesirable variability that corrupts low-dimensional geometric structures. The first objective of this project is to enable a flexible class of image representations that is inherently robust to nuisance variables and endowed with a notion of distance that is proportional to transformations of interest. This will be achieved by developing a theory of manifold learning that uses local features as the underlying representations and by building transport operators that provide a meaningful notion of distance. Second, given representations and distances that robustly capture the intrinsic geometric structure of a dataset, it is necessary to preserve this structure while performing dimensionality reduction. Specifically, we will create a novel class of dimensionality reduction techniques that allow distances between data points to be manipulated in a task- and class-specific manner. One example is a dimensionality reduction map that collapses points from the same class while maintaining a minimum separation between points from opposite classes. We will design a range of useful dimensionality reduction techniques by exploiting this concept of engineering pairwise distances. The techniques developed in this project will be evaluated in three specific applications: (i) approximate nearest neighbor search for fast and efficient retrieval of images from a large collection, (ii) organization of massive image datasets for visualization and inference, and (iii) comprehensive inference, which solves inference problems directly from dimensionality-reduced data.
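The idea of a map that collapses same-class points while keeping opposite-class points at least a minimum distance apart can be illustrated with a small sketch. The abstract does not specify the form of the map or the objective, so everything below is an assumption made for illustration: a linear map P, a contrastive-style squared-hinge loss, plain gradient descent, and synthetic two-class data. The names pairwise_loss_and_grad, fit_embedding, and margin are hypothetical and are not taken from the project.

```python
import numpy as np


def pairwise_loss_and_grad(P, X, y, margin):
    """Mean pairwise loss and its gradient with respect to the linear map P.

    Assumed loss (not from the abstract):
      same class:      ||P (x_i - x_j)||^2
      opposite class:  max(0, margin - ||P (x_i - x_j)||)^2
    """
    n = X.shape[0]
    n_pairs = n * (n - 1) // 2
    loss = 0.0
    grad = np.zeros_like(P)
    for i in range(n):
        for j in range(i + 1, n):
            diff = X[i] - X[j]           # difference in the input space
            z = P @ diff                 # difference in the embedded space
            dist = np.linalg.norm(z)
            if y[i] == y[j]:
                # Same class: pull the pair together.
                loss += dist ** 2
                grad += 2.0 * np.outer(z, diff)
            elif dist < margin:
                # Opposite class and closer than the margin: push apart.
                gap = margin - dist
                loss += gap ** 2
                grad -= 2.0 * gap * np.outer(z, diff) / (dist + 1e-12)
    return loss / n_pairs, grad / n_pairs


def fit_embedding(X, y, out_dim=2, margin=1.0, lr=0.01, steps=200, seed=0):
    """Fit a linear map P of shape (out_dim, D) by plain gradient descent."""
    rng = np.random.default_rng(seed)
    P = rng.normal(scale=0.1, size=(out_dim, X.shape[1]))
    for _ in range(steps):
        _, grad = pairwise_loss_and_grad(P, X, y, margin)
        P -= lr * grad
    return P


# Synthetic two-class data in 10 dimensions (illustrative only).
rng_demo = np.random.default_rng(0)
X_demo = rng_demo.normal(size=(30, 10))
X_demo[15:, 0] += 6.0                    # shift the second class along one axis
y_demo = np.array([0] * 15 + [1] * 15)

# Loss of the initial random map versus the fitted map.
P_init = np.random.default_rng(0).normal(scale=0.1, size=(2, 10))
loss_before, _ = pairwise_loss_and_grad(P_init, X_demo, y_demo, 1.0)
P_fit = fit_embedding(X_demo, y_demo, out_dim=2, margin=1.0, seed=0)
loss_after, _ = pairwise_loss_and_grad(P_fit, X_demo, y_demo, 1.0)
print(f"mean pairwise loss: {loss_before:.4f} -> {loss_after:.4f}")
```

The double loop over pairs keeps the sketch readable but costs quadratic time in the number of points, so it would not scale to the dataset sizes the abstract targets; a practical version would presumably sample pairs or vectorize the computation.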

Document Details

Document Type
DoD Grant Award
Publication Date
Jan 12, 2017
Source ID
W911NF1610441

Entities

People

  • Aswin C. Sankaranarayanan

Organizations

  • Army Contracting Command
  • Massachusetts Institute of Technology
  • United States Army

Tags

Fields of Study

  • Computer science

Readers

  • Computer Vision
  • Distributed Systems and Data Platform Development
  • Neural Network Machine Learning

Technology Areas

  • AI & ML
  • AI & ML - Machine Learning Algorithms