Semantic Information Pursuit for Multimodal Data Analysis

Abstract

This project will develop a mathematical framework for the characterization of information content in multimodal data. More specifically, given complex multimodal data (e.g., images, videos, documents, speech, network data) about a scene (e.g., a city, a shopping mall, a political rally) and a class of tasks (e.g., detection, navigation, exploration, search), this project will develop an information-theoretic framework for assessing what data modalities and representations are "most informative" for these tasks, algorithms for computing such "information representations" from data, and a Bayesian framework for integrating such information representations to produce interpretations of the scene. The proposed information-theoretic framework for characterizing information content in multimodal data combines principles from information physics with probabilistic models that capture rich semantic and contextual relationships between data modalities and tasks. These information measures will be used to develop novel statistical methods for deriving minimal sufficient representations of multimodal data that are invariant to some nuisance factors as well as novel domain adaptation techniques that mitigate the impact of data transformations on information content by finding optimal data transformations. The computation of such optimal representations and transformations for classification and perception tasks will require solving nonconvex optimization problems for which novel optimization algorithms with provable guarantees of convergence and global optimality will be developed. The uncertainty of such information representations derived from multimodal data will be characterized via novel statistical sampling methods that are broadly applicable to various representation learning problems. The information representations obtained from multiple modalities will be integrated by using a novel information theoretic approach to multi-modal data analysis called "information pursuit", which uses a Bayesian model of the scene to determine what evidence to acquire from multiple data modalities, scales and locations, and to coherently integrate this evidence. The proposed methods will be evaluated in various complex multimodal datasets, including text, images and video. It is expected that this project will enhance the capabilities of existing multimodal sensor based situational awareness algorithms and systems by allowing them to exploit contextual information across data streams, such as videos and documents, enabling effective fusion of information present in multimodal sensors to provide actionable intelligence.

Document Details

Document Type
DoD Grant Award
Publication Date
Oct 30, 2018
Source ID
W911NF1710304

Entities

People

  • Rene Vidal

Organizations

  • Army Contracting Command
  • Johns Hopkins University
  • United States Army

Tags

Fields of Study

  • Computer science

Readers

  • Computer Vision.
  • Distributed Systems and Data Platform Development
  • Neural Network Machine Learning.

Technology Areas

  • AI & ML
  • AI & ML - Machine Learning Algorithms