Semantic Information Pursuit for Multimodal Data Analysis
Abstract
This project will develop a mathematical framework for the characterization of information content in multimodal data. More specifically, given complex multimodal data (e.g., images, videos, documents, speech, network data) about a scene (e.g., a city, a shopping mall, a political rally) and a class of tasks (e.g., detection, navigation, exploration, search), this project will develop an information-theoretic framework for assessing what data modalities and representations are "most informative" for these tasks, algorithms for computing such "information representations" from data, and a Bayesian framework for integrating such information representations to produce interpretations of the scene. The proposed information-theoretic framework for characterizing information content in multimodal data combines principles from information physics with probabilistic models that capture rich semantic and contextual relationships between data modalities and tasks. These information measures will be used to develop novel statistical methods for deriving minimal sufficient representations of multimodal data that are invariant to some nuisance factors as well as novel domain adaptation techniques that mitigate the impact of data transformations on information content by finding optimal data transformations. The computation of such optimal representations and transformations for classification and perception tasks will require solving nonconvex optimization problems for which novel optimization algorithms with provable guarantees of convergence and global optimality will be developed. The uncertainty of such information representations derived from multimodal data will be characterized via novel statistical sampling methods that are broadly applicable to various representation learning problems. The information representations obtained from multiple modalities will be integrated by using a novel information theoretic approach to multi-modal data analysis called "information pursuit", which uses a Bayesian model of the scene to determine what evidence to acquire from multiple data modalities, scales and locations, and to coherently integrate this evidence. The proposed methods will be evaluated in various complex multimodal datasets, including text, images and video. It is expected that this project will enhance the capabilities of existing multimodal sensor based situational awareness algorithms and systems by allowing them to exploit contextual information across data streams, such as videos and documents, enabling effective fusion of information present in multimodal sensors to provide actionable intelligence.
Document Details
- Document Type
- DoD Grant Award
- Publication Date
- Oct 30, 2018
- Source ID
- W911NF1710304
Entities
People
- Rene Vidal
Organizations
- Army Contracting Command
- Johns Hopkins University
- United States Army