Dynamic Scene Graphs for Extracting Activity-based Intelligence
Abstract
The primary objective of this project is to explore a dynamic learning framework that effectively extracts activity-based intelligence (ABI) in highly complex military operations. The project will develop Dynamic Scene Graphs over large-scale multimodal time-series data for representation learning. The new representations will enable learning from complex, dynamic environments and support a variety of vision tasks, including open-set object classification and detection, event detection, question-answering, and dense captioning. The proposed framework pursues four goals: (i) Combining the strengths of multimodal data to learn effective representations. We propose a novel multimodal dynamic graph that learns synergistic and exclusive information from data captured by multiple sources. (ii) Enabling open-set object classification and detection in order to understand novel objects in open-world scenarios. We propose open-set object classification and detection approaches that learn the data distribution boundaries of known and unknown classes. (iii) Detecting events effectively and adapting models to novel events from limited training samples. We propose a top-k ranking-loss-based Multi-Instance Learning approach for event detection and a few-shot learning algorithm that measures model uncertainty during model adaptation to avoid overconfident wrong predictions. (iv) Learning interactively with human users via active question-answering and semantic description generation. We will create a novel framework that allows users to communicate with machines and integrate human knowledge into them. Together, these objectives enable the extraction of ABI to support military operations.
The proposed project will advance machine learning and computer vision in the following key areas: multimodal data fusion, open-set object classification and detection, event detection, few-shot learning, visual question-answering, and dense captioning. These technical innovations will be integrated into the proposed framework for ABI extraction. More specifically, the framework extends state-of-the-art machine learning and computer vision research in four main directions: 1) a novel approach for learning synergistic features from multimodal data in a collaborative manner; 2) learning the data distribution and its reciprocal distribution for open-set object classification and detection; 3) formulating a novel min-max optimization problem for event detection, together with an uncertainty-based model adaptation approach for few-shot learning from limited samples; and 4) graph-based active question-answering learning for updating object and event models, together with a novel query-on-path inference algorithm for generating semantic descriptions. The proposed research closely aligns with the defined objectives of the Army Research Office, specifically in the category of "Foundations of Image and Multimodal Data Analysis". The outcomes of this research can directly improve the extraction of actionable intelligence from diverse multimodal data, both inside and outside the Army domain. Our work may also support missions ranging from humanitarian to kinetic operations. The proposed research is potentially transformative and broadly applicable to many other science and engineering domains (e.g., finance, business, medicine, and biology) where large volumes of complex data are collected in real time and a systematic decision support system is needed to leverage machine intelligence while keeping humans in the loop, so that high-quality decisions are reached through continuous human-machine collaboration.
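The abstract does not specify the top-k ranking-loss Multi-Instance Learning formulation named in goal (iii). As an illustration only, one common way such a loss is set up can be sketched as follows: each video is treated as a bag of segment-level scores, the bag score is taken as the mean of its k highest segment scores, and a hinge ranking loss pushes bags that contain an event above bags that do not. The function names, the default values of `k` and `margin`, and the hinge form are all assumptions made for this sketch and are not taken from the project.

```python
from typing import Sequence


def topk_mean(scores: Sequence[float], k: int) -> float:
    """Bag-level score: mean of the k highest instance scores (hypothetical helper)."""
    if k < 1:
        raise ValueError("k must be at least 1")
    if len(scores) == 0:
        raise ValueError("a bag must contain at least one instance score")
    # If the bag has fewer than k instances, all of them are used.
    top = sorted(scores, reverse=True)[:k]
    return sum(top) / len(top)


def topk_ranking_loss(
    pos_bag: Sequence[float],
    neg_bag: Sequence[float],
    k: int = 3,        # assumed default, not from the source
    margin: float = 1.0,  # assumed default, not from the source
) -> float:
    """Hinge ranking loss between an event bag and a non-event bag.

    The loss is zero once the top-k score of the event bag exceeds the
    top-k score of the non-event bag by at least `margin`.
    """
    return max(0.0, margin - topk_mean(pos_bag, k) + topk_mean(neg_bag, k))


if __name__ == "__main__":
    # Segment scores for a video with an event and one without (made-up numbers).
    event_video = [0.1, 0.9, 0.8, 0.2]
    normal_video = [0.1, 0.2, 0.3, 0.1]
    print(topk_ranking_loss(event_video, normal_video, k=2))
```

With the made-up scores above and `k=2`, the event bag scores about 0.85 and the non-event bag about 0.25, so the loss is roughly 0.4. Averaging over the top k segments rather than taking only the maximum makes the bag score less sensitive to a single noisy segment, which is one reason this variant is often preferred when only video-level labels are available.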
Document Details
- Document Type: DoD Grant Award
- Publication Date: Jun 25, 2021
- Source ID: W911NF2110236
Entities
People
- Yu Kong
Organizations
- Army Contracting Command
- Rochester Institute of Technology
- United States Army