Semantic Representations for Multi-Viewpoint Multimodal Geolocation
Abstract
As the National Geospatial-Intelligence Agency (NGA) and other government agencies collect more data, it becomes increasingly difficult for analysts to keep up. Requiring analysts to reason across multiple data modalities and/or viewpoints adds substantial complexity to this task, resulting in either significantly slower analysis or analysts skipping data entirely. The proposed work seeks to automate the aggregation of multimodal data by developing machine learning methods that learn semantic representations robust across multimodal input data. To keep this work grounded and ensure that success is measurable, we intend to focus on the problem of unmanned aerial system (UAS) localization. If successful, however, we envision benefits of this research beyond UAS localization, such as multimodal data aggregation techniques that let analysts more easily access and process the data relevant to an operation, whether the downstream tasks are performed algorithmically (e.g., object detection or change detection) or not (e.g., aggregation for human consumption). Our envisioned approach is to develop an algorithmic architecture for learning correspondences (preferably semantic in nature) between cross-modal pairs of data. Each pair would consist of a base map and a data element derived from a different modality than the base map. These correspondences would then be used to generate a probability distribution over the locations and poses in the base map (e.g., a pixel in a satellite image and an orientation) at which the input data element was collected. More details can be found in the full proposal. Researchers from The Johns Hopkins University Applied Physics Lab (JHU/APL) and the University of Kentucky (UKY) will collaborate on this work, with Dr. Gordon Christie from JHU/APL serving as the Principal Investigator.
The primary results of this work will be the deep learning architecture(s), loss function(s), and model weights developed and trained to achieve state-of-the-art performance on this task, all thoroughly researched and tested against existing state-of-the-art methods. We anticipate that our architecture(s) will be general enough to support other modalities of interest to NGA that are not included in this proposal. Our final report will describe in detail the methods developed to extract multimodal semantic representations, as well as the data collected and the experiments performed to evaluate these methods. Sufficiently novel content developed as part of this research will also be submitted for publication at top-tier computer vision conferences (e.g., CVPR). Publication, if accepted, will validate the technical quality of the work, enable feedback from experts in the field, and maximize the work's impact so that other researchers can build upon it to further advance NGA-desired capabilities.
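As an illustration only, the final step of the envisioned approach (turning cross-modal correspondences into a probability distribution over base-map locations and orientations) can be sketched as below. This is a minimal sketch, not the method developed under the award: it assumes a base map already encoded as a per-pixel embedding grid and a query encoded once per candidate orientation, scores them by cosine similarity, and applies a softmax over all pixels and orientations jointly. The function name `pose_distribution`, the `temperature` parameter, the array shapes, and the random stand-in embeddings are all hypothetical.

```python
import numpy as np


def pose_distribution(map_features, query_features, temperature=0.1):
    """Return a joint probability over (row, col, orientation).

    map_features:   (H, W, D) array, one embedding per base-map pixel.
    query_features: (K, D) array, one query embedding per candidate orientation.
    Both are assumed to come from learned cross-modal encoders (not shown).
    """
    # L2-normalize so that the dot product is a cosine similarity.
    m = map_features / np.linalg.norm(map_features, axis=-1, keepdims=True)
    q = query_features / np.linalg.norm(query_features, axis=-1, keepdims=True)
    # Similarity of every pixel to every orientation-specific query: (H, W, K).
    scores = np.einsum("hwd,kd->hwk", m, q) / temperature
    # Softmax over all locations and orientations jointly (shifted for stability).
    scores = scores - scores.max()
    probs = np.exp(scores)
    return probs / probs.sum()


# Hypothetical stand-in data: random embeddings instead of trained encoders.
rng = np.random.default_rng(0)
H, W, K, D = 8, 8, 4, 16
map_features = rng.normal(size=(H, W, D))
query_features = rng.normal(size=(K, D))
# Plant a perfect match at pixel (2, 5) for orientation index 1.
query_features[1] = map_features[2, 5]

probs = pose_distribution(map_features, query_features)
best = np.unravel_index(np.argmax(probs), probs.shape)
print("most likely (row, col, orientation):", tuple(int(i) for i in best))
```

A joint softmax is used here so that location and orientation hypotheses compete within a single distribution; a trained system could equally factor the two or use a different scoring function.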
Document Details
- Document Type: DoD Grant Award
- Publication Date: Oct 06, 2020
- Source ID: HM04762010003
Entities
People
- Gordon Christie
Organizations
- Johns Hopkins University
- National Geospatial-Intelligence Agency