Deep Multimodal Fusion & Characterization of Physics-Based and Human-Based Sources
Abstract
In this project we propose to use multimodal deep learning architectures to capture multi-sensory information and improve classification and prediction accuracy compared to single-modality models. We propose a multimodal deep fusion-classification framework consisting of three distinct modules. The framework cooperatively extracts domain-specific features (module 1), exploits the relationships among these sensor features to obtain a shared latent feature representation (module 2), and then trains a classifier (module 3) that is optimized jointly with the two preceding modules. Each module performs a specific task, and the parameters (weights) of all modules are optimized jointly to obtain the best performance for a given task. The major research activity is to design domain-specific deep architectures, each dedicated to one sensor's data but all optimized together. For each modality, we propose a domain-specific deep neural network architecture that exploits the characteristics of that sensory data in order to extract a compact domain-specific feature vector. All the domain-specific deep networks are collaboratively optimized through a shared latent common feature subspace layer, which represents a domain where the information from all the sensors is jointly represented in a compact form. This joint information is the input used to train a classifier, whose classification error is fed back to the shared layer and then to the domain-specific deep networks so that they ultimately extract discriminative domain features. This project also addresses the cross-modality problem for heterogeneous sensor data in classification and prediction. We show how to train coupled deep networks such that the outputs of the two networks are forced to be highly correlated or approximately the same. The two deep networks are coupled together to map the individual sensor data into a common latent subspace.
This common latent subspace provides a unified way to measure the distance between objects from different modalities. As long as all the sensors' data can be mapped into the same common latent subspace, they become comparable. Thus, the coupled deep network is trained to predict whether test data from the two different modalities are related.
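The coupled-network idea in the abstract can be illustrated with a small sketch. Everything here is an assumption made for illustration and is not taken from the project: each "deep network" is reduced to a single linear map, the coupling objective is a standard contrastive loss (related pairs are pulled together, unrelated pairs are pushed apart up to a margin), the toy data come from a hidden 2-D state seen through two made-up sensor maps, and the names `embed`, `latent_distance`, `contrastive_step`, and `make_pair` are hypothetical.

```python
import math
import random


def embed(W, x):
    """Linear stand-in (assumption) for one modality's deep network: z = W x."""
    return [sum(w * v for w, v in zip(row, x)) for row in W]


def latent_distance(Wa, Wb, xa, xb):
    """Euclidean distance between the two modalities in the common subspace."""
    za, zb = embed(Wa, xa), embed(Wb, xb)
    return math.sqrt(sum((p - q) * (p - q) for p, q in zip(za, zb)))


def contrastive_step(Wa, Wb, xa, xb, related, margin=1.0, lr=0.02):
    """One in-place gradient step on a contrastive coupling loss (assumed loss).

    related pair:   loss = d^2                   (pull embeddings together)
    unrelated pair: loss = max(0, margin - d)^2  (push apart up to the margin)
    """
    za, zb = embed(Wa, xa), embed(Wb, xb)
    diff = [p - q for p, q in zip(za, zb)]
    d = math.sqrt(sum(c * c for c in diff))
    if related:
        scale = 2.0                          # d(loss)/d(za) = 2 * diff
    elif 0.0 < d < margin:
        scale = -2.0 * (margin - d) / d      # d(loss)/d(za) = scale * diff
    else:
        return                               # unrelated and already separated
    for i, g in enumerate(diff):
        for j, v in enumerate(xa):
            Wa[i][j] -= lr * scale * g * v   # gradient w.r.t. Wa is +scale*g*xa
        for j, v in enumerate(xb):
            Wb[i][j] += lr * scale * g * v   # gradient w.r.t. Wb is -scale*g*xb


def make_pair(rng, related):
    """Toy data (assumption): both sensors observe a hidden 2-D state
    through different fixed maps; unrelated pairs use two different states."""
    h = [rng.uniform(-1, 1), rng.uniform(-1, 1)]
    g = h if related else [rng.uniform(-1, 1), rng.uniform(-1, 1)]
    xa = [h[0], h[1], h[0] + h[1]]           # modality A: 3 features
    xb = [g[0] - g[1], 2.0 * g[1]]           # modality B: 2 features
    return xa, xb


# Train both maps jointly on alternating related / unrelated pairs.
rng = random.Random(0)
Wa = [[rng.uniform(-0.1, 0.1) for _ in range(3)] for _ in range(2)]
Wb = [[rng.uniform(-0.1, 0.1) for _ in range(2)] for _ in range(2)]
for step in range(5000):
    related = step % 2 == 0
    xa, xb = make_pair(rng, related)
    contrastive_step(Wa, Wb, xa, xb, related)

# Compare latent distances on fresh pairs of each kind.
test_rng = random.Random(1)
rel = [latent_distance(Wa, Wb, *make_pair(test_rng, True)) for _ in range(200)]
unrel = [latent_distance(Wa, Wb, *make_pair(test_rng, False)) for _ in range(200)]
print("mean distance, related pairs:  ", sum(rel) / len(rel))
print("mean distance, unrelated pairs:", sum(unrel) / len(unrel))
```

Under these assumptions, deciding whether two test samples from different modalities are related amounts to thresholding `latent_distance`. A real system would replace the linear maps with the modality-specific deep networks and would likely choose the threshold on held-out data.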
Document Details
- Document Type: DoD Grant Award
- Publication Date: Feb 25, 2019
- Source ID: W911NF1710018
Entities
People
- Nasser M. Nasrabadi
Organizations
- Army Contracting Command
- United States Army
- West Virginia University