Uncertainty-Robust, Many-Modal Information Networks with Explainable Reasoning
Abstract
Online, intelligence, and recorded-activity information today is highly unstructured and noisily scattered across several modalities and sensors (e.g., speech and text, images, infographics, videos, and tabular databases), making it challenging for users to find answers that require joint reasoning over multiple diverse sources. This issue holds for online intelligence information, for web sources such as Wikipedia, Google, and retail sites, and for general recorded activities in homes, offices, hospitals, and battlefields. Therefore, it would be very useful to have an interactive question-answering model that can automatically assimilate information from across all these modalities and accurately (and interpretably) answer users' free-form natural language questions on them, potentially over multiple interaction turns. This includes questions whose answer can only be generated after combining information from multiple modalities. These models should also be able to handle missing, uncertain, and contradictory information in the face of noisy real-world intelligence. Moreover, to be similar to how humans reason, and in light of current AI-safety issues, these models should be interpretable (and not black-box), such that they can explain their reasoning path to the answer decision. Finally, we would prefer highly scalable approaches that can handle reasoning over large-scale networks with rich, long-term information. We propose the novel approach of many-modal graph-based interactive Q&A, with uncertainty-robustness, explainable reasoning, and scalability properties. In year 1, we will develop the many-modal graph network, which involves translating each modality (e.g., images, videos, databases, and tables) to text-based relational subgraphs and then merging these via graph-based coreference resolution and embedding matching models.
In year 2, we will build multi-hop neural reasoning models on these many-modal information networks, which can answer natural language questions requiring a combination of information from multiple modalities (based on modality-specific attention modules from the question and the response), as well as perform memory-based multi-step inference, e.g., path- and count-based reasoning. Moreover, our graph network will naturally incorporate certainty scores (extraction probabilities) in each relation edge's embedding, allowing the reasoning model to compute the flow of uncertainty along paths and subgraphs. In year 3, we will focus on the AI-safety and risk-minimizing aspect of explainability. Our structured, graph-based reasoning approach can naturally explain the answer by visualizing attention- and saliency-based graph paths. We will also explore the idea of our model being able to textually summarize the decision path to the answer. Finally, our embedding-based, graph-recurrent reasoning models will also address scalability to large networks by employing a multi-scale hierarchical architecture that first encodes dense subgraphs and then recursively collapses those subgraphs to super-node hidden states. Training and evaluation of our models will be based on many-modal data from both online website datasets (e.g., Wikipedia) and lab-based on-site recorded data, along with human-annotated question-answer pairs. We will use several automatic as well as human metrics to test the success of these models.
The results from this proposal will also be validated on real-world intelligence applications of interest to DoD. In particular, we will describe high-potential collaborations with DoD researchers and labs based on our explainable-reasoning models, which can enable low-risk agents for remote human-robot interaction on noisy, large-scale information in battlefields, as well as autonomous decision-making systems that can perform complex multi-step reasoning on diverse information sources and also explain their autonomous reasoning to humans along the way.
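As a rough illustration of how certainty scores on relation edges could propagate along a reasoning path, the sketch below scores a path by the product of its edge extraction probabilities and returns the most confident path between two entities. It is a simplified, non-neural stand-in and not the proposed model: the function name `most_confident_path`, the edge tuple layout, and the sample entities are all hypothetical, and multiplying probabilities assumes the extractions are independent, which the abstract does not state.

```python
import heapq
import math


def most_confident_path(edges, source, target):
    """Return (confidence, path) for the path with the highest product of
    edge certainties, or (0.0, []) if the target is unreachable.

    `edges` is a list of (head, relation, tail, probability) tuples, with
    each probability assumed to lie in (0, 1]. This layout is hypothetical.
    """
    graph = {}
    for head, relation, tail, prob in edges:
        if prob <= 0.0:
            continue  # a zero-certainty edge cannot contribute to any path
        # Maximizing a product of probabilities equals minimizing the sum
        # of their negative logs, so Dijkstra's algorithm applies.
        graph.setdefault(head, []).append((tail, relation, -math.log(prob)))

    best = {source: 0.0}
    queue = [(0.0, source, [source])]
    while queue:
        cost, node, path = heapq.heappop(queue)
        if node == target:
            return math.exp(-cost), path
        if cost > best.get(node, math.inf):
            continue  # stale queue entry; a cheaper route was already found
        for tail, relation, weight in graph.get(node, []):
            new_cost = cost + weight
            if new_cost < best.get(tail, math.inf):
                best[tail] = new_cost
                heapq.heappush(queue, (new_cost, tail, path + [relation, tail]))
    return 0.0, []


# Hypothetical edges extracted from two modalities (a video and a text report).
EDGES = [
    ("truck", "seen_in", "video_12", 0.9),
    ("video_12", "recorded_at", "depot", 0.8),
    ("truck", "mentioned_in", "report_7", 0.6),
    ("report_7", "locates", "depot", 0.95),
]

confidence, path = most_confident_path(EDGES, "truck", "depot")
print(round(confidence, 2), path)
```

The returned path alternates entities and relations, so it doubles as a simple human-readable explanation of which evidence chain supported the answer.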
Document Details
- Document Type: DoD Grant Award
- Publication Date: Feb 14, 2019
- Source ID: W911NF1810336
Entities
People
- Mohit Bansal
Organizations
- Army Contracting Command
- United States Army
- University of North Carolina at Chapel Hill