Neuro-Symbolic Compositional Generalization for Language and Vision Comprehension and Grounding
Abstract
We develop a neuro-symbolic framework for imparting explicit reasoning and compositional learning to large vision and language models (VLMs). We propose a principled, integrated approach that adds compositional reasoning capabilities, including spatial, temporal, and part-whole reasoning, to neural representations by incorporating symbolic reasoning layers into large transformer-based architectures and by using interactive language grounding. We address compositional generalization in a way inspired by human cross-situational learning of basic concepts and their compositions, and we study the formal and functional properties of concept composition. Through neuro-symbolic modeling, we exploit the implicit world knowledge conveyed by current large transformer-based architectures and equip them with explicit, symbolic world knowledge to improve their generalization and reasoning. Moreover, we propose an interactive setting between human and agent to address compositional grounding. We evaluate our technical contributions on three performance tasks: interactive instruction following by agents in realistic environments, open-domain visual question answering, and knowledge-based visual question answering. We extend our existing framework for integrating knowledge into statistical and deep learning, DomiKnowS, with the techniques developed in this proposed research.
Approved for Public Release
Document Details
- Document Type: DoD Grant Award
- Publication Date: May 15, 2023
- Source ID: N000142312417
Entities
People
- Parisa Kordjamshidi
Organizations
- Michigan State University
- Office of Naval Research
- United States Navy