Neuro-Symbolic Compositional Generalization for Language and Vision Comprehension and Grounding

Abstract

We develop a neuro-symbolic framework for imparting explicit reasoning and compositional learning to large vision and language models (VLMs). We propose a principled, integrated approach that adds compositional reasoning capabilities, including spatial, temporal, and part-whole reasoning, to neural representations by incorporating symbolic reasoning layers into very large transformer-based architectures and by using interactive language grounding. Our treatment of compositional generalization is inspired by human cross-situational learning of basic concepts and their compositions, and we study the formal and functional properties of concept composition. Through neuro-symbolic modeling, we exploit the implicit world knowledge carried by current very large transformer-based architectures and equip them with explicit symbolic world knowledge to improve their generalization and reasoning. Moreover, we propose an interactive setting between human and agent to address compositional grounding. We evaluate our technical contributions on three kinds of tasks: interactive instruction following by agents in realistic environments, open-domain visual question answering, and knowledge-based visual question answering. We equip DomiKnowS, our existing framework for integrating knowledge into statistical and deep learning, with the techniques developed in this proposed research.

Approved for Public Release

Document Details

Document Type
DoD Grant Award
Publication Date
May 15, 2023
Source ID
N000142312417

Entities

People

  • Parisa Kordjamshidi

Organizations

  • Michigan State University
  • Office of Naval Research
  • United States Navy

Tags

Fields of Study

  • Computer science

Readers

  • Artificial Intelligence
  • Neural Network Machine Learning

Technology Areas

  • AI & ML