An Interactive Continual Multi-modal Learner based on Memory-Modulated Neural Analogical Reasoning

Abstract

We propose Analogical Networks, an analogical framework for grounded language understanding that encodes world knowledge explicitly, in a collection of stored structured spatio-temporal perceptual experiences at different levels of abstraction, in addition to implicitly, as network parameters. Given a language query, Analogical Networks retrieve relevant past queries and associated perceptual memory graphs on the fly and use them to modulate perceptual inference in order to localize objects, parts, attributes, actions, and events; predict possible future and past action completions; evaluate counterfactuals; ground referentials; answer questions; act in the environment; or update their knowledge. During inference, Analogical Networks will match novel visuo-linguistic sensory input to compositions (in space and time) of partial previous experiences, e.g., a centaur will be parsed as a composition of a horse and a man. The proposed system will learn continually and mostly autonomously, by relating examples of a new concept to variations and compositions of previous ones, augmenting its nonparametric memory with the newly acquired compositions to facilitate future inferences, without any weight updates.

Analogical Networks is a transformative paradigm in vision-language learning, contributing the following set of key characteristics, missing in the current state of the art:

1. Multi-task learning with a single model. Analogical Networks cast every task as an analogical correspondence problem to a different set of memories. We train one neural network across several tasks, where each inference is modulated with a different set of relevant memories. In this way, different tasks share knowledge with one another.
2. Continual learning. We show Analogical Networks can learn continually through memory expansion, without necessarily updating the weights of a parametric representation, reducing interference problems caused by gradient descent in parametric models.
3. Fast one-shot learning. A new experience is used to modulate future inferences immediately after its storage in memory. By contrast, neural networks typically need multiple gradient updates for knowledge to be incorporated in the network's weights.
4. Uncertainty estimation and interpretability. By inspecting the retrieved memories and the cross scene-memory attentions, the model's mistakes can be easily diagnosed and its "reasoning" inspected.
5. Symbolic knowledge representations learnt end-to-end for signal-to-symbol grounding. In Analogical Networks, symbol (entity) representations in the structured memory entity graphs are trained end-to-end for localizing them in the sensory stream, addressing the central signal-to-symbol mapping problem of earlier knowledge representation frameworks, such as Frames, Schemas, or Scripts.

We discuss preliminary results of Analogical Networks for the task of 3D object part segmentation, where they dramatically outperform purely parametric state-of-the-art segmentors in the setting of few training samples, even without any weight updates. We present our proposed research, which extends and extensively innovates over the prototype to address analogical open-vocabulary scene and video recognition and grounded language understanding, such as answering questions, grounding referentials, following instructions, and forming visual concepts and behaviors through explanations.

Learning world models is the single major open problem in today's AI. How can we learn a mental "simulator" of how the world works, one that we use to form reasonable perceptual inferences, act competently, and interpret language? Analogical Networks learn to reason about world knowledge by analogy: by retrieving structured perceptual (potentially language-annotated) experiences and contextualizing them with the present input, mixing, matching, and modifying them in order to fit the present scenario and to provide plausible past and future completions and grounded language interpretations.

Approved for Public Release
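The retrieve-then-extend memory loop described in the abstract can be illustrated with a toy sketch. This is not the proposed system: it swaps in plain cosine-similarity nearest-neighbor lookup over hand-written 2-D feature vectors, and the names `MemoryBank`, `add`, `retrieve`, and `predict` are hypothetical. It only shows how a newly stored experience can change a later inference without any weight update.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class MemoryBank:
    """Hypothetical nonparametric memory: a growing list of (feature, label) pairs."""

    def __init__(self):
        self.items = []

    def add(self, feature, label):
        # Storing an experience is the only "learning" step; nothing is trained.
        self.items.append((list(feature), label))

    def retrieve(self, query, k=1):
        # Rank stored experiences by similarity to the query and keep the top k.
        ranked = sorted(self.items, key=lambda m: cosine(query, m[0]), reverse=True)
        return ranked[:k]

    def predict(self, query):
        # Transfer the label of the single most similar memory to the query.
        return self.retrieve(query, k=1)[0][1]


bank = MemoryBank()
bank.add([1.0, 0.0], "horse")  # toy features, assumed for illustration only
bank.add([0.0, 1.0], "man")

novel = [-0.9, 0.1]
before = bank.predict(novel)   # closest stored memory is "man"
bank.add([-1.0, 0.0], "bird")  # one new example, no gradient step
after = bank.predict(novel)    # the new memory is used immediately
print(before, after)           # prints: man bird
```

In this sketch the prediction for the same query changes from "man" to "bird" as soon as one new example is stored, which is the one-shot, memory-expansion behavior the abstract contrasts with gradient-based updates. The abstract's structured memory graphs, learned retrieval, and attention-based modulation are deliberately left out.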

Document Details

Document Type: DoD Grant Award
Publication Date: May 15, 2023
Source ID: N000142312415

Entities

People

Katerina Fragkiadaki

Organizations

Carnegie Mellon University
Office of Naval Research
United States Navy
