Intentional multi-modal self-learning to perceive and understand the real world

Abstract

Our goal is to design and build systems that can learn general-purpose models that enable deep understanding of the physical world, by producing interpretations of busy, occluded scenes containing [...] familiar, ranging from basic rigid objects up through kinematic and deformable objects, and other agents. The resulting scene interp[...] framework based on two complementary technical ideas: (i) condensing, generalizing, and abstracting data acquired from experience into a multi-modal model that is inspired by insights from human cognition and (ii) self-guided exploration, driven by the need to causally understand the world, to gather useful data as efficiently as possible.

First, it is critical for a general AI system to have a multi-modal model: different aspects of our physical world and different queries require different types of representation. Our system will be multi-modal in three senses: (1) It will simultaneously learn representations at multiple levels of abstraction, with different prior biases and predictive strengths and weaknesses; (2) It will learn from multiple sensory modalities, including vision, sound, language, and touch; (3) It will learn in multiple modes, including from observation, interaction, demonstration, instruction, and explanation. By ensuring that each component knows what it knows, we can combine the outputs of each model by weighting based on their confidence, and can communicate and explain their understanding of the world more effectively to humans.

Second, in order to efficiently and effectively gather data to learn about the complex physical world, a system must intentionally explore its environment. Random or local exploratory behavior can be shown formally to be deficient for visiting and observing all aspects of a compl[...]nce and understanding. The agent poses information-gathering problems for itself, which it tries to solve using its current understanding of the world. While solving these problems, it reaches new parts of its state space and makes new observations, giving it information about novel aspects of the world, which it uses to improve its perceptual abilities and causal world models. These improved abilities allow it to stretch farther, seeking a more nuanced and sophisticated and broad understanding of the world, which increases in this virtuous cycle. This strategy enables self-driven learning that is cumulative and compositional, recognizing and generalizing over fundamentally different types of domain regularities, and taking advantage of its ability to gather its own data to learn with high sample efficiency and little human intervention.

We are inspired by human cognitive processing, focused on predictions about the physical world and about the behavior, intentions, and beliefs of other agents; in addition, we will develop new computational models that could serve as hypotheses for new studies of human cognition. Our algorithmic methods will be trained and extensively tested in a suite of complex and diverse benchmark tasks, ranging from understanding individual scenes through making long-term hypothetical predictions, implemented in sophisticated highly realistic simulation environments.

This work is critical to the Navy's research thrust on Sense and Sense-Making. Our methods will help learn to transform data into understanding, enable autonomous agents to be persistently aware of their operating environment and optimize their operation appropriately, and be a major step toward the integration of artificial intelligence into a wide variety of C4ISR systems. (approved for public release)

Document Details

Document Type: DoD Grant Award
Publication Date: Oct 07, 2022
Source ID: N000142212740

Entities

People

Leslie P. Kaelbling

Organizations

Massachusetts Institute of Technology
Office of Naval Research
United States Navy
