Intentional multi-modal self-learning to perceive and understand the real world

Abstract

Our goal is to design and build systems that can learn general-purpose models that enable deep understanding of the physical world, by producing interpretations of busy, occluded scenes containing [...] familiar, ranging from basic rigid objects up through kinematic and deformable objects, and other agents. The resulting scene interp[...] framework based on two complementary technical ideas: (i) condensing, generalizing, and abstracting data acquired from experience into a multi-modal model that is inspired by insights from human cognition and (ii) self-guided exploration, driven by the need to causally understand the world, to gather useful data as efficiently as possible. First, it is critical for a general AI system to have a multi-modal model: different aspects of our physical world and different queries require different types of representation. Our system will be multi-modal in three senses: (1) It will simultaneously learn representations at multiple levels of abstraction, with different prior biases and predictive strengths and weaknesses; (2) It will learn from multiple sensory modalities, including vision, sound, language, and touch; (3) It will learn in multiple modes, including from observation, interaction, demonstration, instruction, and explanation. By ensuring that each component knows what it knows, we can combine the outputs of each model by weighting based on their confidence, and can communicate and explain their understanding of the world more effectively to humans. Second, in order to efficiently and effectively gather data to learn about the complex physical world, a system must intentionally explore its environment. Random or local exploratory behavior can be shown formally to be deficient for visiting and observing all aspects of a compl[...]nce and understanding. The agent poses information-gathering problems for itself, which it tries to solve using its current understanding of the world.
While solving these problems, it reaches new parts of its state space and makes new observations, giving it information about novel aspects of the world, which it uses to improve its perceptual abilities and causal world models. These improved abilities allow it to stretch farther, seeking a more nuanced and sophisticated and broad understanding of the world, which increases in this virtuous cycle. This strategy enables self-driven learning that is cumulative and compositional, recognizing and generalizing over fundamentally different types of domain regularities, and taking advantage of its ability to gather its own data to learn with high sample efficiency and little human intervention. We are inspired by human cognitive processing, focused on predictions about the physical world and about the behavior, intentions, and beliefs of other agents; in addition, we will develop new computational models that could serve as hypotheses for new studies of human cognition. Our algorithmic methods will be trained and extensively tested in a suite of complex and diverse benchmark tasks, ranging from understanding individual scenes through making long-term hypothetical predictions, implemented in sophisticated highly realistic simulation environments. This work is critical to the Navy's research thrust on Sense and Sense-Making. Our methods will help learn to transform data into understanding, enable autonomous agents to be persistently aware of their operating environment and optimize their operation appropriately, and be a major step toward the integration of artificial intelligence into a wide variety of C4ISR systems. (approved for public release)

Document Details

Document Type
DoD Grant Award
Publication Date
Oct 07, 2022
Source ID
N000142212740

Entities

People

  • Leslie P. Kaelbling

Organizations

  • Massachusetts Institute of Technology
  • Office of Naval Research
  • United States Navy

Tags

Fields of Study

  • Computer science

Readers

  • Agent-Based Social Robotics and Mobile-Assisted Learning in Virtual Environments
  • Neural Network Machine Learning
  • Theoretical Analysis

Technology Areas

  • AI & ML
  • Space